#1 2020-03-06 22:23:39

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

TCP (HTTP) server improvements for Linux

While reading this brilliant book (it changed my mind about many programming aspects) I found that the file descriptor returned from the accept() call inherits most of the parent socket's properties under Linux.
A small patch to TCrtSocket (merge request #278) allows us to save 3% of electricity.

10000 requests to the /timestamp endpoint, measured using strace -cf ./test (output cropped), show that the 3% of time spent in setsockopt (60 000 calls, or 6 calls per accept) goes away.

Before patch

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.95    1.680828          55     30689       276 futex
 37.31    1.307881         131     10001           accept
  3.33    0.116714           2     60018           setsockopt
  2.47    0.086639           9     10004           sendto
  1.87    0.065584           7     10002           recvfrom
  1.75    0.061495           1     70019           clock_gettime
  1.52    0.053310           5     10232           close
  1.35    0.047174           5     10002           shutdown

After patch

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 34.84    1.309760          43     30652       349 futex
 28.41    1.068102     1068102         1         1 rt_sigsuspend
 26.34    0.990362          99     10001           accept
  2.21    0.083104           8     10004           sendto
  1.97    0.073905           1     70019           clock_gettime
  1.57    0.058937           6     10002           recvfrom
  1.24    0.046750           5     10231           close
  1.24    0.046503           5     10002           shutdown
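
For illustration, here is a minimal plain-FPC sketch of the inheritance behaviour the patch relies on (this is not the mORMot patch itself; the option names and port are just examples, and whether a given option is inherited should be verified on the target kernel):

program AcceptInheritDemo;

{$mode objfpc}{$H+}

uses
  Sockets;

var
  ListenSock, ClientSock: longint;
  Addr: TInetSockAddr;
  AddrLen: TSockLen;
  OptVal: longint;
begin
  ListenSock := fpSocket(AF_INET, SOCK_STREAM, 0);
  OptVal := 1;
  // set the options ONCE, on the listening socket only...
  fpSetSockOpt(ListenSock, SOL_SOCKET, SO_REUSEADDR, @OptVal, SizeOf(OptVal));
  fpSetSockOpt(ListenSock, SOL_SOCKET, SO_KEEPALIVE, @OptVal, SizeOf(OptVal));
  FillChar(Addr, SizeOf(Addr), 0);
  Addr.sin_family := AF_INET;
  Addr.sin_port := htons(8881);
  Addr.sin_addr.s_addr := 0; // INADDR_ANY
  fpBind(ListenSock, @Addr, SizeOf(Addr));
  fpListen(ListenSock, 128);
  AddrLen := SizeOf(Addr);
  // ...on Linux the fd returned by accept() inherits those options,
  // so the per-connection setsockopt() calls can be skipped
  ClientSock := fpAccept(ListenSock, @Addr, @AddrLen);
  CloseSocket(ClientSock);
  CloseSocket(ListenSock);
end.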

The next optimization targets could be the huge number of calls to futex (I have not found the reason yet) and to clock_gettime (it comes from GetTick64 in TrySockRecv). Help is welcome here.

Last edited by mpv (2020-03-06 22:27:29)

Offline

#2 2020-03-07 15:27:44

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Nice trick!
I have merged it.

About the futex, is it the critical section used in TSynThreadPool.Push()?
Perhaps TSynThreadPoolSubThread.Execute may also be refactored to use an InterlockedExchange() instead of EnterCriticalSection(fOwner.fSafe).
I guess that strace reports a lot of time because there is contention.
We may find some ideas in https://github.com/BeRo1985/pasmp
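
A rough sketch of that idea with hypothetical names (not the actual TSynThreadPool code): use InterlockedExchange() as a lock-free "busy" flag, so the uncontended path never enters the kernel and therefore never shows up as a futex call:

// lock-free flag: 0 = free, 1 = taken
function TryAcquire(var Flag: longint): boolean;
begin
  result := InterlockedExchange(Flag, 1) = 0; // we got it only if it was 0
end;

procedure Release(var Flag: longint);
begin
  InterlockedExchange(Flag, 0); // give the flag back
end;

The caller would spin or fall back to a slower path when TryAcquire() returns false.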

Offline

#3 2020-03-07 15:47:40

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

I have committed https://synopse.info/fossil/info/d79db41d83
It should reduce thread contention by using two locks in TSynThreadPool (one for the main threads list, and another for the contention list), plus a per-thread lock for the processing flag pointer.
There was a single critical section, which was much more likely to trigger concurrent access, and therefore futex API calls. IIRC, if there is no concurrent access, there is no OS API call.

Offline

#4 2020-03-07 17:22:33

danielkuettner
Member
From: Germany
Registered: 2014-08-06
Posts: 330

Re: TCP (HTTP) server improvements for Linux

One question: why did you choose TRTLCriticalSection instead of TSynLocker?

Offline

#5 2020-03-07 17:49:25

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Because there is no dependency on SynCommons.

Offline

#6 2020-03-08 13:46:21

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

@ab - please roll back these changes - https://synopse.info/fossil/info/d79db41d83 - they produce a deadlock (randomly, sometimes on the 1000th request, sometimes on the 9000th); it seems to occur when PendingContextCount becomes > 0. In any case (even without a deadlock), the number of futex calls after this patch remains the same (accept x 3) in HTTP/1.0 mode (without keep-alive).

Last edited by mpv (2020-03-08 13:48:30)

Offline

#7 2020-03-08 15:36:06

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

I found where one of the futex calls comes from - it is not from the calls to Enter/LeaveCriticalSection but from the call to fEvent.SetEvent (I used "writeln-based debugging" and am sure this is where it is).

Last edited by mpv (2020-03-08 15:36:44)

Offline

#8 2020-03-08 21:42:59

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Please check https://github.com/synopse/mORMot/commi … ee5bc1c915

It reverts to a single CS, but still with some enhancements.
It also reduces the TObjectList/TList usage.

Offline

#9 2020-03-09 00:43:07

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Please see also https://synopse.info/fossil/info/1cd7d503dd

TSynList/TSynObjectList should be cross-compiler efficient TList/TObjectList replacements, without any notifications, but slightly faster and easier to inherit from/extend.

Offline

#10 2020-03-09 09:43:50

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Now everything seems to be stable (at least from my CI point of view).

As a result, in a more or less realistic scenario (TechEmpower single DB query on an SQLite3 DB + SyNode (plain mORMot should be faster) + HTTP/1.0 + 400 concurrent connections), performance increased from ~35000 RPS to ~37000 RPS (on my Core i5U Linux laptop).

At the low level, the next optimization target is clock_gettime:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 18.90    6.102560     6102560         1         1 rt_sigsuspend
 15.56    5.023886           9    546456           clock_gettime
 15.34    4.952688          91     54637           accept
 14.26    4.603774         115     40100           mprotect
  7.69    2.481902          61     40805      3434 futex
  4.08    1.317904         232      5692       569 lstat
  3.73    1.204458          22     55395           close
  3.68    1.188450          22     54635           sendto
  2.12    0.684840         649      1056           lseek
  2.07    0.668234          12     54638       387 shutdown
  1.98    0.637603          12     55438           fstat
  1.87    0.604021          11     54639           poll
  1.71    0.552490        1309       422           read
  1.47    0.473438           9     54638           recvfrom
  1.46    0.471973           9     54637           getpeername
  1.43    0.460655           8     54678         5 ioctl

I'm not sure, but could we compute the timeout in TCrtSocket.TrySockRecv / TrySndLow only when a single read/write operation does not receive/send all the data? In most cases we read/write everything we need in one call, so calling GetTick64 at the beginning is not necessary.
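
A minimal sketch of the idea, with simplified hypothetical code (uses SysUtils and Sockets; the poll/select handling and error reporting of the real TrySockRecv are omitted):

function TryRecvAll(Sock: longint; Buf: PByte; Len: integer;
  TimeoutMS: integer): boolean;
var
  bytes: PtrInt;
  endTix: Int64; // 0 = deadline not computed yet
begin
  result := false;
  endTix := 0;
  while Len > 0 do
  begin
    bytes := fpRecv(Sock, Buf, Len, 0);
    if bytes <= 0 then
      exit; // error or connection closed
    dec(Len, bytes);
    inc(Buf, bytes);
    if Len = 0 then
      break; // the common case: everything arrived, no tick counter queried
    if endTix = 0 then
      endTix := GetTickCount64 + TimeoutMS // compute the deadline lazily
    else if GetTickCount64 > endTix then
      exit; // timeout
  end;
  result := true;
end;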

Offline

#11 2020-03-09 11:35:02

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Offline

#12 2020-03-09 21:14:07

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

And finally I found the root of the evil: FPC uses a raw syscall for clock_gettime, while libc uses the vDSO (a Linux mechanism that exposes some kernel-level functions to user space). See https://github.com/synopse/mORMot/pull/281/files

GetTickCount64 fpc    2 494 563 op/sec
GetTickCount64 libc 119 919 893 op/sec
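
The idea behind the pull request, as a simplified sketch (not the exact PR code; the record layout and the clock id below assume Linux x86_64):

{$linklib c}

type
  TTimeSpec = record
    tv_sec: Int64;   // time_t
    tv_nsec: Int64;  // long
  end;
  PTimeSpec = ^TTimeSpec;

const
  CLOCK_MONOTONIC_COARSE = 6; // Linux-specific clock id, millisecond-grade resolution

// the libc wrapper is resolved through the vDSO, so no switch into the kernel
function clock_gettime(clk_id: integer; tp: PTimeSpec): integer;
  cdecl; external 'c' name 'clock_gettime';

function GetTickCount64Libc: Int64;
var
  ts: TTimeSpec;
begin
  clock_gettime(CLOCK_MONOTONIC_COARSE, @ts);
  result := ts.tv_sec * 1000 + ts.tv_nsec div 1000000;
end;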

The picture now looks like this:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 22.92    8.514322         214     39795           mprotect
 16.64    6.183122          45    138342           accept
 11.75    4.365739          31    139100           close
 10.13    3.764708          27    138344           sendto
  7.06    2.624247          19    138343           recvfrom
  6.69    2.486703          18    138383         1 ioctl
  6.03    2.241786          16    138343       393 shutdown
  5.82    2.161967          16    138342           getpeername
  5.56    2.064928          15    138344           poll
  3.14    1.165810        3490       334        79 futex

Not ideal for me, but MUCH better smile

In absolute numbers (see my previous post): 37680 RPS.

Last edited by mpv (2020-03-09 21:21:05)

Offline

#13 2020-03-10 01:06:30

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Very nice finding!
I have merged your pull request.

I suspect it could lead to a great deal of performance improvement on a highly loaded server with a lot of REST calls - and therefore a lot of timing.
Note that GetTickCount64() won't be the only beneficiary: I suspect QueryPerformanceMicroSeconds() would also benefit from using libc.

I guess "mprotect" is from the heap manager reserving memory.

Offline

#14 2020-03-10 11:17:41

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Our previous commit about GetTick64 removal broke things on Windows.
So I kept it only on POSIX.
See https://synopse.info/fossil/info/1078ba0f05

Offline

#15 2020-03-10 13:44:02

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Actually, we do not need to remove it even on POSIX, now that GetTickCount64 is fixed...

About mprotect - I also think this is heap allocation, so for now I do not plan to optimize it (and most likely it comes from my code, not from mORMot).

The things that can be optimized are getpeername and ioctl.

With getpeername I found an elegant solution - it comes from the GetRemoteIP call inside TCrtSocket.AcceptRequest. But we already have the remote IP right after the call to accept in the THttpServer.Execute loop. So we can execute

 ServerSock := fServer.fSocketClass.Create(fServer);
 IPText(ClientSin, ServerSock.fRemoteIP);

right in the main loop instead of in the thread pool, and pass the ServerSock instance to Task() instead of the socket fd.

This removes the getpeername call on both Windows and POSIX.

But I want to discuss going one step further.

The problem:

In the current implementation we accept a connection and, if all threads are busy, put it in the contention queue (Windows does the same at the HTTP.SYS level).

If the server process must be closed, either because of internal errors, because the OS kills it, or because of a power-down, we lose all queued sockets (and the data clients have already sent but we have not yet read - at least the first 4 KB (MTU size) is already sent by the client and buffered in the kernel network interface).

The worst case happens when all threads are busy for a long time (for example, database problems or something wrong in the server business logic), the contention queue overflows and the server reboots. All 1000 queued requests (the default contention queue size) are lost.

The idea:

Why should we accept a socket connection when there is no free thread in the thread pool?
The listening socket already has a queue for not-yet-accepted connections (see DefaultListenBacklog in SynCrtSock). If we have a short-term performance problem, it will work as an analogue of our contention queue. If we have a long-term performance problem, it works even better - the caller gets an error on connect() and:
- with a load balancer - it selects the next node from the upstream
- with a direct connection - the client can try to reconnect

The pros:
- a kernel-level contention queue
- no "parasite" traffic while the server is busy
- resource savings (every accepted connection we put into the contention queue takes one port, creates a memory buffer, etc.)
- we can simplify our thread pool implementation - remove the contention queue, simplify the selection of a free thread

The implementation of the HTTP server main loop could be:

while not Terminated do begin
  WaitForFreeThreadInPool;
  SocketFd := Accept(Sock.Sock,ClientSin);
  ServerSock := fServer.fSocketClass.Create(fServer);
  IPText(ClientSin, ServerSock.fRemoteIP);
  OnConnect;
  ThreadPool.Push(ServerSock);
  ...
end;

Everything is so beautiful that maybe I have missed some hidden problems?
Sorry for the long post - until the end of the week I temporarily have a very bad connection, so I am buffering all my ideas.

Offline

#16 2020-03-10 19:05:20

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

But is the error on connect acceptable?
For most simple use cases, without any load balancer, the re-connection seems like a lot of trouble if the server is just full for a few milliseconds.
Perhaps we may still use a queue, and call something like waitForFreeThreadInPullOrPlaceInQueue() before Accept().
Then for a load balancer, we just set up our server with no queue and let it reject the connection and switch to another server.

Offline

#17 2020-03-10 20:43:09

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Not exactly. If we do not call accept(), then without any load balancer the client will wait (up to the client-side read timeout), because on the server side the OS puts the connection into the listen socket backlog. A good explanation of the backlog is here.
Just try the following:

procedure THttpServer.Execute;
..
  // main server process loop
  if Sock.Sock>0 then
  try
    while not Terminated do begin
      sleep(10000);              // <---- simulate a full thread pool: wait for 10 seconds
      ClientSock := Accept(Sock.Sock,ClientSin);

I verified it - clients (browser, curl, wrk) do wait.

Offline

#18 2020-03-10 23:21:25

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

In fact, there is already such a waitForFreeThreadInPullOrPlaceInQueue() waiting loop in TSynThreadPool.Push, when the queue is full.
It will wait up to ContentionAbortDelay ms (5000 by default), and during this delay no new connection is ACCEPTed, since Push() is blocking and does not return to the main THttpServer.Execute loop.
This ContentionAbortDelay was only effective for IOCP, but I have just made it work for the socket server as well.
To test it, just set a small queue (e.g. QueueLength=10 instead of 1000 slots) and see it happen.
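
Roughly, the waiting behaviour can be pictured like this (a simplified sketch with hypothetical names - TMyThreadPool, TryEnqueue - not the actual TSynThreadPool.Push code):

function TMyThreadPool.Push(aContext: pointer): boolean;
var
  endTix: Int64;
begin
  result := TryEnqueue(aContext); // hypothetical helper: false if the queue is full
  if result then
    exit;
  // queue full: block the caller, i.e. THttpServer.Execute, so no new Accept()
  endTix := GetTickCount64 + fContentionAbortDelay; // 5000 ms by default
  repeat
    Sleep(1); // the real code uses an adaptive sleep step
    result := TryEnqueue(aContext);
  until result or (GetTickCount64 > endTix);
  // if still false here, the connection is aborted
end;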

Together with a refactoring where RemoteIP is taken directly, without a GetPeerName call:
please check https://synopse.info/fossil/info/62688161a7

Another potential optimization may be to reuse the THttpServerSocket instances between the calls.
No need to reallocate the SockIn buffer for instance.
But it may be premature optimization.

Offline

#19 2020-03-12 16:12:18

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

@ab - I understand about accept() now - thanks!

I also applied another small optimization for Linux. Now everything is ideal from my point of view (I agree that reusing the THttpServerSocket instances would be premature optimization - see below).

I will try to collect everything together in this post - maybe you will decide to create a blog article from it (IMHO we are very, very fast in the REST scenario (even without these optimizations) compared to other FPC/Delphi-based HTTP servers).

1) Testing methodology
As the server under test we take Samples/21 - HTTP Client-Server performance server, set it to HTTP/1.0 mode and change the port to 8881 (which allows binding without root permission on Linux):

// launch the server
aHTTPServer := TSQLHttpServer.Create('8881',[aServer]);
// disable keep-alive to prevent thread spawn for every Keep-Alive HTTP connection
(aHTTPServer.HttpServer as THttpServer).ServerKeepAliveTimeOut := 0;

As the client we use wrk - a well-known tool for load testing HTTP servers, running with 8 threads and 400 connections for 5 seconds:

wrk -t8 -c400 -d5s http://localhost:8881/root/timestamp

For every test we warm up the server first, then attach strace to the server process and measure the syscalls

sudo strace -cf -p `pidof Project21HttpServer`

and then stop strace and measure performance.

All tests are run on a laptop (this is why we run each test for only 5 seconds and wait after each test - to prevent CPU throttling):
Linux 5.3.0-40-generic (cp65001) 8 x Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz (x64)

The initial stage: ~58 000 requests per second (RPS)

Running 5s test @ http://localhost:8881/root/timestamp
  8 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.16ms    9.16ms 211.01ms   99.46%
    Req/Sec     7.39k     1.97k   13.89k    71.50%
  294288 requests in 5.03s, 50.80MB read
Requests/sec:  58449.58
Transfer/sec:     10.09MB

Strace shows

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.66   40.416056        1543     26195           poll
 17.38   11.208072          61    183355           clock_gettime
  5.41    3.486260       10694       326        90 futex
  4.28    2.762974         105     26195           accept
  3.81    2.455415          16    157170           setsockopt
  2.32    1.496650          57     26195           close
  1.42    0.913937          35     26195           getpeername
  1.23    0.792855          30     26191           sendto
  0.52    0.334711          13     26198           recvfrom
  0.50    0.321031          12     26195       362 shutdown
  0.47    0.305818          12     26195         3 ioctl
  0.00    0.002638          36        73           select

Too many system calls. Let's try to remove some of them:

1) clock_gettime - a strange thing in the FPC codebase. Fixed by enabling the vDSO time function
2) setsockopt - Linux gives us file descriptor property inheritance - let's use it
3) getpeername - let's use the peer address we already got from accept() instead of calling getpeername, and optimize accept for high load

At this stage the results are ~61 000 RPS

Running 5s test @ http://localhost:8881/root/timestamp
  8 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.93ms    5.14ms 209.70ms   99.70%
    Req/Sec     7.74k     1.69k   11.72k    69.25%
  308383 requests in 5.03s, 53.23MB read
Requests/sec:  61276.04
Transfer/sec:     10.58MB

and syscalls

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.56   63.161843         845     74746           poll
 12.17   12.694926         170     74747         1 recvfrom
  7.61    7.938714         106     74747           close
  6.68    6.970383          93     74746         4 ioctl
  4.63    4.825983        4699      1027       238 futex
  3.03    3.164977          42     74746           accept
  3.01    3.142014          42     74742           sendto
  2.30    2.398296          32     74747       342 shutdown
  0.00    0.002180          46        47           select

Much better, but not ideal - if we look at an nginx strace for the same scenario, shutdown and ioctl are absent there.
Let's do the same in our mORMot and remove the unnecessary ioctl and shutdown calls.

And the final result is ~63 000 RPS

Running 5s test @ http://localhost:8881/root/timestamp
  8 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.96ms   10.01ms 210.92ms   99.35%
    Req/Sec     8.04k     1.89k   13.67k    71.50%
  320319 requests in 5.04s, 55.29MB read
Requests/sec:  63547.40
Transfer/sec:     10.97MB

and the system calls are

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.36   94.925387         978     97030           poll
  9.86   13.308974         137     97030           close
  7.01    9.459046          97     97027           sendto
  6.23    8.405241          87     97039           recvfrom
  4.25    5.736011        2154      2663       684 futex
  2.28    3.075656          32     97030           accept
  0.00    0.004272          41       103           select

The ideal code is not the one with nothing left to add, but the one with nothing left to remove smile

As a result, in the REST scenario mORMot HTTP server performance under Linux increased from 57 000 RPS to 63 000 RPS (10%).

And all of this would be impossible without the excellent Linux tools like strace, perf and valgrind, which we unfortunately miss on the Windows platform.

P.S.
If the server is in keep-alive mode, the same test gives us an AWESOME 170 000 RPS, but it creates too many threads (for a while). So (IMHO) it is applicable for scenarios with <1000 concurrent users.

Last edited by mpv (2020-03-12 16:13:46)

Offline

#20 2020-03-12 17:03:48

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Great!
I guess that with nginx as a reverse proxy, we could get slightly better numbers: keep-alive from the client side, and fewer connections to the mORMot server.
Did you run some benchmark tests with nginx as a reverse proxy?

I also suspect that a remote connection over a fast local network (1 Gb link) may be worth measuring.
With nginx as reverse proxy, we would measure a more realistic scenario.
And it may also validate whether using a Unix socket instead of a TCP socket between nginx and the mORMot server makes another difference...

You did an awesome job!

Offline

#21 2020-03-13 14:01:43

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

To test the configuration with nginx we need a powerful server - on a laptop everything becomes slow (maximum 25000 RPS). This is because the kernel needs to handle twice as many sockets compared to a direct connection. I hope I can create the appropriate infrastructure next week (in our country the government is moving people to remote work because of COVID, so access to infrastructure is difficult).

Offline

#22 2022-01-07 18:38:00

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Thanks to Christmas I found some time for mORMot2 (still not enough to migrate my product to it, but I have started to explore the codebase more closely).
The main mORMot2 feature for me is the epoll-based (Linux) HTTP/1.1 (keep-alive) HTTP/WebSocket server with a fixed-size worker thread pool.
So I started with a dummy THttpAsyncServer instance to see how it works - the results are PERFECT: I got ~213 000 RPS for a dummy server with 1000 concurrent keep-alive connections.

But there are some problems:
- the first one is about broken keep-alive - it is simple and fixed in #70
- the second one is about performance and is more complex: THttpAsyncServer in idle mode (no requests) uses ~3% of the host CPU. This is because of massive numbers of futex & nanosleep syscalls:

$ sudo strace -cf -p `pidof httpServerRaw`

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.20    0.329206         120      2733      1366 futex
 29.48    0.301435          46      6492           nanosleep

The source of these is TPollSockets.GetOne - the futex comes from the critical section inside GetOnePending, and the nanosleep from the SleepStep call.
We definitely should do something about this.

Currently I can't provide a good solution. Just some ideas:
  - timeouts could be implemented using timerfd_* together with epoll
  - the lock in the request queue could be avoided by using a ring buffer (see the sketch below)

I will continue to investigate.
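
For the ring buffer idea, here is a rough single-producer/single-consumer sketch (not mORMot code, and only valid for exactly one reader thread and one writer thread): the two sides only move their own index, so no critical section - and therefore no futex - is needed on this path:

const
  RING_SIZE = 1024; // must be a power of two

type
  TPendingRing = record
    Buf: array[0..RING_SIZE - 1] of pointer;
    WriteIdx, ReadIdx: PtrUInt; // monotonically increasing
  end;

function RingPush(var r: TPendingRing; p: pointer): boolean; // producer side
begin
  result := r.WriteIdx - r.ReadIdx < RING_SIZE; // not full
  if result then
  begin
    r.Buf[r.WriteIdx and (RING_SIZE - 1)] := p;
    WriteBarrier; // publish the slot before the index becomes visible
    inc(r.WriteIdx);
  end;
end;

function RingPop(var r: TPendingRing; out p: pointer): boolean; // consumer side
begin
  result := r.ReadIdx <> r.WriteIdx; // not empty
  if result then
  begin
    ReadBarrier; // see the slot written before WriteIdx was incremented
    p := r.Buf[r.ReadIdx and (RING_SIZE - 1)];
    inc(r.ReadIdx);
  end;
end;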

Last edited by mpv (2022-01-07 18:46:13)

Offline

#23 2022-01-07 18:44:10

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

These are the syscall statistics for 521053 HTTP keep-alive requests (a 10-second load with 1000 concurrent connections):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 23.46   13.710255         331     41384     12693 futex
 21.29   12.438808        8421      1477         1 accept
 21.11   12.336652         858     14370           nanosleep
 17.78   10.389063          19    522047           sendto
 15.20    8.882895          17    522076           recvfrom
  1.05    0.612528         333      1835           epoll_wait
  0.04    0.024850           8      2952           epoll_ctl

~45% of the time is spent in the futex and nanosleep syscalls.

Last edited by mpv (2022-01-07 18:45:19)

Offline

#24 2022-01-07 19:54:12

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

I may need to change GetOnePending() to use a lockless queue, and a classical TEvent thread wakeup.

What I don't understand is that within TPollSockets.GetOne, if no client is connected, SleepStep() should run in 120-250 ms steps, so it should be at almost 0% CPU use.

Could you try https://github.com/synopse/mORMot2/commit/3c8b0ea8 ?
I tried to ensure that those 120-250 ms steps are used when the server is idle.

Perhaps the culprit is not GetOne or PollForPendingEvents.

Edit: I suspect my modification won't change anything, because we are not in an atpReadSingle thread, but in an atpReadPool configuration.
So normally an idle server would be detected and would run this:

            if fOwner.fClients.fRead.Count = 0 then
              fEvent.WaitFor(INFINITE); // blocking until next accept()

which should consume 0% of CPU until next accept().

Edit 2: In fact, this WaitFor(INFINITE) is only called at the initial accept(), not each subsequent time no client is connected.
I have written https://github.com/synopse/mORMot2/commit/5fdcbbce to ensure that if no client is connected, we wait for the next Accept().
Hope it helps reach 0% CPU on an idle server.

Remark: I have explicitly documented that this new async server is meant for a huge number of concurrent connections. If you have only a few connections at once, then the regular/old thread server is still there, and may use the resources better.

Offline

#25 2022-01-08 09:52:42

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Still 3% CPU for an idle server with the latest sources.
I profiled using valgrind - most of the CPU time is spent in the internal fpc_initialize\fpc_finalize that wrap the PollForPendingEvents function:

  THTTPAsyncServer.Execute 
     if fAsync.fClients.fWrite.GetOne(1000, 'W', notif) then
       
         if fPending.Count = 0 then
            PollForPendingEvents({timeoutMS=}10); <-------- this call        

I removed all try\finally blocks inside PollForPendingEvents, but fpc_initialize\finalize are still there. Is this because of the records in the var block?

P.S.
  the raw server sources I use for testing

Last edited by mpv (2022-01-08 10:06:59)

Offline

#26 2022-01-08 10:13:12

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

So it is about polling for writes.

Most of the time, there is no pending output to write. So I guess polling here does not make any sense.
I will poll only if needed.

The fpc_initialize/finalize are for the local sub/new records.
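
A tiny illustration of that FPC behaviour (hypothetical types, not mORMot code): as soon as a routine declares a local record with a managed field, the compiler emits implicit fpc_initialize/fpc_finalize calls for it, even on code paths that never touch the record:

type
  TPendingEvent = record
    Tag: PtrUInt;
    Name: string; // managed field -> the record needs implicit init/finalize
  end;

procedure PollSketch;
var
  ev: TPendingEvent; // fpc_initialize on entry, fpc_finalize on exit
begin
  ev.Tag := 0;
  // ...
end;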

IIRC epoll_wait() can still run in a thread while epoll_ctl() is done in another thread.
So I guess that on Linux we could use a single fPoll[], then use epoll_wait() timeout for waiting instead of polling.
For select and poll (Windows and BSD), this is not possible because those are not thread-safe.
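
A minimal sketch of that epoll usage, assuming FPC's Linux unit bindings (this is not the mORMot implementation): one thread blocks in epoll_wait() with a kernel-side timeout, while other threads may register sockets on the same epoll fd at any time via epoll_ctl():

program EpollWaitDemo;

{$mode objfpc}

uses
  Linux;

var
  epfd, n, i: longint;
  events: array[0..63] of EPoll_Event;
begin
  epfd := epoll_create(64);
  // other threads may call epoll_ctl(epfd, EPOLL_CTL_ADD, sock, @ev) concurrently
  repeat
    n := epoll_wait(epfd, @events[0], 64, 1000); // sleep up to 1000 ms in the kernel
    for i := 0 to n - 1 do
      writeln('event on fd ', events[i].Data.fd);
  until false;
end.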

Offline

#27 2022-01-10 19:05:38

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

I spent the whole day making a huge refactoring of epoll use.
Currently poll is actually faster than epoll... not what we expected!

Several issues have been identified. I almost doubled the keep-alive performance.
But it broke how WebSockets send their frames asynchronously using epoll...

So it is not finished yet.
I hope I will publish something stable tomorrow.

Offline

#28 2022-01-10 22:34:52

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

I will look forward to it.

Offline

#29 2022-01-11 13:26:49

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Please try https://github.com/synopse/mORMot2/commit/98d97175

From my tests, the epoll API should now use far fewer OS API calls, and also give better performance in keep-alive (HTTP/1.1) mode, also when upgraded to WebSockets.
I still need to tune the multi-connection (HTTP/1.0) performance.

I am looking forward to your feedback and numbers.

Edit: Please check also https://github.com/synopse/mORMot2/commit/013be075
It should enhance HTTP/1.0 performance for small requests.

Offline

#30 2022-01-12 17:08:29

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

My test results (HTTP server with 4 threads):
- in idle mode CPU consumption is 0% - GOOD
- under load from 100 concurrent HTTP/1.1 connections the max RPS is 274 176 - impressive!
   For comparison, on the same hardware for the same test we are:
      - 6x faster than node.js, which gives 41 000 RPS (even with a mORMot thread pool size of 1 I still got 145659 RPS)
      - 2x faster than nginx!!, which gives 104 642 RPS (4 worker threads - the same as for our server)

Now the bad part sad
The implementation is not stable - under some conditions the server hangs when ServerThreadPoolCount > 1.
For ServerThreadPoolCount=4, 100 (and even 500) concurrent connections never hang the server, but 1000 do (after some number of requests, 2 threads consume 1 core each after the test ends; the server continues to accept and even reads request headers, but never answers).
For ServerThreadPoolCount=16, 500 concurrent connections always hang it (after some number of requests the server consumes 0 CPU, accepts, reads headers, but never answers).

Verified with the -O3 optimization flag; the same happens with -O1.

Offline

#31 2022-01-12 18:35:55

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Thanks for the feedback.

Please try with my latest commits of today.
I have fixed some issues, and tried to stabilize the process.

I am also interested in the HTTP/1.0 numbers, which should be much better now.

Could you try enabling the logs and see what is happening when it hangs?
(set aLogVerbose=true as the last parameter of THttpAsyncServer.Create)

I have your sample server, but how do you create the clients? Which tool do you use?

Edit: try reducing SleepHiRes(500); to SleepHiRes(10); at line #2914 of mormot.net.async.pas.
Perhaps the async sending thread - from THttpAsyncServer.Execute - is not responsive enough. But I doubt it is the culprit, because only small writes are involved in your test, so no async sending should be triggered.

Offline

#32 2022-01-13 08:54:26

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

With the latest commit the situation is the same sad
I still use wrk as the stress-load tool - see the "Load testing tool" thread for details; there are also the other tools @Eugene Ilyin notes in this post (Class D).

The problem can be reproduced, for example, on a server with 16 threads using

~/dev/wrk/wrk -c 500 -t 8  http://localhost:8888/echo

After the stress run we can see (using `strace curl -v http://localhost:8888/echo`) that the server accepts the request but does not respond.

And I got a stack trace in the log!

EInvalidPointer {Message:"Invalid pointer operation"} [R0mormotHttpSvr] at 41a1f3 httpServerRaw.lpr  (82) 
../../src/net/mormot.net.sock.pas   mergependingevents (2265)
../../src/net/mormot.net.sock.pas   tpollsockets.pollforpendingevents (2464)
../../src/net/mormot.net.async.pas tpollasyncreadsockets.pollforpendingevents (949)
../../src/net/mormot.net.async.pas tasyncconnectionsthread.execute (1625)

Last edited by mpv (2022-01-13 08:55:29)

Offline

#33 2022-01-13 13:08:15

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

I just pulled and verified the HTTP/1.1 keep-alive scenario - now the server is stable, even with 10000 concurrent connections smile

Offline

#34 2022-01-13 13:16:59

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

I will verify HTTP/1.0 this evening, but it's a little hard to do - we quickly run out of sockets (they remain in the "closed" state) - this is currently my main problem in mORMot1 without keep-alive.
Even with such OS tuning, running behind NGINX (nginx uses connections to the back-end very efficiently), moving all static content to nginx, etc., the maximum number of actively working users for one IP address is about 10000. So we need to run several servers on several IPs to handle more. Luckily I have only one client with such a huge user base.

Offline

#35 2022-01-13 13:55:03

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

I guess there was an allocation problem in MergePendingEvents().
Should be fixed by https://github.com/synopse/mORMot2/commit/0c15b057
Your last test already included this fix, which is why you found it stable now. wink

About your max 10,000 connections: if you use a Unix socket (and not TCP) between nginx and the mORMot server, you won't suffer from port exhaustion.
The only limit is the OS number of file descriptors, which can be huge.
And from my tests, unix sockets are faster and safer than TCP over the loopback.
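
For instance, the nginx upstream could point at a Unix socket path instead of localhost:8888 (the path below is just an example; the mORMot server then has to listen on that same socket file):

upstream mormot2 {
    server unix:/run/mormot2.sock;
    keepalive 32;
}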

I have also made some fixes and enhancements.
Especially https://github.com/synopse/mORMot2/commit/6f735a13 which seems to fix some problems I discovered with TTestMultiThreadProcess.
Now TTestMultiThreadProcess passes with no issue with 25,000 concurrent clients:

1.1. Multi thread process: 
  - Create thread pool: 1 assertion passed  666us
  - TCP sockets: 599,833 assertions passed  2m28
     1=12329/s  2=5210/s  5=842/s  
   MaxThreads=5 MaxClients=25000 TotalOps=150000 TotalClients=40000
  Total failed: 0 / 599,834  - Multi thread process PASSED  2m28

(numbers with logs enabled, which generates 830 MB of log, useful for tracking performance and stability problems)
Here some clients were rejected as "too many connections", but the mORMot REST client did retry after a while and eventually succeeded - as they should in the real world.

From my point of view, the server seems stabilized now.
Any more feedback and numbers are welcome! big_smile

And... please ensure you include mormot.core.fpcx64mm.pas because it has been found to be very efficient in practice.
Ensure you define -dFPC_X64MM and -dFPCMM_SERVER for your project.

Offline

#36 2022-01-13 15:34:48

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

I confirm there is no problem with a huge number of concurrent connections when using Unix Domain Sockets.

If I first run

ulimit -H -n

then I can run the following tests:

 1.1. Multi thread process: 
  - Create thread pool: 1 assertion passed  7.37ms
  - Unix domain sockets: 1,918,959 assertions passed  1m12
     1=17764/s  2=16365/s  5=19613/s  
   MaxThreads=5 MaxClients=80000 TotalOps=480000 TotalClients=128000
  Total failed: 0 / 1,918,960  - Multi thread process PASSED  1m12

So here we have 80,000 concurrent clients with HTTP/1.1 keep-alive streams, with no issue in the REST Add/Retrieve process on the ORM + SQLite3 database, and perfect scaling.
We can see that it scales much better than TCP sockets, from the numbers in my previous post.
Such numbers were impossible to reach with the mORMot 1 server.

Offline

#37 2022-01-14 10:59:44

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Yes, you are right about UDS and my mORMot1 socket problems...

I can confirm that enabling fpcx64mm improves performance - for the test case above it adds +12% RPS, which is very good.

I tried to test HTTP/1.0 mode (just enabling nginx as a reverse proxy) and found that in both 1.0 and 1.1 mORMot returns ~2000 responses and then hangs. I will continue to investigate later (this evening) to give more info.
The nginx config is:

upstream mormot2 {
    server localhost:8888 max_fails=2 fail_timeout=30;
    keepalive 32;
}

server {
    listen       82;
    server_name localhost;
    # prevent nginx version exposing in Server header
    server_tokens off;
    # Enable gzip compression.
    # Default: off
    gzip on;
    location /echo {
      proxy_pass http://mormot2;
      # proxy_http_version 1.1;
    }
    location /pure {
      return 200 'gangnam style!';
    }
}

Test command:

wrk -c 100 -t 16  http://localhost:82/echo

Last edited by mpv (2022-01-14 11:00:52)

Offline

#38 2022-01-14 12:23:32

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Additional info:
I disabled x64mm and now (behind nginx) I get an AV very quickly (the AV is not logged into the file, only to the console):

0220114 12173421  ' trace mormot.net.async.THttpAsyncConnection(7f211a3574c0) OnRead hrsGetCommand len=56 GET /echo HTTP/1.0$0d$0aHost: mormot2$0d$0aConnection: close$0d$0a$0d$0a
20220114 12173421  ( trace mormot.net.async.THttpAsyncConnection(7f211a357640) AfterWrite Done ContentLength=31 Wr=0 Flags=pc
20220114 12173421  ( trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) Write 2d closed by AfterWrite handle=36
20220114 12173421  ( trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) UnlockSlotAndCloseConnection: Write() finished on handle=36
20220114 12173421  ( trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) Stop sock=2d handle=36 r=2 w=0
20220114 12173421  ( trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) OnClose {"THttpAsyncConnection(7f211a357640)":{Handle:36}}
20220114 12173421  2 trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) Start sock=2d handle=36
20220114 12173421  2 trace mormot.net.async.THttpAsyncConnections(7f211ae3e040) Execute: Accept(8888)=20220114 12173421  2 warn  mormot.net.async.THttpAsyncConnections(7f211ae3e040) Execute raised uncatched EAccessViolation -> terminate mormotHttpServer
20220114 12173421  2 info  mormot.net.async.THttpAsyncConnections(7f211ae3e040) Execute: done AW mormotHttpServer
20220114 12173421  2 info  mormot.net.async.THttpAsyncConnections(7f211ae3e040) Execute: State=esFinished
20220114 12173421  ' trace mormot.net.async.THttpAsyncConnection(7f211a3574c0) Write len=151 HTTP/1.0 200 OK$0d$0aServer: mORMot2 (Linux)$0d$0aX-Powered-By: mORMot 2 synopse.info$0d$0aContent-Length: 31$0d$0aConnection: Close$0d$0a$0d$0agot request from connection #36
20220114 12173421  ' trace mormot.net.async.THttpAsyncConnection(7f211a3574c0) AfterWrite Done ContentLength=31 Wr=0 Flags=pc
20220114 12173421  ' trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) Write 2c closed by AfterWrite handle=35
20220114 12173421  ' trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) UnlockSlotAndCloseConnection: Write() finished on handle=35
20220114 12173421  ' trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) Stop sock=2c handle=35 r=2 w=0
20220114 12173421  ' trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) OnClose {"THttpAsyncConnection(7f211a3574c0)":{Handle:35}}

Offline

#39 2022-01-14 12:55:22

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

And if you disable the verbose logs?

Offline

#40 2022-01-14 13:03:21

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

The problem still exists if the verbose log is disabled. I think I understand the reason - see these lines:

20220114 12173421  ( trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) Stop sock=2d handle=36 r=2 w=0
20220114 12173421  ( trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) OnClose {"THttpAsyncConnection(7f211a357640)":{Handle:36}}
20220114 12173421  2 trace mormot.net.async.TAsyncConnectionsSockets(7f211b12f340) Start sock=2d handle=36

The same sock 2d is used twice. This may happen because nginx aggressively reuses connections...

Offline

#41 2022-01-14 13:32:37

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

What I don't understand is that the handle should be unique, even if the socket is reused, which should not be a problem here.
The issue is that handle=36 appears for both Stop and Start...

Is "Start sock=2d handle=36" really there twice in your log?

Could you send me a bigger piece of the log?

Offline

#42 2022-01-14 14:10:14

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

I am sure the same handle appears twice. I repeated the tests - see 3e:

20220114 14062029  ' trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Write 3b closed by AfterWrite handle=50
20220114 14062029  ' trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) UnlockSlotAndCloseConnection: Write() finished on handle=50
20220114 14062029  ' trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Stop sock=3b handle=50 r=2 w=0
20220114 14062029  ' trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) OnClose {"THttpAsyncConnection(7f81914dbb40)":{Handle:50}}
20220114 14062029  2 trace mormot.net.async.THttpAsyncConnections(7f8191fc1040) ConnectionNew {"THttpAsyncConnection(7f81914dbe40)":{Handle:52}} socket=3d count=0
20220114 14062029  2 trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Start sock=3d handle=52
20220114 14062029  2 trace mormot.net.async.THttpAsyncConnections(7f8191fc1040) Execute: Accept(8888)={"THttpAsyncConnection(7f81914dbe40)":{Handle:52}}
20220114 14062029  - trace mormot.net.async.TPollAsyncReadSockets(7f8192051240) GetOnePending(R7 mormotHttpServer)=7f81914dbe40 1 #1/1
20220114 14062029  ( trace mormot.net.async.THttpAsyncConnection(7f81914dbcc0) Write len=151 HTTP/1.0 200 OK$0d$0aServer: mORMot2 (Linux)$0d$0aX-Powered-By: mORMot 2 synopse.info$0d$0aContent-Length: 31$0d$0aConnection: Close$0d$0a$0d$0agot request from connection #51
20220114 14062029  2 trace mormot.net.async.THttpAsyncConnections(7f8191fc1040) ConnectionNew {"THttpAsyncConnection(7f81914dbfc0)":{Handle:53}} socket=3e count=0
20220114 14062029  - trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) ProcessRead recv(3d)=Ok len=56 in 4us {"TPollAsyncReadSockets(7f8192051240)":{PendingCount:1}}
20220114 14062029  - trace mormot.net.async.THttpAsyncConnection(7f81914dbe40) OnRead hrsGetCommand len=56 GET /echo HTTP/1.0$0d$0aHost: mormot2$0d$0aConnection: close$0d$0a$0d$0a
20220114 14062029  ( trace mormot.net.async.THttpAsyncConnection(7f81914dbcc0) AfterWrite Done ContentLength=31 Wr=0 Flags=pc
20220114 14062029  * trace mormot.net.async.TPollAsyncReadSockets(7f8192051240) GetOnePending(R8 mormotHttpServer)=7f81914dbfc0 1 #1/1
20220114 14062029  * trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) ProcessRead recv(3e)=Ok len=56 in 4us {"TPollAsyncReadSockets(7f8192051240)":{}}
20220114 14062029  * trace mormot.net.async.THttpAsyncConnection(7f81914dbfc0) OnRead hrsGetCommand len=56 GET /echo HTTP/1.0$0d$0aHost: mormot2$0d$0aConnection: close$0d$0a$0d$0a
20220114 14062029  * trace mormot.net.async.THttpAsyncConnection(7f81914dbfc0) Write len=151 HTTP/1.0 200 OK$0d$0aServer: mORMot2 (Linux)$0d$0aX-Powered-By: mORMot 2 synopse.info$0d$0aContent-Length: 31$0d$0aConnection: Close$0d$0a$0d$0agot request from connection #53
20220114 14062029  * trace mormot.net.async.THttpAsyncConnection(7f81914dbfc0) AfterWrite Done ContentLength=31 Wr=0 Flags=pc
20220114 14062029  * trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Write 3e closed by AfterWrite handle=53
20220114 14062029  * trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) UnlockSlotAndCloseConnection: Write() finished on handle=53
20220114 14062029  * trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Stop sock=3e handle=53 r=2 w=0
20220114 14062029  * trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) OnClose {"THttpAsyncConnection(7f81914dbfc0)":{Handle:53}}
20220114 14062029  - trace mormot.net.async.THttpAsyncConnection(7f81914dbe40) Write len=151 HTTP/1.0 200 OK$0d$0aServer: mORMot2 (Linux)$0d$0aX-Powered-By: mORMot 2 synopse.info$0d$0aContent-Length: 31$0d$0aConnection: Close$0d$0a$0d$0agot request from connection #52
20220114 14062029  ( trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Write 3c closed by AfterWrite handle=51
20220114 14062029  ( trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) UnlockSlotAndCloseConnection: Write() finished on handle=51
20220114 14062029  ( trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Stop sock=3c handle=51 r=2 w=0
20220114 14062029  ( trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) OnClose {"THttpAsyncConnection(7f81914dbcc0)":{Handle:51}}
20220114 14062029  - trace mormot.net.async.THttpAsyncConnection(7f81914dbe40) AfterWrite Done ContentLength=31 Wr=0 Flags=pc
20220114 14062029  - trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Write 3d closed by AfterWrite handle=52
20220114 14062029  - trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) UnlockSlotAndCloseConnection: Write() finished on handle=52
20220114 14062029  - trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Stop sock=3d handle=52 r=2 w=0
20220114 14062029  - trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) OnClose {"THttpAsyncConnection(7f81914dbe40)":{Handle:52}}
20220114 14062029  2 trace mormot.net.async.TAsyncConnectionsSockets(7f81922b2340) Start sock=3e handle=53
20220114 14062029  2 trace mormot.net.async.THttpAsyncConnections(7f8191fc1040) Execute: Accept(8888)=20220114 14062029  2 warn  mormot.net.async.THttpAsyncConnections(7f8191fc1040) Execute raised uncatched EAccessViolation -> terminate mormotHttpServer
20220114 14062029  2 info  mormot.net.async.THttpAsyncConnections(7f8191fc1040) Execute: done AW mormotHttpServer
20220114 14062029  2 info  mormot.net.async.THttpAsyncConnections(7f8191fc1040) Execute: State=esFinished

Can you not reproduce it?

Last edited by mpv (2022-01-14 14:12:23)

Offline

#43 2022-01-14 14:40:18

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Perhaps I understand what happened.

After a connection, TAsyncServer.Execute runs TPollAsyncSockets.Start which may delegate the whole connection process to a background thread.
In our case, the connection instance is actually deleted in this background thread BEFORE the connection is logged in Accept().... therefore a random GPF occurs.

Please check https://github.com/synopse/mORMot2/commit/335b2cb1

Offline

#44 2022-01-14 15:41:32

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Now it works; with wrk it does not even hang, but it is very slow (2000 RPS).

There is a random 60-second delay; it can be reproduced if I send queries through nginx by hand using curl (Up, Enter; Up, Enter in the console) with the command `curl -v http://localhost:82/echo`.

After 15-100 requests (randomly) the server stops responding, thinks for one minute and then continues to respond - see the times in line 4 (15:34) and line 5 (15:35) below:

20220114 15340646  1 trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) UnlockSlotAndCloseConnection: Write() finished on handle=75
20220114 15340646  1 trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) Stop sock=54 handle=75 r=2 w=0
20220114 15340646  1 trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) OnClose {"THttpAsyncConnection(7f12a5219bc0)":{Handle:75}}
20220114 15340704  2 trace mormot.net.async.THttpAsyncConnections(7f12a5cfe040) ConnectionNew {"THttpAsyncConnection(7f12a5219d40)":{Handle:76}} socket=55 count=0
20220114 15350710  2 trace mormot.net.async.THttpAsyncConnections(7f12a5cfe040) ConnectionNew {"THttpAsyncConnection(7f12a5219ec0)":{Handle:77}} socket=56 count=0
20220114 15350710  & trace mormot.net.async.TPollAsyncReadSockets(7f12a5d8e240) GetOnePending(R1 mormotHttpServer)=7f12a5219d40 1 #1/2
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) ProcessRead recv(55)=Ok len=96 in 12us {"TPollAsyncReadSockets(7f12a5d8e240)":{PendingIndex:1,PendingCount:2}}
20220114 15350710  & trace mormot.net.async.THttpAsyncConnection(7f12a5219d40) OnRead hrsGetCommand len=96 GET /echo11 HTTP/1.1$0d$0aHost: mormot2$0d$0aConnection: close$0d$0aUser-Agent: curl/7.68.0$0d$0aAccept: */*$0d$0a$0d$0a
20220114 15350710  & trace mormot.net.async.THttpAsyncConnection(7f12a5219d40) Write len=151 HTTP/1.0 200 OK$0d$0aServer: mORMot2 (Linux)$0d$0aX-Powered-By: mORMot 2 synopse.info$0d$0aContent-Length: 31$0d$0aConnection: Close$0d$0a$0d$0agot request from connection #76
20220114 15350710  & trace mormot.net.async.THttpAsyncConnection(7f12a5219d40) AfterWrite Done ContentLength=31 Wr=0 Flags=pc
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) Write 55 closed by AfterWrite handle=76
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) UnlockSlotAndCloseConnection: Write() finished on handle=76
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) Stop sock=55 handle=76 r=2 w=0
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) OnClose {"THttpAsyncConnection(7f12a5219d40)":{Handle:76}}
20220114 15350710  & trace mormot.net.async.TPollAsyncReadSockets(7f12a5d8e240) GetOnePending(R1 mormotHttpServer)=7f12a5219ec0 1 #2/2
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) ProcessRead recv(56)=Ok len=96 in 9us {"TPollAsyncReadSockets(7f12a5d8e240)":{}}
20220114 15350710  & trace mormot.net.async.THttpAsyncConnection(7f12a5219ec0) OnRead hrsGetCommand len=96 GET /echo11 HTTP/1.1$0d$0aHost: mormot2$0d$0aConnection: close$0d$0aUser-Agent: curl/7.68.0$0d$0aAccept: */*$0d$0a$0d$0a
20220114 15350710  & trace mormot.net.async.THttpAsyncConnection(7f12a5219ec0) Write len=151 HTTP/1.0 200 OK$0d$0aServer: mORMot2 (Linux)$0d$0aX-Powered-By: mORMot 2 synopse.info$0d$0aContent-Length: 31$0d$0aConnection: Close$0d$0a$0d$0agot request from connection #77
20220114 15350710  & trace mormot.net.async.THttpAsyncConnection(7f12a5219ec0) AfterWrite Done ContentLength=31 Wr=0 Flags=pc
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) Write 56 closed by AfterWrite handle=77
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) UnlockSlotAndCloseConnection: Write() finished on handle=77
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) Stop sock=56 handle=77 r=2 w=0
20220114 15350710  & trace mormot.net.async.TAsyncConnectionsSockets(7f12a5fef340) OnClose {"THttpAsyncConnection(7f12a5219ec0)":{Handle:77}}

BTW, the trace is a perfect thing!

Last edited by mpv (2022-01-14 15:42:51)

Offline

#45 2022-01-14 15:57:18

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

The delay occurs only when proxying using HTTP/1.0; if proxied using HTTP/1.1 the performance is ~70000 RPS.
Updated nginx config with 3 endpoints:
- /echo - proxied using HTTP/1.0 without keep-alive
- /echo11 - uses HTTP/1.1 with keep-alive (up to 1024 connections for each nginx worker)
- and (just to compare performance) /pure - an endpoint that responds 200 'gangnam style!' at the nginx level

upstream mormot2 {
    server localhost:8888 max_fails=2 fail_timeout=30;
    keepalive 1024;
}

server {
    listen       82;
    server_name localhost;
    server_tokens off;

    location /echo {
       proxy_pass http://127.0.0.1:8888;
    }
    location /echo11 {
        proxy_pass http://mormot2;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
    location /pure {
        return 200 'gangnam style!';
    }
}

Last edited by mpv (2022-01-14 15:59:28)

Offline

#46 2022-01-14 18:50:15

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

Please try with my latest commits.
It should get rid of the delay now.

Edit: I discovered a regression about raw Network protocols (not http/ws) on Delphi / Windows.
I will fix it.
Edit 2: Should be fixed now with https://github.com/synopse/mORMot2/commit/e0aa2703

Offline

#47 2022-01-14 21:30:00

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: TCP (HTTP) server improvements for Linux

Test results:
- The server is stable when clients are HTTP/1.1, HTTP/1.0, or a mix of 1.1 and 1.0.
- I can't reproduce the delay anymore.

But
HTTP/1.0 mode is still too slow: ~1864 RPS.

The strange thing is that in HTTP/1.0 mode the server consumes only a small part of the CPU (each core is at about ~10%), while in HTTP/1.1 - 100%.

I analysed the syscalls - everything is OK.

I tried to profile via valgrind - and got another strange result - if the sampling profiler is active, all responses are non-200; in verbose mode the server starts reporting

 warn  mormot.net.async.THttpAsyncConnections(04b36040) Execute: Accept(8888) failed as Too Many Connections

which is not true, because ss -s shows 3595 sockets in use.

PS
The test results above were taken before Edit 2 was added in the post above.

Edit 2
With the latest commit the result is the same.

Last edited by mpv (2022-01-14 21:36:52)

Offline

#48 2022-01-18 11:06:18

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,165
Website

Re: TCP (HTTP) server improvements for Linux

I spent days tracking - and fixing - several issues.

The main problem was that the sockets were not closed when epoll() was used!
Then I discovered several difficult race-condition bugs when heavy multi-threading was involved.
I also raised the MaxPending value to 10000 - the previous 2000 made the server reject connections on heavy benchmarks.

Now I have no trouble with HTTP/1.1 or HTTP/1.0, with thousands of connections.
For instance, here are the numbers when forcing HTTP/1.0, with 50 threads running 29980 requests, and therefore as many connections:

 1.1. Multi thread process: 
  - Create thread pool: 1 assertion passed  3.03ms
  - TCP sockets: 119,921 assertions passed  4.18s
     1=11682/s  2=16590/s  5=18688/s  10=19351/s  30=14645/s  50=14223/s  
   MaxThreads=50 MaxClients=5000 TotalOps=29980 TotalClients=9800 HTTP1.0
  Total failed: 0 / 119,922  - Multi thread process PASSED  4.25s

Your feedback is welcome.

About the remaining socket errors above 1000 clients with wrk, I guess the problem lies in the wrk tool itself.
It first connects every client in a loop, then starts working with the clients. This is not well handled by our server. I have no trouble with the ORM multi-thread client code, which uses a more regular pattern. My guess is that wrk behaves not like a regular set of clients, but like a DoS attacker. I consider the socket failures when bombing the server to be a security feature, as long as more regular clients connect with no problem.
I am only concerned about nginx as an HTTP/1.0 reverse proxy. Perhaps in this case, the regular useHttpSocket web server should be used instead of useHttpAsync - this is what is currently documented, by the way.

Offline
