Good morning!
Consider the following code:
...
FServer := TWebSocketAsyncServer.Create('[::1]:12345', Nil, Nil, 'Acceptor', 2, 30000, []);
...
For any ServerThreadPoolCount > 1, CPU consumption increases substantially as soon as frames of any size are received at a rate of, say, two per second. There is no processing after the frames are received. This does not happen when ServerThreadPoolCount = 1!
Is this normal?
Offline
ServerThreadPoolCount = 1 is not a very stable parameter.
With a single thread, there is no pool of threads involved.
And a single processing thread will block all other processing requests, which is not a good practice.
Note that the async server itself generates two threads for itself (main Read and Write async loops) in addition to the thread pool.
There is no point in using the async server with ServerThreadPoolCount = 1.
Try with ServerThreadPoolCount = 4 or 8 for instance. But the default value of 32 is just fine.
If you really want a low number of threads, use the non-async server. There will be one thread per connected client. So for an idle server, it is just fine.
Online
Thanks for the quick feedback.
Well, the question is exactly this. When using a ServerThreadPoolCount > 1 (4, 8, or the default 32), CPU consumption increases substantially, up to 20%, with no processing at all, on Delphi 11.3.
Offline
Yes. I even just tested with the new stable version released today
Last edited by ec (2023-08-24 14:22:28)
Offline
Using the httpServerRaw sample, I was not able to reproduce the high CPU consumption, but the performance was very low.
I have just identified an issue on Windows.
On Linux, no such issue. With the socket-based (not async) server, no such issue.
This is a Windows specific issue for our async kernel.
If I enable the logs, the numbers are much better. So it is clearly a thread scheduling problem specific to Windows.
But no high CPU consumption. Unexpected low scaling, but low CPU usage.
Can you enable the logs and see what's happening on your project?
I am investigating.
In the meantime, either switch the server to Linux, or use the TWebSocketServer class.
Online
I understood. Anyway, thank you immensely for your attention. I will provide the logs as soon as possible.
Last edited by ec (2023-08-24 14:53:52)
Offline
With the hsoThreadSmooting option and some threads (e.g. 16), it scales much better.
It seems stable and performing well enough.
Without this option, the thread pool is no longer awakened after a while.
Edit:
After a few minutes running the "wrk" tool, even with the hsoThreadSmooting option, the server becomes unresponsive on Windows.
Then if we wait a little, it becomes responsive again.
Online
With the hsoThreadSmooting option and a thread pool of 16, receiving four 50-byte frames per second without any processing, CPU consumption drops to 6%.
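For reference, the setup described above might look like the fragment below. It only reuses the constructor call from the first post, assuming that the fifth argument is ServerThreadPoolCount and that the last one is the set of server options; both assumptions should be checked against your mORMot 2 version.

```pascal
// fragment only: same call as in the first post, with a pool of 16 threads
// and thread smoothing enabled (argument positions are assumed, see above)
FServer := TWebSocketAsyncServer.Create(
  '[::1]:12345', nil, nil, 'Acceptor', 16, 30000, [hsoThreadSmooting]);
```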
Offline
The hsoThreadSmooting option seems mandatory on Windows.
Note that the async server is more likely to be used with a lot of concurrent connections, on heavy server load.
Otherwise, the simple socket server is a better candidate, and consumes less resources at idle state.
To be sure, I have just thoroughly tested the async server on Linux, and it is very stable and scales well, even with a few connections.
So it seems to be a Windows specific issue, which I must circumvent and fix.
Update:
The easiest is to force hsoThreadSmooting/acoThreadSmooting on Windows.
See https://github.com/synopse/mORMot2/commit/3b1b6dfa
Online
Good afternoon!
About the high CPU consumption: I tested a change a little while ago and it apparently works correctly. See if it makes any sense to you...
Excessive thread context switching is causing the high CPU consumption.
In mormot.core.os.windows.inc we have:
...
procedure SleepHiRes(ms: cardinal);
begin
  if ms <> 0 then
    Windows.Sleep(ms) // follow the HW timer: typically up to 16ms on Windows
  else
    SwitchToThread; // <-- executed millions of times per minute
end;
...
In mormot.net.async.pas:
...
procedure TAsyncConnectionsThread.Execute;
...
    atpReadPoll:
      // main thread will just fill pending events from socket polls
      // (no process because a faulty service would delay all reading)
      begin
        // start := 0; // original value
        start := 1; // <-- my test: it is the same as SleepHiRes(1)
        while not Terminated do
...
Disregarding possible side effects, it works!
What do you think?
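To make the cost of the yield-only loop more tangible, here is a minimal sketch that counts how often an idle polling loop runs in a fixed time window. It uses only the RTL; CountPolls is a hypothetical helper, not mORMot code, and Sleep(0) is used as a rough stand-in for SwitchToThread (both only give up the current time slice).

```pascal
program PollLoopCost;

{$ifdef FPC}{$mode delphi}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

uses
  SysUtils, DateUtils;

// Hypothetical helper (not part of mORMot): runs an idle polling loop for
// about WindowMs milliseconds, sleeping SleepMs between two polls, and
// returns how many times the loop body was executed.
function CountPolls(SleepMs: cardinal; WindowMs: Int64): Int64;
var
  started: TDateTime;
begin
  Result := 0;
  started := Now;
  while MilliSecondsBetween(Now, started) < WindowMs do
  begin
    Sleep(SleepMs); // 0 only yields the time slice, 1 really suspends the thread
    Inc(Result);
  end;
end;

begin
  WriteLn('polls in 200 ms with Sleep(0): ', CountPolls(0, 200));
  WriteLn('polls in 200 ms with Sleep(1): ', CountPolls(1, 200));
end.
```

The exact numbers depend on the machine and the OS timer, but the Sleep(0) count is expected to be far higher, which is the kind of wasted scheduling work described above.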
Offline
Thanks a lot for the feedback.
It is a very interesting approach.
My only concern is that start := 1 could result in a long sleep (Sleep(1) can take up to about 16 ms on Windows).
Another potential side effect is that the server may become somewhat less reactive...
Perhaps this may be better:
https://github.com/synopse/mORMot2/commit/14c009c5
Could you please try it and report back here your findings?
Online
Good morning !
This way, it is much more efficient! Correct me if I'm wrong: -1 = INFINITE, which in the given snippet is the same as WaitFor(INFINITE) in mormot.core.os.windows.inc. Tested with thousands of operations.
Working perfectly, with minimal CPU consumption!!!
Thank you very much for your attention.
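The difference between polling and a blocking wait can be sketched with a plain RTL event. This uses TEvent from the standard SyncObjs unit, not mORMot's own event class, and WaitOnce is a hypothetical helper written for this illustration; a finite timeout is used so the program always terminates, whereas an INFINITE wait would simply never time out.

```pascal
program EventWaitDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

uses
  {$ifdef UNIX}cthreads,{$endif} // FPC on Unix needs a thread manager for events
  SysUtils, SyncObjs;

// Hypothetical helper: reports what a bounded wait returns for an event
// that has (or has not) been signaled beforehand.
function WaitOnce(Signaled: boolean; TimeoutMs: cardinal): TWaitResult;
var
  ev: TEvent;
begin
  // auto-reset event (ManualReset = false), initially not signaled, unnamed
  ev := TEvent.Create(nil, false, false, '');
  try
    if Signaled then
      ev.SetEvent; // what a producer would do once work is queued
    Result := ev.WaitFor(TimeoutMs); // the thread is suspended while waiting
  finally
    ev.Free;
  end;
end;

begin
  if WaitOnce(false, 20) = wrTimeout then
    WriteLn('no work: the wait timed out without spinning');
  if WaitOnce(true, 1000) = wrSignaled then
    WriteLn('work queued: the wait returned at once');
end.
```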
Offline
In fact, I have just committed the following:
https://github.com/synopse/mORMot2/comm … acc2c4ee8c
Which identifies start < 0 as a valid initial value (similar to 0), but without the SleepHiRes(0) state.
Online
Good evening!
Ab, it's me again, on the same subject!
...
procedure TAsyncConnectionsThread.Execute;
...
  {$ifdef OSWINDOWS}
  // start := -1; // ensure never SleepHiRes(0)
  start := 1; // ec
  {$else}
  start := 0; // best reactivity
  {$endif OSWINDOWS}
...
After two hours of testing (30 clients) with start = -1, the following situation occurs:
- One of the connections stops working. The other 29 are still working;
- No new connections to the server are possible;
- CPU consumption goes up to 50% or more.
After changing the value of start to 1 and adding another 70 clients, no problem occurs and, strangely, performance and stability even seem better!
What do you think?
Last edited by ec (2023-08-29 21:32:10)
Offline
Please try https://github.com/synopse/mORMot2/commit/f41ee356
I have rewritten the async server loop.
No start/sleep any more.
It seems cleaner and more stable on my end, both on Linux and on Windows.
Your feedback is welcome!
Online
Good morning !
With this change, a single connection causes high CPU consumption!
It appears to be working normally again with the following:
...
fWaitForReadPending := true; // should be set before wakeup
if new <> 0 then
  fOwner.ThreadPollingWakeup(new)
else
begin
  fEvent.WaitFor(1); // ec **
  Continue;
end;
...
Offline
It didn't stabilize! After an hour of testing, it stopped working, with the same symptoms as before.
Offline
I am now testing like this:
...
atpReadPoll:
  // main thread will just fill pending events from socket polls
  // (no process because a faulty service would delay all reading)
...
if new <> 0 then
  fOwner.ThreadPollingWakeup(new)
else
begin
  new := fOwner.fClients.fRead.PollForPendingEvents(ms);
  if new = 0 then
  begin
    TThread.Sleep(1); // Without this things don't work, anyway
    Continue;
  end;
end;
...
Last edited by ec (2023-08-30 14:05:13)
Offline
Okay Ab.
I got better results as follows:
fEvent.ResetEvent;
fWaitForReadPending := true; // should be set before wakeup
if new <> 0 then
  fOwner.ThreadPollingWakeup(new)
else
begin // ec...
  new := fOwner.fClients.fRead.PollForPendingEvents(ms);
  if new = 0 then
  begin
    TThread.Sleep(1);
    Continue;
  end;
  fOwner.ThreadPollingWakeup(new);
end;
Or simply ...
atpReadPoll:
  // main thread will just fill pending events from socket polls
  // (no process because a faulty service would delay all reading)
  begin
    fWaitForReadPending := false;
    // new := fOwner.fClients.fRead.PollForPendingEvents(ms); // original
    new := fOwner.fClients.fRead.PollForPendingEvents(ms + 1100); // << it works too
I hope it helps !
Thank you very much for your attention.
Last edited by ec (2023-08-30 16:41:07)
Offline
About the CPU consumption, see
https://github.com/synopse/mORMot2/commit/828117bc
About the random shutdown of the server threads, see
https://github.com/synopse/mORMot2/commit/060d5be0
The last one was really tricky to identify.
But since I was able to reproduce it after bombarding a Windows VM with a few cores from my Linux PC using wrk, I was able to find the root cause.
One-liner fix for a lot of investigation time!
Now the async server seems pretty stable to me, and I have tested it on Windows with 1000 concurrent connections, and fast or slow process, with no stability problem.
We will probably make a release including this fix in the next few days.
Online
Good morning !
What excellent news, Ab. Our team thanks you immensely for all your attention and effort.
Offline
Hello @ab
I still get high CPU usage on the async WebSocket server, with 1500+ concurrent connections, after 26+ hours.
I was checking the server status and found that the server CPU was at 90+%.
When I checked the threads, the load was all coming from the mORMot async server threads.
Each thread was using around 5 to 6%, and on the stack, every single one of them was calling SwitchToThread.
I'll try to use something like ProDelphi64, or run the server under the IDE debugger for as long as I can.
If you have any recommendations on how to trace this down, I'll be happy to hear them.
I'm using mORMot2 v2.1.5824 and Delphi 11.3, and the server is running on Windows.
Last edited by Coldzer0 (2023-09-02 00:37:09)
Mac, Windows, Linux
FPC Trunk, Lazarus Trunk, Delphi 12.x Latest
Offline
With 1500 concurrent connections, I would rather switch to Linux, if it is possible.
It is much more stable.
And even with a small VM, you will have much better performance.
Or switch to the http.sys server, if you really need Windows.
In practice, on Windows, the async server still uses one or several "select" calls, in groups of FD_SETSIZE = 512 items.
You may try to change FD_SETSIZE to 2000 in mormot.net.sock.windows.inc, and see if it helps.
How many threads are affected?
You may try to add:
procedure THttpAsyncConnections.IdleEverySecond;
begin
  ConsoleWrite('conn=% pending=% awake=%', [
    fConnectionCount, fClients.fRead.fPending.Count, fThreadPollingAwakeCount]);
  // GC of connection memory
  inherited IdleEverySecond;
Which may help a little to understand what's happening.
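As a rough illustration of the FD_SETSIZE remark above, the sketch below computes how many "select" groups are needed for a given number of connections. It assumes the connections are simply split into groups of at most FD_SETSIZE sockets; SelectGroupCount is a hypothetical helper, not a mORMot function.

```pascal
program SelectGroups;

{$ifdef FPC}{$mode delphi}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

// Number of select() groups needed for ConnCount sockets when one group
// holds at most SetSize entries (rounded up). Illustrative helper only.
function SelectGroupCount(ConnCount, SetSize: integer): integer;
begin
  Result := (ConnCount + SetSize - 1) div SetSize;
end;

begin
  WriteLn('1500 connections, FD_SETSIZE=512:  ', SelectGroupCount(1500, 512));  // 3
  WriteLn('1500 connections, FD_SETSIZE=2000: ', SelectGroupCount(1500, 2000)); // 1
end.
```

Under that assumption, 1500 connections need three groups with the default size, and a single one once FD_SETSIZE is raised to 2000.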
Online
I tried to reproduce the problem on my side, but even with 1500 connections, I was not able to trigger your issue:
ab@dev-ab:~/dev/lib2/test$ wrk -c 1500 -t 12 -d 1220 http://192.168.0.233:8888/echo
Running 20m test @ http://192.168.0.233:8888/echo
  12 threads and 1500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.32ms    3.00ms 248.12ms   73.82%
    Req/Sec    10.21k     1.42k   62.69k    70.03%
  148684428 requests in 20.33m, 24.93GB read
Requests/sec: 121862.88
Transfer/sec:     20.92MB
It was testing a Windows VM with 4 cores from my Linux host.
The average CPU usage was around 30% in Windows.
Here are the results with 2000 concurrent connections:
ab@dev-ab:~/dev/lib2/test$ wrk -c 2000 -t 12 -d 1220 http://192.168.0.233:8888/echo
Running 20m test @ http://192.168.0.233:8888/echo
  12 threads and 2000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.82ms    4.19ms 323.14ms   78.89%
    Req/Sec     9.93k     1.31k   15.89k    69.62%
  144693563 requests in 20.33m, 24.26GB read
Requests/sec: 118591.88
Transfer/sec:     20.36MB
I observed no stability issue after more than 20 minutes of highly stressful requests with 2000 connections.
From my previous experiments, if the server works for 20 minutes with no lag and no huge memory or CPU consumption while being bombarded with 2000 concurrent connections by the wrk tool running 12 threads (my CPU has 20 threads, 4 of which are reserved for the Windows VM), it should work for days.
Perhaps there is something else, but we need to be able to reproduce it, and to have at least more information about what is happening on your side (e.g. by looking at the logs, or at the ConsoleWrite output I proposed to add).
Also ensure that the deadlock does not occur in your own code. If the processing events have a race condition, for instance, the processing threads may appear blocked, but the web server is not the culprit.
Online
I have now added the ConsoleWrite you mentioned.
The server has been running for 12+ hours, and I am waiting to see what will happen.
BTW I'm using "TWebSocketAsyncServer", and all the connections are sending data all the time, such as pings and other small text chat data.
For now, this is the network status.
As for performance, it currently uses only 88MB of RAM and less than 1% of CPU, which is amazing.
I'll try to enable full logging after this test and share it with you.
This is what happens after 14 hours, plus the console is not updating anymore.
Last edited by Coldzer0 (2023-09-05 02:58:13)
Offline
conn=762 pending=1065 awake=15
indicates that
- there are 762 connections (seems fair enough)
- but 1065 TCP events are queued for processing (which is wrong, because the events should have been dequeued)
- and all 15 processing threads are currently running
There is something blocking in the R1..R15 threads.
Another weird thing is that there are 1065 events for 762 connections. There should not be any more events than connections, by design.
I have no clue yet about what is happening after 14 hours. I don't see any overflow related to this time period.
I will try to reproduce it here.
You previously wrote that it was calling the SwitchToThread API.
Of course, if we could attach the Delphi debugger to the executable, it could be great.
The logs could help, too. Ensure you have the rotation enabled, to avoid too big a log file. And ensure you have enabled the async verbose logs, if you can.
Could you try to compile the server with FPC and see what happens on Linux?
Online
I noticed yesterday that the Windows firewall was not enabled.
After enabling it (I already had a list of IPs to block, which were running DDoS attacks on the server), the server is still running, and I blocked more IPs.
Here's an example: the number of WebSocket connections that I counted was far lower than what the console shows.
I then checked the list of connected IPs and found one IP connected to the server 200+ times. I blocked that IP and a couple of other malicious IPs.
The number decreased after killing the multiple connections.
I also noticed in the screenshots I sent before that there was a spike in network data, up to 2GB of traffic, which is odd.
I'll keep the server running with the firewall and list of blocked IPs and see if it affects the result of the test.
Then I'll do another test with verbose enabled.
Now I have a question regarding the TCP connections in "TWebSocketAsyncServer":
Can I intercept the connection at the beginning even before the WebSocket upgrade?
I want to make some internal checks to test for any multi-connected IPs or other malicious IPs.
Regarding "Could you try to compile the server with FPC and see what happens on Linux?":
I can't with this server, because it is built for Windows, with a GUI controller for status and some other options.
But I was already working on a rewrite of the project to make it compatible with Linux, with a REST server and only APIs, without the GUI parts.
Last edited by Coldzer0 (2023-09-06 17:30:10)
Offline
Perhaps you may try to put a nginx proxy as front-end for the Internet.
With TWebSocketAsyncServer, you have the OnBeforeBody event to set up your callback, as early as possible. It is implemented in the THttpAsyncServer parent.
Usually, we check for the Bearer presence, and its validity. Over TLS, you may just put some fixed secret as Bearer. Or a JWT could be just fine.
And also some optional IP banning mechanism: if you set the hsoBan40xIP option, you can ban an IP for a few seconds once there is a 40x error returned by the server.
It is naive, but efficient against DoS attacks.
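The idea of banning an IP for a few seconds can be sketched with a toy table mapping each IP to the second at which its ban expires. This is only an illustration of the principle: TToyBanList and its methods are hypothetical, the time is passed in explicitly to keep the example deterministic, and nothing here reflects how hsoBan40xIP is actually implemented.

```pascal
program BanListDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

uses
  SysUtils, Classes;

type
  // Toy ban list (hypothetical, not the hsoBan40xIP implementation):
  // maps an IP to the second at which its ban expires.
  TToyBanList = class
  private
    fExpiry: TStringList; // stores 'ip=expirySecond' pairs
  public
    constructor Create;
    destructor Destroy; override;
    procedure Ban(const IP: string; NowSec, DurationSec: Int64);
    function IsBanned(const IP: string; NowSec: Int64): boolean;
  end;

constructor TToyBanList.Create;
begin
  inherited Create;
  fExpiry := TStringList.Create;
end;

destructor TToyBanList.Destroy;
begin
  fExpiry.Free;
  inherited Destroy;
end;

procedure TToyBanList.Ban(const IP: string; NowSec, DurationSec: Int64);
begin
  fExpiry.Values[IP] := IntToStr(NowSec + DurationSec);
end;

function TToyBanList.IsBanned(const IP: string; NowSec: Int64): boolean;
begin
  // an unknown IP yields '' which converts to 0, i.e. never banned
  Result := StrToInt64Def(fExpiry.Values[IP], 0) > NowSec;
end;

var
  bans: TToyBanList;
begin
  bans := TToyBanList.Create;
  try
    bans.Ban('203.0.113.7', 100, 4); // a 40x answer was returned at t=100s
    WriteLn(bans.IsBanned('203.0.113.7', 102));  // TRUE: still within the ban
    WriteLn(bans.IsBanned('203.0.113.7', 105));  // FALSE: the ban has expired
    WriteLn(bans.IsBanned('198.51.100.1', 102)); // FALSE: never banned
  finally
    bans.Free;
  end;
end.
```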
Online
I have just finished a 6-day test of the async web server stability.
This was on Linux.
It ran on our synopse.info cheap server - but with 8 Atom threads anyway:
Running 8333m test @ http://localhost:8888/echo
  2 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.45ms    2.96ms 427.32ms   91.74%
    Req/Sec    17.69k   826.25    19.85k    89.72%
  17596630420 requests in 8333.34m, 2.91TB read
Requests/sec:  35193.25
Transfer/sec:      6.11MB
The memory was around 15MB the whole time, and wrk took 200% of CPU, whereas the mORMot server took 250% of CPU (in htop).
As we can see, the results seem pretty consistent and stable, even on a server that is also doing regular php/mysql processing in the background.
Here are the stats of our x86_64 memory manager during the process:
Flags: SERVER assumulthrd erms debug repmemleak
Small: 2K/219KB including tiny<=128B arenas=8 fed from Medium
Medium: 6MB/391MB peak=83MB current=5 alloc=313 free=308 sleep=23
Large: 0B/640KB peak=320KB current=0 alloc=2 free=2 sleep=0
Total Sleep: count=23
Small Blocks since beginning: 56G/5TB (as small=44/46 tiny=49/56)
112=18G 96=18G 64=18G 224=1G 240=84M 2176=8M 208=8M 480=8M
256=241K 352=204K 272=168K 288=145K 192=139K 304=124K 176=124K 320=95K
So 5TB of allocated memory, as very small blocks (64-112 bytes). Only 23 contentions, and for the Medium area only, which occurred when the initial memory blocks were allocated. No contention during the actual HTTP requests process.
The top memory consumption was 83MB, not bad, knowing we tested with up to 30,000 concurrent connections. At idle, it is around 5MB of RAM. And during the 500 connections tests, it was stable at 15MB.
I also ran a quick test with 10,000 concurrent connections:
user@sd:~$ wrk -c 10000 -d 60 -t 4 http://localhost:8888/echo
Running 1m test @ http://localhost:8888/echo
  4 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   195.04ms  131.06ms   1.69s    96.21%
    Req/Sec    13.15k     2.47k   16.32k    94.27%
  3125532 requests in 1.00m, 542.68MB read
Requests/sec:  51982.62
Transfer/sec:      9.03MB
Not bad on very cheap hardware, i.e. the START-2-M-SSD bare metal server from https://www.scaleway.com/fr/tarifs/?tag … ibox-start
On my Core i5-13400 PC, I reach 1,500,000 requests per second. And on high-end TFB hardware, we saturate the network adapter, along with the other best frameworks, at 7,000,000 requests per second.
Almost no one needs so much performance, because the network would be the bottleneck for sure, but it is nice to see that our little mORMot can rival the best.
Online
The last test I'm doing right now has the hsoBan40xIP and hsoLogVerbose options in the server
I have also set TSynLog.Family.Level to LOG_VERBOSE.
Right now the server has been running for 13+ hours and the server status seems stable
If the server reacts the same as before I'll send you the full server logs in a private message.
Offline
After doing some testing with the debugger, here's the first error: an "Access Violation" in the "IsBanned" function.
It only happens when I enable "hsoBan40xIP".
Offline
After running for a while, the high CPU usage happens again. Here are the call stacks for a couple of threads; all of the high-CPU ones are stuck in either
Procedure TRWLock.WriteLock;
or
procedure TRWLock.ReadOnlyLockSpin;
and all of them come from
TAsyncConnections.ConnectionFind
TAsyncConnections.ConnectionFindAndLock
Here are the screenshots.
I'll try to check with fewer threads to see if I can catch what is happening.
Last edited by Coldzer0 (2023-09-15 11:07:08)
Offline
About the GPF with hsoBan40xIP
see https://github.com/synopse/mORMot2/commit/e62c8e82
About the deadlock, your stack traces helped a lot.
It was a websocket-only issue, so I could not reproduce it with regular HTTP testing.
It looks as if the socket was broken while the ping was being sent: the server then closes the connection and tries to delete it from its list, so it calls WriteLock, but the list is already locked with ReadOnlyLock by the caller in TAsyncConnections.IdleEverySecond...
Please try https://github.com/synopse/mORMot2/commit/78ca8ca4
Thanks a lot for the very good debugging and detailed information!
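The read-then-write pattern described above can be reproduced with a toy, single-threaded model of a reader/writer lock. TToyRWLock and its helpers are hypothetical and only track counters; this is not mORMot's TRWLock and says nothing about how the linked commit fixes the problem. It only shows why a write request can never be granted while the requesting code still holds a read lock.

```pascal
program ReadThenWriteDeadlock;

{$ifdef FPC}{$mode delphi}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

type
  // Toy model of a reader/writer lock (NOT mORMot's TRWLock): it only
  // tracks counters so that the acquisition rules can be inspected.
  TToyRWLock = record
    Readers: integer;
    Writer: boolean;
  end;

function TryReadLock(var L: TToyRWLock): boolean;
begin
  Result := not L.Writer; // readers are admitted unless a writer owns the lock
  if Result then
    Inc(L.Readers);
end;

procedure ReadUnlock(var L: TToyRWLock);
begin
  Dec(L.Readers);
end;

function TryWriteLock(var L: TToyRWLock): boolean;
begin
  // a writer needs exclusive access: no reader and no other writer
  Result := (L.Readers = 0) and not L.Writer;
  if Result then
    L.Writer := true;
end;

procedure WriteUnlock(var L: TToyRWLock);
begin
  L.Writer := false;
end;

// Mimics the failing pattern: delete (write) while still iterating (read).
function DeleteWhileReading(var L: TToyRWLock): boolean;
begin
  TryReadLock(L);            // like the idle loop walking the connection list
  Result := TryWriteLock(L); // like deleting a broken connection from the list
  if Result then
    WriteUnlock(L);
  ReadUnlock(L);
end;

// One generic way out: release the read lock before asking for the write lock.
function DeleteAfterReading(var L: TToyRWLock): boolean;
begin
  TryReadLock(L);
  ReadUnlock(L);
  Result := TryWriteLock(L);
  if Result then
    WriteUnlock(L);
end;

var
  connLock: TToyRWLock;
begin
  FillChar(connLock, SizeOf(connLock), 0);
  WriteLn('write granted while reading: ', DeleteWhileReading(connLock)); // FALSE
  WriteLn('write granted after reading: ', DeleteAfterReading(connLock)); // TRUE
end.
```

With a real blocking lock, the first case would not return FALSE but spin or wait forever, which matches threads stuck in WriteLock at high CPU.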
Online
The GPF with the hsoBan40xIP fix has worked fine till now.
I've been testing it for 30 mins since you pushed the commit. I'll keep testing.
And for the lock, I just pulled the changes and will test it and see if everything works fine.
Thanks for your hard work and updates.
Offline
I have a question regarding exceptions at functions like
function TWebProcessInFrame.GetHeader: boolean;
Do we need exceptions here?
Like
EWebSockets.CreateUtf8('%.GetFrame: truncated mask', [process]);
Couldn't we return false on errors instead? Exceptions like these halt the debugger while testing.
Offline
@ColdZer0, is everything still OK?
You wrote: "This is what happens after 14 hours, plus the console is not updating anymore."
I am asking because I ran into the same issue at the end of July while testing the new async server, and I want to give it another try. I had no junk network traffic, as the network is filtered to known static IPs only, but almost all clients connect through 4G routers, with sometimes high latency and/or random disconnections. The console was "stuck" after ~6h with around 170 clients sending one or more requests per second.
Thank you both for the debugging and fixes.
Offline
For me, everything still works very well (the server has been running for 5 days and 16 hours, counted from my last update to the server code, not from the last WebSocket fix).
What kind of async server are you using: WebSocket, or just the HTTP async server?
Because the problem with the WebSocket one has nothing to do with the HTTP async server.
Offline
Good to hear that.
It was an instance of a bidir TWebSocketAsyncServerRest. I will recompile the project next week and upload all the test clients while keeping the server running in the debugger.
Offline
Then yes, your server was definitely affected by the bug, and recompiling it with the latest update will solve it.
Offline