Digging deeper, I found that with the new implementation we get better results with a lower total thread count. This is very good - I expect the db tests will be faster. Moreover, 7 threads * 14 servers is better than 14 threads * 7 servers.
I will investigate more.
BTW - the TFB guys plan to begin setting up NEW servers with Ubuntu 22.04 this week (see https://github.com/TechEmpower/Framewor … 449036999), so we have some time...
A smoke test on the server shows +4% (+50k) RPS for /json with the new HASEVENTFD. I'll check in more detail later.
Can you please commit the blocking mode in a separate branch? I want to play with it - it seems to depend heavily on the worker count.
In mORMot1 I upload such huge files by manually splitting them into chunks (in my case - in the browser) and sending the chunks one by one to a server endpoint, with the chunk number in the URL: http://.../fileUpload?chunk=x&totalChunks=y
On the server side the endpoint just appends the HTTP request body containing the chunk to the file.
The same can be done with mORMot2 - see the sketch below.
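To illustrate, a minimal method-based service sketch of such an endpoint (TUploadService, the target file name and the JSON reply are hypothetical, not my production code):

procedure TUploadService.FileUpload(Ctxt: TRestServerUriContext);
var
  chunk, totalChunks: Int64;
  fs: TFileStream;
begin
  chunk := Ctxt.InputInt['chunk'];
  totalChunks := Ctxt.InputInt['totalChunks'];
  if chunk = 0 then
    fs := TFileStream.Create('/tmp/upload.bin', fmCreate) // first chunk: create the file
  else
    fs := TFileStream.Create('/tmp/upload.bin', fmOpenWrite);
  try
    fs.Seek(0, soEnd); // append the raw HTTP body (= this chunk) to the file
    if Ctxt.Call^.InBody <> '' then
      fs.WriteBuffer(Ctxt.Call^.InBody[1], Length(Ctxt.Call^.InBody));
  finally
    fs.Free;
  end;
  if chunk = totalChunks - 1 then
    Ctxt.Returns('{"finished":true}') // last chunk received
  else
    Ctxt.Success;
end;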
I tried it yesterday (uncommented line #1981) - on the server HW the numbers are slightly lower compared to the events-based algorithm.
We can reduce syscalls by using blocking IO for the eventfd, and remove fOwner.fThreadPollingEventFD.WaitFor(5000) entirely.
A blocked read() call is terminated by a signal (and returns -1) when the application (daemon) stops.
I tried this approach, but the threads lock up somewhere.....
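For the record, the shape of what I tried, as a stand-alone FPC sketch (the eventfd import is mine - BaseUnix does not wrap it; nothing here is mormot.net.async code):

program evfdsketch;
{$mode objfpc}
uses
  BaseUnix, ctypes;

const
  EFD_SEMAPHORE = 1; // each successful read() consumes exactly one event

// eventfd(2) is not wrapped by BaseUnix, so import it from libc
function eventfd(initval: cuint; flags: cint): cint; cdecl; external 'c';

var
  efd: cint;
  v: QWord;
begin
  efd := eventfd(0, EFD_SEMAPHORE);
  // the producer would do: v := 1; FpWrite(efd, v, SizeOf(v));
  // worker loop: blocks in the kernel until an event arrives; on daemon
  // shutdown a signal interrupts read() and it returns -1 (ESysEINTR)
  repeat
    if FpRead(efd, v, SizeOf(v)) <> SizeOf(v) then
      break; // interrupted: terminate the worker
    // ... process one pending connection here ...
  until false;
end.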
@ttomas - thanks for the investigation. At least now we know it doesn't help. BTW, I investigated jemalloc with the same result. So for now we can consider the glibc MM good enough.
@dcoun - this round (started 2023-02-23) should be the same as the previous one, because our last MR with minor improvements is not merged yet.
@ab - I am investigating pipes (a simple program with one writer that writes a pointer into a pipe, and many readers that read it concurrently) - performance depends very much on the kernel version. On an old kernel (4.18, as on my server) it's terrible. On a newer one (5.18, as on my desktop) it's much faster (x10). Still planning to try eventfd + ring buffer. I will post a gist when finished.
For a long time I used DBeaver as a DB tool (I work with many different RDBMS), but for the last 3 years I have used DataGrip (as a plugin for WebStorm, where I spend a lot of time with JS). Like all JetBrains products it's very, very good and 100% worth the money. 3 years ago it cost me $27; now the price is higher.
Can't help with MySQL because I do not use it..
I will try to include eventfd this weekend
Thank you very much!
I will make a new PR to TFB based on today's sources with all the minor changes and a cleaned-up raw.pas (w/o enabling smoothing). This gives us a clean baseline for comparing event vs eventfd in the future.
P.S.
Ready #7944
Yes, I played with ThreadPollingWakeupLoad + hsoThreadSmoothing (4, 8, 16, 32). The best is ThreadPollingWakeupLoad = 8. In the best case (if the boost on the TFB HW is about the same) we gain +150 points, but I am still not sure Smoothing is a good solution in terms of being "realistic". So my proposal is to enable it when those 150 points matter, not now...
           w\o        tpw4       tpw8       tpw16      tpw32(m)
json       1 338 754  1 360 205  1 457 470  1 406 111  1 163 232
rawdb      470 228    471 954    477 038    476 576    474 571
db         460 406    455 805    460 086    456 671    455 382
fortunes   386 097    388 822    392 342    387 425    387 932
rawfort    437 250    435 545    436 378    438 344
plaintext  4 238 095  4 125 256  4 353 065  4 249 592  4 223 741
rawQuery   45 925     45 809     45 819
queries    34 712     34 658     34 826

I can try to use eventfd on the weekend, or, if you plan to try it yourself (the mormot async code is still a little complex for me), please notify me.
I don't get why the 20-query decreased so much
All frameworks decrease in the same way after adding Sync() - we are not the first.
When the PG server is local the decrease is not so significant.
@ab - crazy idea - what if we use named pipes (mkfifo) to wake up the threads in the pool instead of a futex? Have you considered this option?
I mean TAsyncConnections creates a fifo pipe and writes to it when data is ready, and all workers simply read from the pipe in blocking mode - and we let the kernel decide which worker wakes up by reading successfully (a sketch follows below)..
In this case we double the syscall count (write + read instead of futex) but avoid the loops over fThreads and simplify the code a lot. Isn't it?
P.S.
Even a simple pipe should work. And there is also a message queue if pipes turn out to be slow...
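Compressed into a stand-alone FPC sketch (single process just for the demo; in the real pool the owner would open the write end and every worker the read end):

program fifosketch;
{$mode objfpc}
uses
  BaseUnix, ctypes;

const
  FIFO = '/tmp/wakeup.fifo';
var
  fd: cint;
  conn: Pointer;
begin
  FpMkfifo(FIFO, &600);
  // O_RDWR avoids open() blocking while no peer exists yet
  fd := FpOpen(FIFO, O_RDWR);
  // owner side: one write per ready connection; writes up to PIPE_BUF
  // (4096 bytes on Linux) are atomic, so pointers never interleave
  conn := nil; // would be the ready connection instance
  FpWrite(fd, conn, SizeOf(conn));
  // worker side: blocking read - the kernel picks exactly one of the
  // blocked readers to receive each 8-byte pointer
  FpRead(fd, conn, SizeOf(conn));
end.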
@squirrel - take a look at Postgres, really. It's faster, better supported by mORMot, and has some KILLER features (mostly because of its very extensible architecture) that are missing in other DBs. I use FTS (ts_vector), trigrams, PostGIS, and TimescaleDB with great success...
Results are ready - we are still #16. Impressive plaintext improvements; rawqueries (with PG Sync) is worse than I expected; small improvements for json and db (I think because of -O4).
Weights 1.000 1.737 21.745 4.077 68.363 0.163
Composite # JSON 1-query 20-query Fortunes Updates Plaintext Weighted score
38 mormot 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 mormot 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 threads (28CPU*4)
44 mormot 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 threads (28CPU*5) SQL pipelining
51 mormot 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 threads (28CPU*4) CPU affinity
51 mormot 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 mormot 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 threads * 6 instances) no affinity
28 mormot 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 threads * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 mormot 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining
15 mormot 963,953 394,036 33,366 393,209 18,353 6,973,762 6,368 2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization

@ab - you are right. Replacing TOSLightLock with TLightLock does not change the futex call count. So, this is the thread wakeup. Will play with WakeupLoad later...
@ttomas - the current round result for mormot /plaintext is ~7M - and this is the 10Gb network limitation on the TFB HW. I think we should focus on /json. If we speed up HTTP for json, this automatically speeds up the db-related tests.
Is TPollSockets.GetOnePending the bottleneck?
Should we try to switch the fPending: TPollSocketResults structure into a truly lock-less algorithm?
I think - yes. It's not visible in the profiler, but here is the syscall stat (cleaned up a little) of 1667410 /json requests for a 24-thread server:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- -------------------
40,34 242,887052 281 863994 163691 futex
19,11 115,090885 69 1667410 sendto
16,25 97,843747 58 1667499 15 recvfrom
9,66 58,190893 843346 69 nanosleep
0,14 0,826164 61 13415 epoll_wait <--- I played with the event count - 256 is good enough
0,01 0,032096 63 504 epoll_ctl

I think the futexes come from Lock, RTLCriticalSection and other friends in GetOnePending.
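If we do try lock-less, one possible shape is a bounded ring where workers claim entries with a CAS instead of taking a lock. A minimal single-producer / multi-consumer sketch (illustrative only - this is not TPollSocketResults, and it ignores LongInt wrap-around):

const
  CAP = 1024; // must be a power of two

var
  buf: array[0..CAP - 1] of Pointer;
  head: LongInt = 0; // next slot to write (single producer: the poller)
  tail: LongInt = 0; // next slot to read (many consumers: the workers)

function Push(p: Pointer): Boolean;
begin
  if head - tail >= CAP then
    exit(false); // ring is full
  buf[head and (CAP - 1)] := p;
  InterlockedIncrement(head); // publish the entry to the consumers
  result := true;
end;

function Pop(out p: Pointer): Boolean;
var
  t: LongInt;
begin
  repeat
    t := tail;
    if t = head then
      exit(false); // ring is empty
    p := buf[t and (CAP - 1)];
    // claim slot t: only the thread whose CAS succeeds keeps p
  until InterlockedCompareExchange(tail, t + 1, t) = t;
  result := true;
end;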
I also tried @ttomas's idea with different container/thread settings for different tests.
After many attempts, I achieved a small (100K) increase in /json, but the number of threads/instances is just magical (for 28 processors, 8 instances with 7 threads each is the best). Also, separate containers do nothing for the database-related tests. I suggest leaving everything as it is.
The problem is that the DB layer expects `?` as the parameter placeholder.
For ExecuteInlined, use the inlined parameter syntax :(paramValue):
But to prevent SQL injection and to speed up execution it is better to execute the query with bound parameters:

stmt := conn.NewStatementPrepared('insert into NEW_TABLE (ID, DESCRIPT) values (?, ?)', false, true);
stmt.Bind(1, 4);
stmt.Bind(2, 'I''m confused?');
stmt.ExecutePrepared;

My bad about https://synopse.info/forum/viewtopic.ph … 095#p39095 - in X X 1 mode raw enables hsoThreadSmooting - and this is why we got such a strange thread usage distribution.
Looking at benchmark_config.json from ntex (No1 in composite score from the previous round), and also asp.net core: instead of using one docker/monolith app, I propose creating 3 docker apps - 1: json/plaintext, 2: orm, 3: raw. The composite score will aggregate the best results from all 3 apps. Also, for json/plaintext you can use a different number of threads than for the db tests, which scales better.
Nice idea, I will investigate it!
Different thread counts for the non-db tests seem to make sense. I'll try to determine the best values and maybe enable CPU affinity for them.
As I understand from the python sources, the DB container is restarted for each test (once for default and once for postgres-raw), and the current test order db->json->queries->updates->fortunes->plaintext is OK (it's important that updates runs after db/queries). So I do not see a reason to create separate containers for db and rawdb (but I will check whether it makes sense).
About "tuning the dockers" - we can't do this from inside a container. The Docker daemon is configured on the host.
P.S.
My understanding of docker (simplified):
Docker is a slim wrapper around Linux namespaces and cgroups (+AuFS as a read-only FS with layers). So everything is executed on the host, but the executable links to the libraries from the docker image. No magic at all.
BTW, this is the reason we can't use io_uring - as far as I understand the TFB host machine is based on Ubuntu 18 and io_uring is not supported there. TFB plans to migrate it to 22.04 in the next round.
Several processes were my first attempt - one process with 6 listeners and 28 threads each is better than 6 processes x 28 threads (I think because of memory management).
Switching to io_uring is too radical a change, IMHO. And as I see, the top 10 frameworks use epoll...
In my opinion our current problem is that, with the same number of threads, increasing the listening socket count improves json performance.
I caught a case where the performance difference is dramatic on both the server and my PC (for other thread/socket counts more sockets also always win, but the difference is not so big).
Server (28 cores):
wrk -d 15 -c 64 --timeout 8 -t 28 "http://localhost:8080/json"
./raw 56 28 1 (1x56 threads) - 143K RPS
./raw 28 28 2 (2x28 threads) - 770K RPS

My PC (12 cores):
wrk -d 15 -c 64 --timeout 8 -t 12 "http://localhost:8080/json"
./raw 24 12 1 - 300K
./raw 12 12 2 - 500K

perf (for the server) shows that:
- in the case of one socket, most of the load goes into only 2 threads, R18080 R28080 - see flamegraph_json_56_1
- in the case of two sockets, the load is distributed between many threads - see flamegraph_json_28_2
I do not understand the reason for this behavior.
P.S.
I also tried:
- switching the PG protocol to binary mode - performance is the same
- setting HttpQueueLength to 0 - no change in performance except a small number of socket errors
Today I tried to find bottlenecks using perf, but w/o success..
Your changes to the HTTP processing (nice idea!) give, on the server:
+20k RPS for json
+60k RPS for pipelined plaintext
+3k for /db, /rawdb and /rawfortunes
The cumulative /fortunes effect is +25K
Yes, unexpectedly..
I posted an announcement about our work on the Free Pascal forum. Maybe the community will generate more optimization ideas....
One of the goals of mormot.db.sql.postgres development was to participate in the TechEmpower benchmark; as of today we are in the TOP15 there. So yes - its overhead is minimal.
All tests passed. The changes are tiny, close to statistical error. I will take them for the next PR...
TFB PR #7926 is ready
- /queries and /updates for the raw test case: Postgres SQL pipelining uses Bind->Exec->Sync as required by General Rule 7
- /plaintext: improved HTTP pipelining on the mORMot level - added response buffering to minimize send syscalls
- general optimization: use the aggressive compiler optimization level (-O4)
I expect that with these changes we will be #12 (+600 points for /plaintext, -70 points for /rawqueries, plus some points because of -O4 for all endpoints).
So, the next goal is to be in TOP 10
I have some ideas but still investigating....
@ab - one request for a future mORMot2 release - please compress the statics in the mormot2 release using tar.gz. 7z is not available in the default distribution (on both Linux and Windows), and installing the p7zip package, for example inside a docker build as we do in the TFB bench, just to decompress mormot2static.7z takes some time; worst of all, it requires an internet connection - I sometimes have problems with the Internet after missile attacks.
PG pipelining is now implemented using Sync() as required by the TFB rules. raw* performance decreases a little (from 50K to 46K for /rawqueries).
@ab, please merge https://github.com/synopse/mORMot2/pull/144 - I added a Conn.CheckPipelineSync method and need it to prepare the TFB PR.
All TFB tests pass.
/plaintext results:
my PC: 1 271 496 -> 3 764 547
server: 2 621 195 -> 4 343 425 (results on the TFB hardware should be about the same)
Perfect!
I need some time to solve PG Sync() and will then prepare a PR.
Will test it tonight. And try to add Sync for pipelining..
Unfortunately, as noted here, our PG pipeline optimization breaks rule #7 of the General test requirements:
"If using PostgreSQL's extended query protocol, each query must be separated by a Sync message"
I do not understand why this requirement exists for the synchronous case, but we have what we have. So I will rewrite PG pipelining again.
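For reference, the required pattern at the libpq level looks roughly like this (the PQ* functions are the real libpq pipeline API from PG 14+; the Pascal bindings, the RunPipeline helper and the 'world_by_id' statement name are mine, just for illustration - the mORMot wrapper differs):

// cint/pcint come from the ctypes unit
type
  PPGconn = Pointer;
  PPAnsiChar = ^PAnsiChar;

function PQenterPipelineMode(conn: PPGconn): cint; cdecl; external 'pq';
function PQpipelineSync(conn: PPGconn): cint; cdecl; external 'pq';
function PQsendQueryPrepared(conn: PPGconn; stmtName: PAnsiChar;
  nParams: cint; paramValues: PPAnsiChar; paramLengths, paramFormats: pcint;
  resultFormat: cint): cint; cdecl; external 'pq';

procedure RunPipeline(conn: PPGconn; n: Integer);
var
  i: Integer;
begin
  PQenterPipelineMode(conn);
  for i := 1 to n do
  begin
    // Bind + Exec for one query (parameter binding elided in this sketch)
    PQsendQueryPrepared(conn, 'world_by_id', 0, nil, nil, nil, 1);
    PQpipelineSync(conn); // one Sync per query, as the rule requires
  end;
  // afterwards drain PQgetResult: each query yields its PGresult, a nil,
  // and then a PGRES_PIPELINE_SYNC marker
end;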
About valgrind - I can't see a bottleneck there (mostly because it runs slowly), but now I am trying perf + flamegraph - this technique shows more detail thanks to its high-frequency sampling.
What I have found so far is a way to optimize /plaintext in HTTP pipelining mode. The flamegraph for pipelined plaintext shows that most of the time we are in the send syscall - see the flamegraph for ./raw 4 4 1 (clickable).
This is because in the current HTTP pipelining implementation we do fewer recv syscalls (we read several GET requests in one syscall), but we could also buffer the output and do fewer sends!
This can be done if the HTTP state machine detects that, for a GET request with Connection: keep-alive and HTTP 1.1, there is an additional GET after the `\r\n`,
and then sends the buffered response either when the send buffer overflows (for example, nodeJS buffers such responses in a 64k buffer) or when the pipeline ends (no additional bytes read) - see the sketch at the end of this post.
A test case with 2 pipelined HTTP queries:
(echo -en "GET /plaintext HTTP/1.1\nHost: foo.com\nConnection: keep-alive\n\nGET /plaintext HTTP/1.1\nHost: foo.com\n\n"; sleep 10) | telnet localhost 8080
currently produces 2 sendto(7, "HTTP/1.1... syscalls, but could produce one.
Since HTTP pipelining should not be used in production for many reasons, this optimization can be put behind an hsoHttpPipelining flag. @ab - can you implement this idea?
P.S. - to become clickable the flamegraph should be downloaded first and then opened in the browser - clicking does not work while the svg is opened in the Google Drive preview.
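A rough sketch of that buffering loop (HasFullRequest, PopRequest, BuildResponse and SendAll are hypothetical helpers, not mORMot's HTTP state machine):

const
  FLUSHSIZE = 64 * 1024; // flush threshold, like the 64k buffer nodeJS uses

procedure ProcessPipelinedInput(sock: THandle; var inBuf: RawByteString);
var
  outBuf: RawByteString;
begin
  outBuf := '';
  // as long as another complete request is already waiting in inBuf,
  // append its response instead of sending it immediately
  while HasFullRequest(inBuf) do
  begin
    outBuf := outBuf + BuildResponse(PopRequest(inBuf));
    if Length(outBuf) >= FLUSHSIZE then
    begin
      SendAll(sock, outBuf); // flush early if the buffer would overflow
      outBuf := '';
    end;
  end;
  if outBuf <> '' then
    SendAll(sock, outBuf); // pipeline drained: one send syscall for all responses
end;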
mORMot results are ready. At the end of the round we will be #16 in composite score - mostly because of the improved PG pipelining, which affects queries and updates in raw mode.
Weights 1.000 1.737 21.745 4.077 68.363 0.163
Composite # JSON 1-query 20-query Fortunes Updates Plaintext Weighted score
38 mormot 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 mormot 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 threads (28CPU*4)
44 mormot 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 threads (28CPU*5) SQL pipelining
51 mormot 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 threads (28CPU*4) CPU affinity
51 mormot 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 mormot 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 threads * 6 instances) no affinity
28 mormot 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 threads * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 mormot 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining

Also we will be:
- #1 for /queries in raw mode with PG pipelining. I hope everything was done correctly there
- #3 for /fortunes in ORM (orm=full) mode. The result improved by +30%, thanks to cmem. We are behind lithium (C++), where no template engine is used, and xitca-web (rust)
- #2 for /db in ORM (orm=full) mode. A little (8000 RPS) behind xitca-web
A possible way to optimize is to find the bottleneck in the async HTTP server, which should improve json and plaintext.
mORMot uses a global lock for SQLite3 updates, so the problem does not happen there.
I use my own TubSQLite3ThreadSafeConnection = class(TSQLDBConnectionThreadSafe) without the global lock (it actually exists inside sqlite) and set:
TSQLDataBase.WalMode := true;
TSQLDataBase.BusyTimeout := 10000; // 10 seconds
In case of sqlite3_busy, the error is passed to the user and the UI message "try again later" is displayed.
For sure, it's a huge improvement. Currently I can't use sqlite3 in real production where long transactions are commonplace (because of the sqlite3_busy error), but hctree should solve this problem, and (at least where the DB size is < 500Gb) sqlite can become usable.
Using the -O4 optimization level (I never used it before because of the "beware" notes) slightly increases performance (+44k for json, for example) and passes all tests.
I also tried Whole Program Optimization - it decreases the executable size from 5Mb to 3Mb, but without visible performance changes (compared to -O4).
@ab - what do you think - can we use -O4 for TFB (I'm afraid of accidental crashes)?
327K RPS for /fortunes. Memory consumption is higher
Flags: BOOSTER assumulthrd smallpools perthrd erms
Small: 3K/309KB including tiny<=256B arenas=128 pools=95
Medium: 126MB/126MB sleep=2K
Large: 0B/640KB sleep=0
Total Sleep: count=2K
Small Getmem Sleep: count=1
288=1
Small Blocks since beginning: 244M/29GB (as small=42/46 tiny=1K/2032)
48=93M 112=39M 80=28M 128=18M 32=14M 96=9M 64=9M 160=4M
144=4M 256=4M 416=3M 880=3M 1264=3M 272=2M 1376=509K 960=488K
Small Blocks current: 3K/309KB
48=2K 64=426 352=200 32=87 128=80 112=73 80=48 96=21
192=14 416=8 576=7 880=7 288=6 736=5 672=4 160=4
Maximum resident set size (kbytes): 271852
Minor (reclaiming a frame) page faults: 77196
Voluntary context switches: 5309185
Involuntary context switches: 7768

Memory usage statistics:
//libc
Maximum resident set size (kbytes): 28896
Minor (reclaiming a frame) page faults: 12867
Voluntary context switches: 5888357
Involuntary context switches: 5049
//x64mm (NumTinyBlockArenasPO2 = 7)
Maximum resident set size (kbytes): 124380
Minor (reclaiming a frame) page faults: 44196
Voluntary context switches: 5220211
Involuntary context switches: 8087

With NumTinyBlockArenasPO2 = 7 instead of 6 the result is 327K
CPU load in user space is ~10% higher than when using libc in both cases
Flags: BOOSTER assumulthrd smallpools perthrd erms
Small: blocks=3K size=309KB (part of Medium arena)
Medium: 60MB/60MB sleep=15K
Large: 0B/640KB sleep=0
Total Sleep: count=15K
Small Getmem Sleep: count=4
288=4
Small Blocks since beginning: 239M/29GB (as small=42/46 tiny=1K/2032)
48=91M 112=38M 80=27M 128=18M 32=14M 96=9M 64=9M 144=4M
160=4M 256=4M 416=3M 880=3M 1264=3M 272=2M 1376=485K 960=475K
Small Blocks current: 3K/309KB
48=2K 64=427 352=200 32=87 128=79 112=73 80=48 96=21
192=14 416=8 576=7 880=7 288=6 160=5 736=5 624=4

With arenas bound to the thread ID the fortunes result is 313K RPS - very close to the 355K with libc. Congratulations!
Flags: BOOSTER assumulthrd smallpools perthrd erms
Small: blocks=3K size=309KB (part of Medium arena)
Medium: 51MB/51MB sleep=10K
Large: 0B/640KB sleep=0
Total Sleep: count=10K
Small Getmem Sleep: count=16
288=14 80=2
Small Blocks since beginning: 234M/28GB (as small=42/46 tiny=746/1008)
48=89M 112=37M 80=27M 128=18M 32=14M 96=9M 64=9M 144=4M
160=4M 256=4M 416=3M 880=3M 1264=3M 272=2M 960=465K 1376=464K
Small Blocks current: 3K/309KB
48=2K 64=426 352=200 32=87 128=80 112=73 80=48 96=21
192=14 416=8 576=7 880=7 288=6 736=5 160=4 672=4

P.S.
The sleep count increased, but so did the overall speed.
With FPCMM_BOOST the result is 226K
Flags: BOOST assumulthrd smallpool erms debug
Small: blocks=3K size=309KB (part of Medium arena)
Medium: 13MB/13MB peak=13MB current=11 alloc=11 free=0 sleep=229
Large: 0B/640KB peak=640KB current=0 alloc=2 free=2 sleep=0
Total Sleep: count=229
Small Blocks since beginning: 157M/19GB (as small=43/46 tiny=112/112)
48=56M 112=25M 80=18M 128=12M 32=9M 96=6M 64=6M 160=3M
144=3M 256=3M 880=2M 416=2M 1264=2M 272=1M 448=277K 960=273K
Small Blocks current: 3K/309KB
48=2K 64=426 352=200 32=87 128=80 112=73 80=48 96=21
192=14 416=8 576=7 880=7 288=6 736=5 672=4 160=4

BTW - the glibc MM for x64 by default uses an arena count of CPUcores*8
Just tried commit/60024584 with FPCMM_BOOSTER - the results are better: 243K RPS on fortunes (instead of 181K)
Flags: BOOSTER assumulthrd smallpools erms
Small: blocks=3K size=309KB (part of Medium arena)
Medium: 43MB/43MB sleep=137
Large: 0B/640KB sleep=0
Total Sleep: count=137
Small Blocks since beginning: 180M/22GB (as small=41/46 tiny=466/496)
48=68M 112=28M 80=20M 128=14M 32=10M 96=7M 64=7M 160=3M
144=3M 256=3M 880=3M 416=3M 1264=3M 272=2M 960=310K 448=308K
Small Blocks current: 3K/309KB
48=2K 64=426 352=200 32=87 128=80 112=73 80=48 96=21
192=14 416=8 576=7 880=7 288=6 736=5 672=4 624=4

By filtering the valgrind profiling data on `mem` I found that the only notable difference is the self time of _getmem and _freemem:
x64mm: 0.69 0.46
glibc : 0.26 0.23
Unfortunately there are no changes in fortunes at all... New x64mm - 181K RPS (~90% of CPU in userspace, 10% in kernel), glibc MM - 360K RPS (50% of CPU in user space, 50% in kernel).
I checked the syscalls - both MMs do nearly the same number of mmap and munmap calls.
And I do not see anything strange in valgrind.
If you need some additional help - please tell me...
P.S.
compiled with -dFPC_X64MM -dFPCMM_SERVER -dFPCMM_BOOSTER
The last fpclibcmm changes are tested - all OK.
Will prepare a new TFB PR; hope they don't get sick of me..
A small rawfortunes improvement (avoiding a record copy) gives +4000 RPS (+80 composite points).
Now I expect mormot to be #10 in fortunes (just above asp.net core).
Nice catch!
I still worry about valgrind memcheck (`valgrind -v` mode) - it produces too many warnings about access to uninitialized memory / conditional jump or move depends on uninitialised value.
The source of most of them is in TLecuyer and AESNI.
We discussed this problem some years ago - it also exists in mORMot1.
Maybe this is not a problem from the algorithm's POV, but it is a problem for the valgrind DRD tool (I tried to use it to find thread-safety problems, but couldn't because of too many memcheck errors). Hope we can slowly fix such warnings. At least I will try..
I can confirm - now it's stable. At least in the scenarios where the previous implementation crashed! Thank you very much!
About speed - on the server hardware the speed is the same (the bottleneck is elsewhere).
We were a little late - our TFB MR was merged 2 hours ago (and the results will appear only in the next round). I will wait before making a new PR; maybe we will make some more improvements in the next 4 days...
For comparison - results for the 28-core server showing the performance increase. The first 3 rows change because of the MM; the raw* ones mostly because of the new PG pipelining. For the other endpoints, which almost do not allocate, the results are nearly the same.

                        x64mm    libc
/fortunes               181 000  361 000
/rawfortunes            367 000  424 000
/queries?queries=20     33 000   35 000
-- raw perf increased because of the new PG pipeline impl
/rawqueries?queries=20  6 000    50 000
/rawupdates?queries=20  3 000    26 000

If the server does not crash I expect mORMot can be in the top 10.
Made TFB PR#7879 with the glibc MM and the improved PG pipelining mode for the raw* tests.
Even with the randomly occurring glibc error in plaintext pipelining mode this version should work - the error does not occur when the /plaintext pipelining tests are executed after a warm-up (plaintext w/o pipelining), as is done in the TFB benchmark.
Also caught the same pattern in libs/mORMot/src/net/mormot.net.sock.posix.inc:1165 (SetLength for a dyn array that points to nil).
The crash is not stable - sometimes the app crashes, sometimes not. Sometimes after 200k requests, sometimes after 400k.
BTW - I have used libc (SynFPCCMemAligned) with mORMot1 for many years without any problems. So I expect there is some error in mORMot2, not in FPC.
Got several different stacks from different core dumps.
#1 0x00007f175d642db5 in abort () from /lib64/libc.so.6
#2 0x00007f175d69b4e7 in __libc_message () from /lib64/libc.so.6
#3 0x00007f175d6a25ec in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f175d6a439c in _int_free () from /lib64/libc.so.6
#5 0x000000000042bb93 in CMEM_$$_CFREEMEM$POINTER$$QWORD ()
#6 0x00007f175786f9f0 in ?? ()
#7 0x00000000004a3f57 in _DYNARRAY_DECR_REF_FREE (P=0x2, INFO=0x7f175786f540) at libs/mORMot/src/core/mormot.core.rtti.fpc.inc:845
#8 0x0000000001c13468 in ?? ()
#9 0x0000000000415c23 in fpc_dynarray_setlength ()
#10 0x00000000007cf348 in WAITFORMODIFIED (this=0x7f1730001b48, RESULTS=..., TIMEOUTMS=1566933887) at libs/mORMot/src/net/mormot.net.sock.posix.inc:1156
#11 ......

another one:
#0 0x00007f50e42e237f in raise () from /lib64/libc.so.6
#1 0x00007f50e42ccdb5 in abort () from /lib64/libc.so.6
#2 0x00007f50e43254e7 in __libc_message () from /lib64/libc.so.6
#3 0x00007f50e432c5ec in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f50e432fa55 in _int_malloc () from /lib64/libc.so.6
#5 0x00007f50e4330c72 in malloc () from /lib64/libc.so.6
#6 0x000000000042bb6d in CMEM_$$_CGETMEM$QWORD$$POINTER ()
#7 0x00007f4f5b5eb9b0 in ?? ()
#8 0x000000000041cf6d in SYSTEM_$$_GETMEM$POINTER$QWORD ()
#9 0x0000000001c1ad48 in ?? ()
#10 0x0000000000415bc6 in fpc_dynarray_setlength ()
#11 0x00000000007cf413 in WAITFORMODIFIED (this=0x680, RESULTS=..., TIMEOUTMS=-466738305) at libs/mORMot/src/net/mormot.net.sock.posix.inc:1170
#12 0x00000000000000ce in ?? ()
#13 0x00007ffc49c50f20 in ?? ()
#14 0x00007ffc49c50e9f in ?? ()
#15 0x00007ffc49c50e9e in ?? ()
#16 0x0000000001ca44b8 in ?? ()
#17 0x00000000007d2d10 in POLLFORPENDINGEVENTS (this=0x1ca4488, TIMEOUTMS=1100) at libs/mORMot/src/net/mormot.net.sock.pas:3002
#18 0x000000000067bfd3 in EXECUTE (this=0x1ca5158) at libs/mORMot/src/net/mormot.net.async.pas:1795
#19 0x000000000046da0f in CLASSES_$$_THREADFUNC$POINTER$$INT64 ()
#20.......

No stack trace yet. Reproduced only on the server hardware, with a core dump. Will the core dump help?
P.S.
Sorry, my bad. After googling a little I now know how to get a stack trace from a core dump. Will get it in the morning.