While improving the framework speed for the TechEmpower benchmark we found that on modern hardware the libc memory manager is faster (sometimes 3x faster) than the mORMot2 x64mm, but there are some errors in mORMot2 that cause AVs when using the glibc MM.
This thread is for finding and solving such errors.
The default FPC CMem unit is a bit old in its implementation (we could call the libc directly with no prefix - as SynFPCCMemAligned does), but even in this case there are errors.
My favorite tool, valgrind, can help us track down memory errors - see https://valgrind.org/docs/manual/quick-start.html
valgrind --leak-check=yes --track-origins=yes ./mormot2tests
My first attempts show many "Conditional jump or move depends on uninitialised value(s)" errors. Does mORMot expect the allocated memory to be zero-filled?
I also found that cmem uses less memory (at least in some cases) - see the results for TFB /fortunes on the mormot server with 168 threads:
x64mm
Maximum resident set size (kbytes): 38952
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 21113
Voluntary context switches: 4118110
Involuntary context switches: 2813
cmem
Maximum resident set size (kbytes): 28292
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 19733
Voluntary context switches: 8334379
Involuntary context switches: 863
x64mm is likely to use more resident memory, because the mmap() pages are really loaded, not just reserved.
So it causes far fewer context switches on page faults (about half, from your numbers), which is better in practice.
Please try https://github.com/synopse/mORMot2/commit/01fd9895
There is the new mormot.core.fpclibcmm.pas unit.
To enable it, just define FPC_LIBCMM but not FPC_X64MM with {$I mormot.uses.inc} in the dpr.
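For reference, a minimal project file sketch could look like this (the program name and the extra units are placeholders, and the define can also be set globally with -dFPC_LIBCMM in the compiler options):

program tfbserver; // placeholder project name

{$define FPC_LIBCMM}   // select the libc-based memory manager unit
{.$define FPC_X64MM}   // make sure the x64mm manager is NOT defined

uses
  {$I mormot.uses.inc} // injects the selected MM unit first in the uses clause
  sysutils,
  mormot.core.base;

begin
  // actual server code here
end.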
But libc would abort/SIG_KILL the process on any problem.
And it seems a bit paranoid, because "s := s + s" raises an exception.
This is not a mORMot bug, but sounds like a FPC RTL incompatibility with the libc expectations.
MemSize() used in FPC RTL fpc_AnsiStr_SetLength() seems to be the culprit.
If I use the mORMot AppendToRawUtf8() instead of the FPC RTL "s := s + s", then all tests do pass - https://github.com/synopse/mORMot2/commit/bdc67a02
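For illustration, the kind of substitution involved - a sketch which assumes AppendToRawUtf8() follows the usual (var dest; const src) shape and is reachable from the core text unit (check the actual declaration):

program appendsketch; // standalone illustration, not the actual test code

{$mode objfpc}{$H+}

uses
  mormot.core.base,
  mormot.core.text; // assumed unit for AppendToRawUtf8() - adjust if needed

var
  s: RawUtf8;
begin
  s := 'abc';
  // the FPC RTL pattern, whose fpc_AnsiStr_SetLength()/MemSize() path is the
  // suspected culprit above:
  //   s := s + s;
  // the mORMot helper, which passes the tests on top of the libc MM:
  AppendToRawUtf8(s, s);
  writeln(s); // abcabc
end.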
But libc MM is slower on my PC with a single or few threads.
GetMem = small string allocation runs at 700MB/s with fpclibcmm and 1.2GB/s with fpcx64mm (TRawUtf8Interning test in "direct" mode).
So I wrote in the unit documentation to use fpclibcmm only for heavily loaded processes on a high number of cores, and only when you trust your code enough to risk a SIG_KILL.
Does mORMot expect the allocated memory to be zero-filled?
From AllocMem(), yes it does. Not from ReallocMem()/GetMem().
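In other words (a tiny standalone FPC illustration, independent of which memory manager is installed):

program zerofill;

{$mode objfpc}

var
  p: PByte;
begin
  p := AllocMem(16);       // AllocMem() always returns zero-filled memory
  writeln('AllocMem first byte: ', p^);   // guaranteed 0
  FreeMem(p);

  GetMem(p, 16);           // GetMem()/ReallocMem() do NOT clear the block,
  FillChar(p^, 16, 0);     // so the caller must zero it when zeroes are expected
  writeln('after FillChar: ', p^);
  FreeMem(p);
end.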
With the latest sources, TFB plaintext in pipeline mode crashes the server with the message "free(): invalid next size (normal) Aborted (core dumped)" after about 200k requests.
For the mormot tests there are too many errors in memcheck mode.
No stack trace yet. Reproduced only on server hardware, with a core dump. Would the core dump help?
P.S.
Sorry, my bad. After googling a little I now know how to get a stack trace from a core dump. I will get it in the morning.
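For reference, the usual sequence is something like this (the binary and core file names are placeholders, and core dumps must be enabled first, e.g. with ulimit -c unlimited):

gdb ./server ./core
(gdb) bt                    # backtrace of the crashing thread
(gdb) thread apply all bt   # backtraces of all threads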
I got several different stacks from different core dumps.
#1 0x00007f175d642db5 in abort () from /lib64/libc.so.6
#2 0x00007f175d69b4e7 in __libc_message () from /lib64/libc.so.6
#3 0x00007f175d6a25ec in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f175d6a439c in _int_free () from /lib64/libc.so.6
#5 0x000000000042bb93 in CMEM_$$_CFREEMEM$POINTER$$QWORD ()
#6 0x00007f175786f9f0 in ?? ()
#7 0x00000000004a3f57 in _DYNARRAY_DECR_REF_FREE (P=0x2, INFO=0x7f175786f540) at libs/mORMot/src/core/mormot.core.rtti.fpc.inc:845
#8 0x0000000001c13468 in ?? ()
#9 0x0000000000415c23 in fpc_dynarray_setlength ()
#10 0x00000000007cf348 in WAITFORMODIFIED (this=0x7f1730001b48, RESULTS=..., TIMEOUTMS=1566933887) at libs/mORMot/src/net/mormot.net.sock.posix.inc:1156
#11 ......
Another one:
#0 0x00007f50e42e237f in raise () from /lib64/libc.so.6
#1 0x00007f50e42ccdb5 in abort () from /lib64/libc.so.6
#2 0x00007f50e43254e7 in __libc_message () from /lib64/libc.so.6
#3 0x00007f50e432c5ec in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f50e432fa55 in _int_malloc () from /lib64/libc.so.6
#5 0x00007f50e4330c72 in malloc () from /lib64/libc.so.6
#6 0x000000000042bb6d in CMEM_$$_CGETMEM$QWORD$$POINTER ()
#7 0x00007f4f5b5eb9b0 in ?? ()
#8 0x000000000041cf6d in SYSTEM_$$_GETMEM$POINTER$QWORD ()
#9 0x0000000001c1ad48 in ?? ()
#10 0x0000000000415bc6 in fpc_dynarray_setlength ()
#11 0x00000000007cf413 in WAITFORMODIFIED (this=0x680, RESULTS=..., TIMEOUTMS=-466738305) at libs/mORMot/src/net/mormot.net.sock.posix.inc:1170
#12 0x00000000000000ce in ?? ()
#13 0x00007ffc49c50f20 in ?? ()
#14 0x00007ffc49c50e9f in ?? ()
#15 0x00007ffc49c50e9e in ?? ()
#16 0x0000000001ca44b8 in ?? ()
#17 0x00000000007d2d10 in POLLFORPENDINGEVENTS (this=0x1ca4488, TIMEOUTMS=1100) at libs/mORMot/src/net/mormot.net.sock.pas:3002
#18 0x000000000067bfd3 in EXECUTE (this=0x1ca5158) at libs/mORMot/src/net/mormot.net.async.pas:1795
#19 0x000000000046da0f in CLASSES_$$_THREADFUNC$POINTER$$INT64 ()
#20.......
I also caught the same pattern in libs/mORMot/src/net/mormot.net.sock.posix.inc:1165 (SetLength for a dynamic array that points to nil).
The crash is not stable - sometimes the app crashes, sometimes not. Sometimes after 200k requests, sometimes after 400k.
BTW - I have used libc (SynFPCCMemAligned) with mORMot1 for many years without any problems. So I expect there is some error in mORMot2, not in FPC.
I think I have found the bug - at least one bug, directly related to WaitForModified() as reported by your stack traces.
TPollSocketEpoll was not thread-safe, whereas mormot.net.async assumed thread-safety....
In fact, it was due to an internal dynamic array for epoll_wait() results, which was not thread-safe.
Using a local stack-allocated array of 256 items is perfectly thread-safe and also faster (on my PC).
Please try https://github.com/synopse/mORMot2/commit/8d333f53
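The gist of the fix, as a simplified sketch (the names, the 256 size and the processing loop below are illustrative only, not the actual mormot.net.sock code):

program epollsketch;

{$mode objfpc}

uses
  BaseUnix, Linux; // FPC epoll API for Linux

const
  MAX_EVENTS = 256;

// one fixed-size buffer per call, allocated on the caller's stack, instead of
// a dynamic array field shared (and resized) by several polling threads
procedure WaitAndProcess(epollfd, timeoutMS: integer);
var
  events: array[0..MAX_EVENTS - 1] of EPoll_Event; // stack-allocated
  n, i: integer;
begin
  n := epoll_wait(epollfd, @events[0], MAX_EVENTS, timeoutMS);
  for i := 0 to n - 1 do
    writeln('event mask = ', events[i].Events); // placeholder processing
end;

var
  efd: integer;
begin
  efd := epoll_create(16);
  WaitAndProcess(efd, 0); // returns at once: no descriptor registered
  fpClose(efd);
end.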
I can confirm - now it's stable. At least in the scenarios where the previous implementation crashed! Thank you very much!
About speed - on server hardware the speed is the same (the bottleneck is elsewhere).
We were a little late - our TFB MR was merged 2 hours ago (so the results will only appear in the next round). I will wait before making a new PR; maybe we can make some more improvements in the next 4 days...
I spent a lot of time trying to figure out why glibc mcheck() explodes during the mormot regression tests, once any network server starts.
I looked at core dumps, and also at gdb traces... with no clue about what was wrong.
And then I found out that mcheck() is NOT THREAD SAFE!
MT-Unsafe race:mcheck const:malloc_hooks
https://www.gnu.org/software/libc/manua … cking.html
In the meantime, I was able to identify a small issue in our patched RTL on x86_64, about the "s := s + s" pattern.
https://github.com/synopse/mORMot2/comm … 0c900cba4a
So since all single-thread tests do pass with mcheck(), and FPC heaptrc doesn't find anything wrong either, we can be confident enough.
Nice catch!
I still worry about valgrind memcheck (`valgrind -v` mode) - it produces too many warnings about access to uninitialized memory / "conditional jump or move depends on uninitialised value".
The source of most of them is in TLecuyer and AESNI.
We discussed this problem some years ago - it also exists in mORMot1.
Maybe this is not a problem from the algorithm's point of view, but it is a problem for the valgrind DRD tool (I tried to use it to find thread-safety problems, but couldn't because of too many memcheck errors). I hope we can slowly fix such warnings. At least I will try...
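In the meantime, a suppression file can keep the memcheck output manageable; a sketch (the fun: pattern below is only a guess at the mangled TLecuyer symbol - the reliable way is to let valgrind write the entries itself with --gen-suppressions=all):

# lecuyer.supp - hypothetical entry for the TLecuyer-related warnings
{
   lecuyer-uninitialised-cond
   Memcheck:Cond
   fun:*LECUYER*
   ...
}

valgrind --leak-check=yes --suppressions=lecuyer.supp ./mormot2tests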
I just rewrote the mormot.core.fpclibcmm unit.
https://github.com/synopse/mORMot2/comm … bc2f487a0a
The prefix trick was not consistent, and cmem failed to run mormot2tests on Linux x86_64.
This unit should work better on Linux or other POSIX systems - even without the msize() API (like the BSDs).
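For context, one classic way to cope with a libc that lacks msize()/malloc_usable_size() is to keep the requested size in a small header in front of every block; a simplified sketch of that idea (not the actual mormot.core.fpclibcmm code):

unit libcmmsketch; // must be the FIRST unit of the program uses clause

{$mode objfpc}

interface

implementation

// direct libc calls, with no FPC prefix
function malloc(size: PtrUInt): pointer; cdecl; external 'c' name 'malloc';
function realloc(p: pointer; size: PtrUInt): pointer; cdecl; external 'c' name 'realloc';
procedure free(p: pointer); cdecl; external 'c' name 'free';

const
  // note: a real unit should also care about returning properly aligned blocks
  HEADER = SizeOf(PtrUInt); // stores the requested size just before the data

function _GetMem(size: PtrUInt): pointer;
begin
  result := malloc(size + HEADER);
  if result = nil then
    exit;
  PPtrUInt(result)^ := size;            // remember the size for _MemSize()
  result := PAnsiChar(result) + HEADER; // hand out the area after the header
end;

function _FreeMem(p: pointer): PtrUInt;
begin
  if p <> nil then
    free(PAnsiChar(p) - HEADER);        // free from the real start of the block
  result := 0;
end;

function _FreeMemSize(p: pointer; size: PtrUInt): PtrUInt;
begin
  result := _FreeMem(p);                // the size hint is simply ignored
end;

function _MemSize(p: pointer): PtrUInt;
begin
  result := PPtrUInt(PAnsiChar(p) - HEADER)^;
end;

function _AllocMem(size: PtrUInt): pointer;
begin
  result := _GetMem(size);
  if result <> nil then
    FillChar(PAnsiChar(result)^, size, 0); // AllocMem() must zero-fill
end;

function _ReallocMem(var p: pointer; size: PtrUInt): pointer;
begin
  if p = nil then
    p := _GetMem(size)
  else if size = 0 then
  begin
    _FreeMem(p);
    p := nil;
  end
  else
  begin
    p := realloc(PAnsiChar(p) - HEADER, size + HEADER);
    if p <> nil then
    begin
      PPtrUInt(p)^ := size;             // refresh the stored size
      p := PAnsiChar(p) + HEADER;
    end;
  end;
  result := p;
end;

var
  NewMM: TMemoryManager;

initialization
  GetMemoryManager(NewMM); // start from the current entries
  NewMM.GetMem := @_GetMem;
  NewMM.FreeMem := @_FreeMem;
  NewMM.FreeMemSize := @_FreeMemSize;
  NewMM.AllocMem := @_AllocMem;
  NewMM.ReallocMem := @_ReallocMem;
  NewMM.MemSize := @_MemSize;
  SetMemoryManager(NewMM);

end.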