With NumTinyBlockArenasPO2 = 7 instead of 6, the result is 327K.
CPU load in user space is ~10% higher than when using libc in both cases
Flags: BOOSTER assumulthrd smallpools perthrd erms
Small: blocks=3K size=309KB (part of Medium arena)
Medium: 60MB/60MB sleep=15K
Large: 0B/640KB sleep=0
Total Sleep: count=15K
Small Getmem Sleep: count=4
288=4
Small Blocks since beginning: 239M/29GB (as small=42/46 tiny=1K/2032)
48=91M 112=38M 80=27M 128=18M 32=14M 96=9M 64=9M 144=4M
160=4M 256=4M 416=3M 880=3M 1264=3M 272=2M 1376=485K 960=475K
Small Blocks current: 3K/309KB
48=2K 64=427 352=200 32=87 128=79 112=73 80=48 96=21
192=14 416=8 576=7 880=7 288=6 160=5 736=5 624=4
Memory usage statistics
//libc
Maximum resident set size (kbytes): 28896
Minor (reclaiming a frame) page faults: 12867
Voluntary context switches: 5888357
Involuntary context switches: 5049
//x64mm (NumTinyBlockArenasPO2 = 7)
Maximum resident set size (kbytes): 124380
Minor (reclaiming a frame) page faults: 44196
Voluntary context switches: 5220211
Involuntary context switches: 8087
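To put the two runs side by side, here is a minimal sketch that computes the x64mm/libc ratio for the figures quoted above. The dictionary keys and the `ratios` helper are made up for illustration; only the numbers come from the output pasted in this post.

```python
# Figures copied from the libc and x64mm (NumTinyBlockArenasPO2 = 7) runs above.
# Key names are hypothetical labels, not part of any tool's output.
libc = {"max_rss_kb": 28896, "minor_faults": 12867, "voluntary_cs": 5888357}
x64mm = {"max_rss_kb": 124380, "minor_faults": 44196, "voluntary_cs": 5220211}


def ratios(candidate, baseline):
    """Return candidate/baseline for every metric present in the baseline."""
    return {key: candidate[key] / baseline[key] for key in baseline}


r = ratios(x64mm, libc)
for key, value in r.items():
    # A value above 1.0 means x64mm used more of that resource than libc.
    print(f"{key}: {value:.2f}x")
```

Read this way, x64mm peaks at a bit more than 4 times the resident memory of libc in this run, with somewhat fewer voluntary context switches.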
Great!
Please try in FPCMM_BOOSTER mode with https://github.com/synopse/mORMot2/commit/412fd883
It now has 128 arenas, and a bigger number of pools to feed from.
But of course, as you detected, it consumes more RAM to initialize its internal pools.
Some memory is lost in the process when blocks do not stay allocated but go through very quick getmem/freemem cycles (as in this server benchmark).
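For readers wondering how the setting relates to the arena count: assuming the `PO2` suffix of NumTinyBlockArenasPO2 means "power of two" (which matches 7 giving the 128 arenas mentioned here), the relation can be sketched as below. The `arena_count` function is a hypothetical helper written for this note, not something taken from the mORMot sources.

```python
def arena_count(po2: int) -> int:
    """Number of arenas for a given power-of-two exponent (assumed meaning of PO2)."""
    return 1 << po2


# The two settings compared in this thread.
for exponent in (6, 7):
    print(f"NumTinyBlockArenasPO2 = {exponent} -> {arena_count(exponent)} arenas")
```

Each extra step doubles the number of arenas, so the memory needed to initialize them should be expected to grow accordingly.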
327K RPS for /fortunes. Memory consumption is higher.
Flags: BOOSTER assumulthrd smallpools perthrd erms
Small: 3K/309KB including tiny<=256B arenas=128 pools=95
Medium: 126MB/126MB sleep=2K
Large: 0B/640KB sleep=0
Total Sleep: count=2K
Small Getmem Sleep: count=1
288=1
Small Blocks since beginning: 244M/29GB (as small=42/46 tiny=1K/2032)
48=93M 112=39M 80=28M 128=18M 32=14M 96=9M 64=9M 160=4M
144=4M 256=4M 416=3M 880=3M 1264=3M 272=2M 1376=509K 960=488K
Small Blocks current: 3K/309KB
48=2K 64=426 352=200 32=87 128=80 112=73 80=48 96=21
192=14 416=8 576=7 880=7 288=6 736=5 672=4 160=4
Maximum resident set size (kbytes): 271852
Minor (reclaiming a frame) page faults: 77196
Voluntary context switches: 5309185
Involuntary context switches: 7768
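The `size=count` lines in the statistics above are easier to compare once parsed. Below is a rough sketch of one way to do it, assuming that the K and M suffixes on counts are decimal (1,000 and 1,000,000); that is an assumption, not something the output states. `parse_counts` is a hypothetical helper, and the 244M total is the rounded figure from the "Small Blocks since beginning" line.

```python
# Assumed decimal multipliers for the count suffixes seen in the stats.
SUFFIX = {"K": 1_000, "M": 1_000_000}


def parse_counts(line):
    """Parse 'size=count' tokens such as '48=93M 64=426' into {size: count}."""
    counts = {}
    for token in line.split():
        size, _, value = token.partition("=")
        if value[-1] in SUFFIX:
            counts[int(size)] = int(value[:-1]) * SUFFIX[value[-1]]
        else:
            counts[int(size)] = int(value)
    return counts


# First histogram line of "Small Blocks since beginning" from the 128-arena run.
since_start = parse_counts("48=93M 112=39M 80=28M 128=18M 32=14M 96=9M 64=9M 160=4M")
total = 244_000_000  # rounded total reported as 244M
share_48 = since_start[48] / total
print(f"48-byte blocks: {share_48:.0%} of all small allocations")
```

Since the printed counts are already rounded, the shares are only approximate, but they show that 48-byte blocks account for well over a third of the small allocations in this benchmark.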
Using the -O4 optimization level (I never used it before because of the "beware" notes) slightly increases performance (+44k for json, for example) and passes all tests.
I also tried Whole Program Optimization: it decreases the executable size from 5MB to 3MB, but without visible performance changes (compared to -O4).
@ab - what do you think - can we use -O4 for TFB (I'm afraid of accidental crashes)?
Makes sense: only more memory consumed, with no fewer collisions or sleeps.
So I will revert the previous commit to keep the memory lower - it is already more than glibc.
https://github.com/synopse/mORMot2/commit/19bcf72c
And for the TFB benchmarks, we would rather use the glibc MM.
And I never tested -O4, and I doubt there is any benefit in using it.