#151 Re: mORMot 2 » High-Performance Frameworks » 2023-03-01 16:14:07

mpv

Digging deeper, I found that with the new implementation we get better results with a lower total thread count. This is very good - I expect the db tests will be faster. Moreover, 7 threads * 14 servers is better than 14 threads * 7 servers.
I will investigate more.
BTW - the TFB team plans to begin setting up NEW servers with Ubuntu 22.04 this week (see https://github.com/TechEmpower/Framewor … 449036999), so we have some time...

#152 Re: mORMot 2 » High-Performance Frameworks » 2023-03-01 13:31:45

mpv

A smoke test on the server shows +4% (+50k) RPS for /json with the new HASEVENTFD. I'll check in more detail later.

#153 Re: mORMot 2 » High-Performance Frameworks » 2023-02-28 18:24:49

mpv

Can you please commit the blocking mode in a separate branch? I want to play with it - it seems to depend heavily on the worker count.

#154 Re: mORMot 2 » Uploading Files to TWebSocketAsyncServer » 2023-02-28 15:35:15

mpv

In mORMot1 I upload such huge files by manually splitting them into chunks (in my case, in the browser) and sending the chunks one by one to a server endpoint, with the chunk number in the URL: http://.../fileUpload?chunk=x&totalChunks=y
On the server side, the endpoint just appends the HTTP request body containing the chunk to the file.

The same approach can be used with mORMot2.
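
A minimal sketch of the server-side part, assuming a mORMot 2 method-based service; the class name and the target file are hypothetical:

uses
  Classes, SysUtils,
  mormot.core.base, mormot.rest.server, mormot.rest.memserver;

type
  TUploadServer = class(TRestServerFullMemory)
  published
    // reachable as http://host/root/fileUpload?chunk=x&totalChunks=y
    procedure FileUpload(Ctxt: TRestServerUriContext);
  end;

procedure TUploadServer.FileUpload(Ctxt: TRestServerUriContext);
var
  f: TFileStream;
begin
  if Ctxt.InputInt['chunk'] = 0 then
    f := TFileStream.Create('/tmp/upload.bin', fmCreate)    // first chunk
  else
    f := TFileStream.Create('/tmp/upload.bin', fmOpenWrite); // next chunks
  try
    f.Seek(0, soEnd); // append this chunk at the end of the file
    if Ctxt.Call^.InBody <> '' then
      f.WriteBuffer(pointer(Ctxt.Call^.InBody)^, length(Ctxt.Call^.InBody));
  finally
    f.Free;
  end;
  Ctxt.Success; // 200 OK
end;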

#155 Re: mORMot 2 » High-Performance Frameworks » 2023-02-28 13:40:28

mpv

I tried it yesterday (uncommenting line #1981) - on server HW the numbers are slightly lower compared to the Event-based algorithm.
We can reduce syscalls by using blocking IO for the eventfd, and remove fOwner.fThreadPollingEventFD.WaitFor(5000) altogether.

A blocking read() call should be terminated by a signal (and return -1) when the application (daemon) stops.
I tried this approach, but the threads lock up somewhere...
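
A minimal sketch of that blocking wake-up loop, assuming fEventFD is an eventfd-style descriptor opened in blocking mode (all names here are hypothetical):

uses
  BaseUnix;

procedure WorkerLoop(fEventFD: cint);
var
  v: QWord;
begin
  repeat
    // blocks until another thread writes a counter value to the eventfd;
    // a signal (e.g. on daemon shutdown) makes fpRead return -1 with EINTR
    if fpRead(fEventFD, v, SizeOf(v)) <> SizeOf(v) then
      if fpGetErrno = ESysEINTR then
        break    // terminated by a signal -> leave the worker loop
      else
        continue;
    // ... process one pending connection here ...
  until false;
end;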

#156 Re: mORMot 2 » High-Performance Frameworks » 2023-02-27 15:40:26

mpv

@ttomas - thanks for the investigation. At least now we know it didn't help. BTW I investigated jemalloc with the same result. So for now we can consider the glibc MM good enough.

@dcoun - this round (started 2023-02-23) should be the same as the previous one, because our last MR with minor improvements has not been merged yet.

@ab - I am investigating pipes (a simple program with one writer, which writes a pointer into the pipe, and many readers, which read from it concurrently) - performance depends very much on the kernel version. On an old kernel (4.18, as on my server) it's terrible. On a newer one (5.18, as on my desktop) it's a lot faster (x10). Still planning to try eventfd + a ring buffer. Will post a gist when finished.
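
The experiment looks roughly like this (a sketch under my assumptions - the reader and item counts are arbitrary; writes of SizeOf(pointer) <= PIPE_BUF bytes are atomic, and the kernel decides which blocked reader wakes up):

program pipebench;

{$mode objfpc}

uses
  cthreads, BaseUnix;

var
  fd: TFilDes; // fd[0] = read end, fd[1] = write end

function Reader(p: pointer): PtrInt;
var
  work: pointer;
begin
  result := 0;
  // each successful read returns one whole pointer-sized work item
  while fpRead(fd[0], work, SizeOf(work)) = SizeOf(work) do
    ; // ... process "work" here ...
end;

var
  i: integer;
  job: pointer;
  tid: array[1..8] of TThreadID;
begin
  if fpPipe(fd) <> 0 then
    Halt(1);
  for i := 1 to 8 do
    tid[i] := BeginThread(@Reader);
  for i := 1 to 1000000 do
  begin
    job := pointer(PtrUInt(i)); // fake work item
    fpWrite(fd[1], job, SizeOf(job));
  end;
  fpClose(fd[1]); // EOF: the readers' fpRead returns 0 and they exit
  for i := 1 to 8 do
    WaitForThreadTerminate(tid[i], 0);
end.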

#157 Re: mORMot 2 » Using LibMysql directly » 2023-02-23 11:45:13

mpv

For a long time I used DBeaver as a DB tool (I work with many different RDBMSes), but for the last 3 years I have used DataGrip (as a plugin for WebStorm, where I spend a lot of time with JS). Like all JetBrains products it's very, very good and 100% worth the money. 3 years ago it cost me $27; the price is higher now.
Can't help with MySQL because I don't use it...

#158 Re: mORMot 2 » High-Performance Frameworks » 2023-02-22 16:12:23

mpv
ab wrote:

I will try to include eventfd this weekend

Thank you very much!

I will make a new PR to TFB based on today's sources, with all the minor changes and a cleaned-up raw.pas (w/o enabling smoothing). This gives us a clean baseline for comparing event vs eventfd in the future.

P.S.
Ready #7944

#159 Re: mORMot 2 » High-Performance Frameworks » 2023-02-22 11:21:16

mpv

Yes, I played with ThreadPollingWakeupLoad + hsoThreadSmoothing (4, 8, 16, 32). The best result is for ThreadPollingWakeupLoad = 8. In the best case (if the boost on the TFB HW is about the same) we gain +150 points, but I am still not sure Smoothing is a good solution in terms of being "realistic". So my proposal is to enable it when those 150 points matter, not now...

            w/o       tpw4      tpw8      tpw16     tpw32(m)
json        1338 754  1360 205  1457 470  1406 111  1163 232
rawdb        470 228   471 954   477 038   476 576   474 571
db           460 406   455 805   460 086   456 671   455 382
fortunes     386 097   388 822   392 342   387 425   387 932
rawfort      437 250   435 545   436 378   438 344
plaintext   4238 095  4125 256  4353 065  4249 592  4223 741
rawQuery      45 925              45 809    45 819
queries       34 712              34 658    34 826

I can try eventfd on the weekend, or, if you plan to try it yourself (the mORMot async code is still a little complex for me), please notify me.

#160 Re: mORMot 2 » High-Performance Frameworks » 2023-02-21 20:32:28

mpv
ab wrote:

I don't get why the 20-query decreased so much

All frameworks decrease in the same way after adding Sync() - we are not the first.
When the PG server is local, the decrease is not so significant.

@ab - crazy idea - what if we use named pipes (mkfifo) to wake up the threads in the pool instead of a futex? Have you considered this option?
I mean: TAsyncConnections creates a fifo pipe and writes to it when data is ready, and all the workers simply read from the pipe in blocking mode, letting the kernel decide which worker wakes up by reading successfully.
In this case we double the syscall count (write + read instead of futex), but we avoid looping over fThreads and simplify the code a lot, don't we?

P.S.
Even a simple pipe should work. And there is also a message queue if pipes turn out to be slow...

#161 Re: mORMot 2 » Using LibMysql directly » 2023-02-21 16:01:01

mpv

@squirrel - take a look at Postgres, really. It's faster, better supported by mORMot, and has some KILLER features (mostly because of its very extensible architecture) that are missing in other DBs. I use FTS (tsvector), trigrams, PostGIS, and TimescaleDB with great success...

#162 Re: mORMot 2 » High-Performance Frameworks » 2023-02-21 08:57:39

mpv

Results are ready - we are still #16. Impressive plaintext improvements; rawqueries (with PG Sync) is worse than I expected; small improvements for json and db (I think because of -O4).

Weights		1.000	1.737	21.745	4.077	68.363	0.163
Composite #	JSON	1-query	20-query Fortunes Updates Plaintext 	Weighted score
38 	mormot 	731,119	308,233	19,074	288,432	3,431	2,423,283 	3,486  2022-10-26 - 64 thread limitation
43 	mormot 	320,078	354,421	19,460	322,786	2,757	2,333,124 	3,243  2022-11-13 - 112 thread (28CPU*4)	
44 	mormot 	317,009	359,874	19,303	324,360	1,443	2,180,582 	3,138  2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 	mormot 	563,506	235,378	19,145	246,719	1,440	2,219,248 	2,854  2022-12-01 - 112 thread (28CPU*4) CPU affinity	
51 	mormot 	394,333	285,352	18,688	205,305	1,345	2,216,469 	2,586  2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 	mormot 	859,539	376,786	18,542	349,999	1,434	2,611,307 	3,867  2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 	mormot 	948,354	373,531	18,496	366,488	11,256	2,759,065 	4,712  2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 	mormot 	957,252	392,683	49,339	393,643	22,446	2,709,301 	6,293  2023-02-14 - 168 threads, cmem, improved PG pipelining
15 	mormot 	963,953	394,036	33,366	393,209	18,353	6,973,762 	6,368  2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required,  -O4 optimization  

@ab - you are right. Replacing TOSLightLock with TLightLock does not change the futex call count. So this is about thread wake-up. Will play with WakeupLoad later...

@ttomas - the current round result for mORMot /plaintext is ~7M - and this is the 10Gb network limit on the TFB HW. I think we should focus on /json. If we speed up HTTP for json, this automatically speeds up the db-related tests.

#163 Re: mORMot 2 » High-Performance Frameworks » 2023-02-20 19:05:23

mpv
ab wrote:

Is TPollSockets.GetOnePending the bottleneck?
Should we try to switch the fPending: TPollSocketResults structure into a truly lock-less algorithm?

I think yes. It's not visible in the profiler, but here are the syscall stats (cleaned up a little) of 1,667,410 /json requests on a 24-thread server:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- -------------------
 40,34  242,887052         281    863994    163691 futex
 19,11  115,090885          69   1667410           sendto
 16,25   97,843747          58   1667499        15 recvfrom
  9,66   58,190893      843346        69           nanosleep
  0,14    0,826164          61     13415           epoll_wait <--- I play with event count - 256 is good enough
  0,01    0,032096          63       504           epoll_ctl  
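
For reference, per-syscall stats like these can be collected with strace's counting mode; the binary name and arguments here are just examples:

strace -c -f ./raw 24 24 1   # -c: count per syscall, -f: follow all threads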

I think the futexes are because of Lock, RTLCriticalSection and other friends in GetOnePending.

Also tried @ttomas's idea with different container/thread settings for different tests.
After many attempts, I achieved a small (100K) increase in /json, but the number of threads/instances is just magical (for 28 processors, 8 instances with 7 threads each is the best). Also, separate containers do nothing for the database-related tests. I suggest leaving everything as it is.

#164 Re: mORMot 2 » Prepare expected 1 parameters in request, found 0 » 2023-02-20 11:38:07

mpv

The problem is that the db layer expects `?` as the parameter placeholder.
For ExecuteInlined, use the inlined parameter syntax :(paramValue):
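
For example (a sketch, assuming conn is the TSqlDBConnectionProperties instance; the inlined values are expanded and escaped by the mORMot layer before execution):

conn.ExecuteInlined(
  'insert into NEW_TABLE (ID, DESCRIPT) values (:(4):, :(''some text''):)',
  {ExpectResults=}false);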

But to prevent SQL injection and speed up execution, it is better to execute the query with bound parameters:

stmt := conn.NewStatementPrepared(
  'insert into NEW_TABLE (ID, DESCRIPT) values (?, ?)',
  {ExpectResults=}false, {RaiseExceptionOnError=}true);
stmt.Bind(1, 4);                     // first ? - integer value
stmt.BindTextU(2, 'I''m confused?'); // second ? - text value
stmt.ExecutePrepared;

#165 Re: mORMot 2 » High-Performance Frameworks » 2023-02-19 17:54:56

mpv

My bad about https://synopse.info/forum/viewtopic.ph … 095#p39095 - in X X 1 mode raw enables hsoThreadSmooting - and this is why we got such a strange thread usage distribution.

#166 Re: mORMot 2 » High-Performance Frameworks » 2023-02-19 17:30:03

mpv
ttomas wrote:

Looking at benchmark_config.json from ntex (No1 in compose score from prev round), also asp.net.core, instead of using one docker/monolith app, I propose to create 3 docker/app, 1-json/plaintext, 2-orm, 3-raw. Compose score will aggregate best results from all 3 app. Also for json/plaintext you can use different No of threads then db tests, what is scale better.

Nice idea, I will investigate it!
Different thread counts for the non-db tests seem to make sense. I'll try to determine the best values and maybe enable CPU affinity for them.
As I understand from the Python sources, the DB container is restarted for each test (once for default and once for postgres-raw), and the current test order db->json->queries->updates->fortunes->plaintext is OK (it's important that updates comes after db/queries). So I do not see a reason to create separate containers for db and rawdb (but I will check whether it makes sense).

About "tuning the dockers" - we can't do this from inside a container. Docker daemon is configured on the host.

P.S.
My understanding of Docker (simplified):
Docker is a slim wrapper around Linux namespaces and cgroups (+ AuFS as a read-only layered FS). So everything is executed on the host, but the executable links to the libraries from the docker image. No magic at all :)
BTW this is the reason we can't use io_uring - as far as I understand, the TFB host machine is based on Ubuntu 18 and io_uring is not supported there. TFB plans to migrate it to 22.04 in the next round.

#167 Re: mORMot 2 » High-Performance Frameworks » 2023-02-18 16:49:00

mpv

Several processes were my first attempt - one process with 6 listeners and 28 threads each is better than 6 processes x 28 threads (I think because of memory management).

#168 Re: mORMot 2 » High-Performance Frameworks » 2023-02-18 12:33:58

mpv

Switching to io_uring is too radical a change, IMHO. And as I see, the top 10 frameworks use epoll...

In my opinion, our current problem is that, with the same number of threads, increasing the listening socket count improves json performance.
I caught a case where the performance difference is dramatic both on the server and on my PC (for other thread/socket counts more sockets also always win, but the difference is not so big).

Server (28 cores)

wrk -d 15 -c 64 --timeout 8 -t 28 "http://localhost:8080/json" 

./raw 56 28 1 (1x56 thread) -  143K RPS
./raw 28 28 2 (2x28 thread) -  770K RPS

My PC (12 cores)

wrk -d 15 -c 64 --timeout 8 -t 12 "http://localhost:8080/json" 

./raw 24 12 1 - 300K
./raw 12 12 2 - 500K

perf (on the server) shows that:
with one socket, most of the load goes into only 2 threads, R18080 and R28080 - see flamergath_json_56_1;
with two sockets, the load is distributed between many threads - see flamergath_json_28_2.

I do not understand the reason for this behavior.

P.S.
I also tried:
- switching the PG protocol to binary mode - performance is the same
- setting HttpQueueLength to 0 - no changes in performance except a small number of socket errors

#169 Re: mORMot 2 » High-Performance Frameworks » 2023-02-17 20:43:15

mpv

Today I tried to find bottlenecks using perf, but w/o success...

Your changes to the HTTP processing (nice idea!) give, on the server:
  +20k RPS for json
  +60k RPS for pipelined plaintext
  +3k  for /db, /rawdb and /rawfortunes

The cumulative effect on /fortunes is +25K.

#170 Re: mORMot 2 » High-Performance Frameworks » 2023-02-17 11:25:42

mpv

Yes, unexpected...
I posted an announcement about our work on the Free Pascal forum. Maybe the community will generate more ideas for optimizations...

#171 Re: mORMot 2 » ab please help - BatchSend with updates takes too long for external db » 2023-02-16 18:21:18

mpv

One of the goals of mormot.db.sql.postgres development was to participate in the TechEmpower benchmark; as of today we are in the TOP 15 there. So yes - its overhead is minimal.

#172 Re: mORMot 2 » High-Performance Frameworks » 2023-02-16 18:16:35

mpv

All tests passed. The changes are tiny, close to statistical error. I will take them into the next PR...

#173 Re: mORMot 2 » High-Performance Frameworks » 2023-02-16 10:32:22

mpv

TFB PR #7926 is ready:
-   /queries and /updates for the raw test case: Postgres SQL pipelining uses Bind->Exec->Sync as required by General Rule 7
-   /plaintext: improved HTTP pipelining at the mORMot level - added response buffering to minimize send syscalls
-   general optimization: use the aggressive compiler optimization level (-O4)

I expect that with these changes we will be #12 (+600 points for /plaintext, -70 points for /rawqueries, plus some points because of -O4 on all endpoints).

So, the next goal is to be in the TOP 10 :)
I have some ideas but still investigating....

#174 Re: mORMot 2 » mORMot 2 Release Candidate » 2023-02-15 22:12:05

mpv

@ab - one request for the future mORMot2 release - please compress the statics in the mORMot2 release using tar.gz. 7z is not available in default distributions (on both Linux and Windows), and installing the p7zip package just to decompress mormot2static.7z - for example inside a docker build, as we do in the TFB bench - takes some time and, worst of all, requires an internet connection. I sometimes have problems with the Internet after missile attacks :(

#175 Re: mORMot 2 » High-Performance Frameworks » 2023-02-15 21:51:08

mpv

PG pipelining is now implemented using Sync() as required by the TFB rules. raw* performance decreases a little (from 50K to 46K for /rawqueries).
@ab, please merge https://github.com/synopse/mORMot2/pull/144 - I added a Conn.CheckPipelineSync method and need it to prepare the TFB PR.

#176 Re: mORMot 2 » High-Performance Frameworks » 2023-02-15 20:50:39

mpv

All TFB tests passed.
/plaintext results:
my PC:  1 271 496 -> 3 764 547
server: 2 621 195 -> 4 343 425 (results on TFB hardware should be about the same)

Perfect!

I need some time to solve the PG Sync() issue and will prepare a PR.

#177 Re: mORMot 2 » High-Performance Frameworks » 2023-02-15 15:13:59

mpv

Will test it tonight, and try to add Sync for pipelining...

#178 Re: mORMot 2 » High-Performance Frameworks » 2023-02-15 10:38:02

mpv

Unfortunately, as noted here, our PG pipeline optimization breaks rule #7 of the General test requirements:
   "If using PostgreSQL's extended query protocol, each query must be separated by a Sync message"

I do not understand why this requirement exists for the synchronous case, but we have what we have. So I will rewrite PG pipelining again :(

About valgrind - I can't see a bottleneck there (mostly because it runs slowly), but now I am trying perf + flamegraph - this technique shows more detail because of its high-frequency sampling.
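
The workflow is the standard FlameGraph one (the binary name and sampling frequency are just examples; stackcollapse-perf.pl and flamegraph.pl come from Brendan Gregg's FlameGraph repository):

perf record -F 999 -g -p $(pidof raw) -- sleep 15    # sample stacks at 999 Hz
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > json.svg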

What I have found for the moment is a way to optimize /plaintext in HTTP pipelining mode. The flamegraph for pipelined plaintext shows that most of the time we are in the send syscall - see the flamegraph for ./raw 4 4 1 (clickable).
This is because the current HTTP pipelining implementation does fewer recv syscalls (reading several GET requests in one syscall), but we could also buffer the output and do fewer sends!

This can be done if the HTTP state machine detects that, for a GET request with Connection: keep-alive and HTTP 1.1, there is an additional GET after the `\r\n`,
and then sends the buffered response either when the send buffer overflows (for example, nodeJS buffers such responses in a 64KB buffer) or when the pipeline ends (no additional bytes read).

Test case for 2 pipelined HTTP queries:

(echo -en "GET /plaintext HTTP/1.1\nHost: foo.com\nConnection: keep-alive\n\nGET /plaintext HTTP/1.1\nHost: foo.com\n\n"; sleep 10) | telnet localhost 8080

currently produces 2 sendto(7, "HTTP/1.1... syscalls, but could produce just one.

Since HTTP pipelining should not be used in production for many reasons, this optimization could live under an hsoHttpPipelining flag - a rough sketch of the buffering idea follows. @ab - can you implement this idea?
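
A simplified sketch of the buffering (not mORMot's actual state machine - the class, method and SendToSocket call are hypothetical):

const
  SENDBUF_MAX = 64 * 1024; // flush threshold, like nodeJS's 64KB buffer

type
  TPipelinedConnection = class
  private
    fSendBuffer: RawByteString; // responses accumulated for this connection
  public
    procedure QueueResponse(const aResponse: RawByteString;
      aMorePipelined: boolean);
  end;

procedure TPipelinedConnection.QueueResponse(const aResponse: RawByteString;
  aMorePipelined: boolean);
begin
  fSendBuffer := fSendBuffer + aResponse; // buffer instead of sending now
  // flush in a single send() when the pipeline ends (no further request was
  // read in the same recv) or when the buffer grows too big
  if (not aMorePipelined) or (length(fSendBuffer) >= SENDBUF_MAX) then
  begin
    // SendToSocket(fSendBuffer); // hypothetical: the actual socket write
    fSendBuffer := '';
  end;
end;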

P.S. - to be clickable, the flamegraph should be downloaded first and then opened in the browser - clicks do not work when the svg is opened in the Google Drive preview.

#179 Re: mORMot 2 » High-Performance Frameworks » 2023-02-14 15:35:07

mpv

The mORMot results are ready. At the end of the round we will be #16 in composite score - mostly because of the improved PG pipelining, which affects queries and updates in raw mode.

Weights		1.000	1.737	21.745	4.077	68.363	0.163
Composite #	JSON	1-query	20-query Fortunes Updates Plaintext 	Weighted score
38 	mormot 	731,119	308,233	19,074	288,432	3,431	2,423,283 	3,486  2022-10-26 - 64 thread limitation
43 	mormot 	320,078	354,421	19,460	322,786	2,757	2,333,124 	3,243  2022-11-13 - 112 thread (28CPU*4)	
44 	mormot 	317,009	359,874	19,303	324,360	1,443	2,180,582 	3,138  2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 	mormot 	563,506	235,378	19,145	246,719	1,440	2,219,248 	2,854  2022-12-01 - 112 thread (28CPU*4) CPU affinity	
51 	mormot 	394,333	285,352	18,688	205,305	1,345	2,216,469 	2,586  2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 	mormot 	859,539	376,786	18,542	349,999	1,434	2,611,307 	3,867  2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 	mormot 	948,354	373,531	18,496	366,488	11,256	2,759,065 	4,712  2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 	mormot 	957,252	392,683	49,339	393,643	22,446	2,709,301 	6,293  2023-02-14 - 168 threads, cmem, improved PG pipelining 

Also we will be:
- #1 for /queries in raw mode with PG pipelining. I hope everything was done correctly there
- #3 for /fortunes in ORM (orm=full) mode. The result improved by +30%, thanks to cmem. We are behind lithium (C++), where no template engine is used, and xitca-web (Rust)
- #2 for /db in ORM (orm=full) mode. A little (8,000 RPS) behind xitca-web

A possible way to optimize is to find the bottleneck in the async HTTP server, which should improve json and plaintext.

#180 Re: mORMot 1 » Real high-concurrency & replication is coming to SQLite! » 2023-02-08 15:44:44

mpv

mORMot uses a global lock for SQLite3 updates, so the problem does not happen there.

I use my own TubSQLite3ThreadSafeConnection = class(TSQLDBConnectionThreadSafe) without the global lock (it actually exists inside SQLite anyway) and set:

TSQLDataBase.WalMode := true;      // WAL: readers do not block the writer
TSQLDataBase.BusyTimeout := 10000; // wait up to 10 seconds on a busy DB

In case of sqlite3_busy, the error is passed to the user and a "try again later" message is displayed in the UI.

#181 Re: mORMot 1 » Real high-concurrency & replication is coming to SQLite! » 2023-02-07 18:38:57

mpv

For sure, it's a huge improvement. Currently I can't use SQLite3 in real production where long transactions are commonplace (because of the sqlite3_busy error), but hctree should solve this problem, and (at least where the DB size is < 500GB) SQLite can be usable.

#182 Re: mORMot 2 » High-Performance Frameworks » 2023-02-05 13:40:30

mpv

Using the -O4 optimization level (I never used it before because of the "beware" notes) slightly increases performance (+44k for json, for example) and passes all tests.
I also tried Whole Program Optimization - it decreases the executable size from 5MB to 3MB, but without visible performance changes (compared to -O4).
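
For reference, FPC's Whole Program Optimization is a two-pass build (a sketch - the program name is an assumption, and the usual unit paths are omitted):

fpc -O4 -OWall -FWraw.wpo raw.pas   # pass 1: collect WPO feedback
fpc -O4 -Owall -Fwraw.wpo raw.pas   # pass 2: rebuild using the feedback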

@ab - what do you think - can we use -O4 for TFB (I'm afraid of random crashes)?

#183 Re: mORMot 2 » High-Performance Frameworks » 2023-02-05 12:02:37

mpv

327K RPS for /fortunes. Memory consumption is higher:

Flags: BOOSTER  assumulthrd smallpools perthrd erms                                
Small:  3K/309KB  including tiny<=256B arenas=128 pools=95                         
Medium: 126MB/126MB  sleep=2K                                                      
Large:  0B/640KB  sleep=0                                                          
Total Sleep: count=2K                                                              
Small Getmem Sleep: count=1                                                        
288=1                                                                              
Small Blocks since beginning: 244M/29GB (as small=42/46 tiny=1K/2032)              
48=93M  112=39M  80=28M  128=18M  32=14M  96=9M  64=9M  160=4M                     
144=4M  256=4M  416=3M  880=3M  1264=3M  272=2M  1376=509K  960=488K               
Small Blocks current: 3K/309KB                                                     
48=2K  64=426  352=200  32=87  128=80  112=73  80=48  96=21                        
192=14  416=8  576=7  880=7  288=6  736=5  672=4  160=4

Maximum resident set size (kbytes): 271852                                         
Minor (reclaiming a frame) page faults: 77196                                      
Voluntary context switches: 5309185                                                
Involuntary context switches: 7768

#184 Re: mORMot 2 » High-Performance Frameworks » 2023-02-03 21:05:44

mpv

Memory usage statistics:

//libc
Maximum resident set size (kbytes): 28896
Minor (reclaiming a frame) page faults: 12867
Voluntary context switches: 5888357
Involuntary context switches: 5049

//x64mm (NumTinyBlockArenasPO2 = 7)
Maximum resident set size (kbytes): 124380              
Minor (reclaiming a frame) page faults: 44196          
Voluntary context switches: 5220211                    
Involuntary context switches: 8087
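
Stats like these come from GNU time's verbose mode (the binary name and arguments are just examples):

/usr/bin/time -v ./raw 28 28 6   # prints max RSS, page faults, context switches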

#185 Re: mORMot 2 » High-Performance Frameworks » 2023-02-03 20:59:28

mpv

With NumTinyBlockArenasPO2 = 7 instead of 6, the result is 327K.
The CPU load in user space is ~10% higher than when using libc, in both cases.

Flags: BOOSTER  assumulthrd smallpools perthrd erms                            
Small:  blocks=3K size=309KB (part of Medium arena)                            
Medium: 60MB/60MB  sleep=15K                                                   
Large:  0B/640KB  sleep=0                                                      
Total Sleep: count=15K                                                         
Small Getmem Sleep: count=4                                                    
288=4                                                                          
Small Blocks since beginning: 239M/29GB (as small=42/46 tiny=1K/2032)          
48=91M  112=38M  80=27M  128=18M  32=14M  96=9M  64=9M  144=4M                 
160=4M  256=4M  416=3M  880=3M  1264=3M  272=2M  1376=485K  960=475K           
Small Blocks current: 3K/309KB                                                 
48=2K  64=427  352=200  32=87  128=79  112=73  80=48  96=21                    
192=14  416=8  576=7  880=7  288=6  160=5  736=5  624=4

#186 Re: mORMot 2 » High-Performance Frameworks » 2023-02-03 20:50:44

mpv

With arenas bound to the thread ID, the fortunes result is 313K RPS - very close to the 355K with libc. Congratulations!

Flags: BOOSTER  assumulthrd smallpools perthrd erms                              
Small:  blocks=3K size=309KB (part of Medium arena)                              
Medium: 51MB/51MB  sleep=10K                                                     
Large:  0B/640KB  sleep=0                                                        
Total Sleep: count=10K                                                           
Small Getmem Sleep: count=16                                                     
288=14 80=2                                                                      
Small Blocks since beginning: 234M/28GB (as small=42/46 tiny=746/1008)           
48=89M  112=37M  80=27M  128=18M  32=14M  96=9M  64=9M  144=4M                   
160=4M  256=4M  416=3M  880=3M  1264=3M  272=2M  960=465K  1376=464K             
Small Blocks current: 3K/309KB                                                   
48=2K  64=426  352=200  32=87  128=80  112=73  80=48  96=21                      
192=14  416=8  576=7  880=7  288=6  736=5  160=4  672=4

P.S.
The sleep count increased, but so did the overall speed.

#187 Re: mORMot 2 » High-Performance Frameworks » 2023-02-02 21:13:12

mpv

With FPCMM_BOOST the result is 226K.

Flags: BOOST  assumulthrd smallpool erms debug                                             
Small:  blocks=3K size=309KB (part of Medium arena)                                        
 Medium: 13MB/13MB    peak=13MB current=11 alloc=11 free=0 sleep=229                       
 Large:  0B/640KB    peak=640KB current=0 alloc=2 free=2 sleep=0                           
 Total Sleep: count=229                                                                    
 Small Blocks since beginning: 157M/19GB (as small=43/46 tiny=112/112)                     
  48=56M  112=25M  80=18M  128=12M  32=9M  96=6M  64=6M  160=3M                            
  144=3M  256=3M  880=2M  416=2M  1264=2M  272=1M  448=277K  960=273K                      
 Small Blocks current: 3K/309KB                                                            
  48=2K  64=426  352=200  32=87  128=80  112=73  80=48  96=21                              
  192=14  416=8  576=7  880=7  288=6  736=5  672=4  160=4    

#189 Re: mORMot 2 » High-Performance Frameworks » 2023-02-02 20:48:46

mpv

Just tried commit/60024584 with FPCMM_BOOSTER - the results are better: 243K RPS on fortunes (instead of 181K).

Flags: BOOSTER  assumulthrd smallpools erms                                              
Small:  blocks=3K size=309KB (part of Medium arena)                                      
Medium: 43MB/43MB  sleep=137                                                             
Large:  0B/640KB  sleep=0                                                                
Total Sleep: count=137                                                                   
Small Blocks since beginning: 180M/22GB (as small=41/46 tiny=466/496)                    
  48=68M  112=28M  80=20M  128=14M  32=10M  96=7M  64=7M  160=3M                         
  144=3M  256=3M  880=3M  416=3M  1264=3M  272=2M  960=310K  448=308K                    
 Small Blocks current: 3K/309KB                                                          
  48=2K  64=426  352=200  32=87  128=80  112=73  80=48  96=21                            
  192=14  416=8  576=7  880=7  288=6  736=5  672=4  624=4

#190 Re: mORMot 2 » High-Performance Frameworks » 2023-02-01 19:41:33

mpv

By filtering the valgrind profiling data on `mem`, I found that the only notable difference is the self time of _getmem and _freemem:
x64mm: 0.69  0.46
glibc: 0.26  0.23
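
Such per-function self times can be obtained with valgrind's callgrind tool (the binary name and arguments are examples):

valgrind --tool=callgrind ./raw 12 12 2            # run under the profiler
callgrind_annotate callgrind.out.* | grep -i mem   # filter _getmem/_freemem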

#191 Re: mORMot 2 » High-Performance Frameworks » 2023-02-01 19:12:43

mpv

Unfortunately there are no changes in fortunes at all... New x64mm - 181K RPS (~90% of CPU in user space, 10% in kernel); glibc MM - 360K RPS (50% CPU in user space, 50% in kernel).

I checked syscalls - both MMs do nearly the same number of mmap and munmap calls.
And I do not see anything strange in valgrind :(
If you need some additional help, please tell me...

P.S.
compiled with -dFPC_X64MM -dFPCMM_SERVER -dFPCMM_BOOSTER

#192 Re: mORMot 2 » High-Performance Frameworks » 2023-01-31 20:48:55

mpv

The last fpclibcmm changes are tested - all OK.
Will prepare a new TFB PR; I hope they don't get sick of me...

#193 Re: mORMot 2 » High-Performance Frameworks » 2023-01-31 16:04:30

mpv

A small rawfortunes improvement (avoiding a record copy) gives +4000 RPS (+80 composite points) - see the sketch below.
Now I expect mORMot to be #10 in fortunes (just above asp.net core).
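
Illustrative only - not the actual raw.pas change: iterating through a pointer avoids copying each record (the TFortune type and procedure here are assumptions):

uses
  mormot.core.base;

type
  TFortune = record
    id: integer;
    message: RawUtf8;
  end;
  TFortunes = array of TFortune;

procedure RenderFortunes(const list: TFortunes);
var
  f: ^TFortune;
  i: integer;
begin
  for i := 0 to high(list) do
  begin
    f := @list[i]; // pointer into the array: no record (and string) copy
    // ... append f^.id and f^.message to the output here ...
  end;
end;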

#194 Re: mORMot 2 » adopting mORMot2 to use glibc memory manager (POSIX) » 2023-01-31 11:28:19

mpv

Nice catch!

I still worry about valgrind memcheck (`valgrind -v` mode) - it produces too many warnings about access to uninitialized memory / "conditional jump or move depends on uninitialised value".
The source of most of them is in TLecuyer and AESNI.
We discussed this problem some years ago - it also exists in mORMot1.

Maybe this is not a problem from the algorithm's point of view, but it is a problem for the valgrind DRD tool (I tried to use it to find thread-safety problems, but can't, because of too many memcheck errors). Hope we slowly fix such warnings. At least I will try...

#195 Re: mORMot 2 » adopting mORMot2 to use glibc memory manager (POSIX) » 2023-01-30 19:17:30

mpv

I can confirm - now it's stable. At least in the scenarios where the previous implementation crashed! Thank you very much!
About speed - on server hardware the speed is the same (the bottleneck is elsewhere).

We were a little late - our TFB MR was merged 2 hours ago (so results will only appear in the next round). I will wait to make a new PR; maybe we will make some more improvements in the next 4 days...

#196 Re: mORMot 2 » High-Performance Frameworks » 2023-01-28 18:13:49

mpv

For comparison - results on the 28-core server showing the performance increase. The first three rows are because of the MM; the raw* ones mostly because of the new PG pipelining. For the other endpoints, which almost do not allocate, results are about the same.

                          x64mm     libc
/fortunes               181 000  361 000
/rawfortunes            367 000  424 000
/queries?queries=20      33 000   35 000
-- raw* perf increased because of the new PG pipeline impl
/rawqueries?queries=20    6 000   50 000
/rawupdates?queries=20    3 000   26 000

If the server does not crash, I expect mORMot can be in the top 10.

#197 Re: mORMot 2 » High-Performance Frameworks » 2023-01-28 14:29:21

mpv

Made TFB PR #7879 with the glibc MM and the improved PG pipelining mode for the raw* tests.

Even with the randomly occurring glibc error on plaintext in pipelining mode, this version should work - the error does not occur when the /plaintext pipelining tests are executed after a warm-up (plaintext w/o pipelining), as is done in the TFB benchmark.

#198 Re: mORMot 2 » adopting mORMot2 to use glibc memory manager (POSIX) » 2023-01-28 11:27:56

mpv

Also caught the same pattern in libs/mORMot/src/net/mormot.net.sock.posix.inc:1165 (SetLength for a dyn array that points to nil).
The crash is not stable - sometimes the app crashes, sometimes not. Sometimes after 200k requests, sometimes after 400k.

BTW - I have used libc only (SynFPCCMemAligned) with mORMot1 for many years without any problems. So I expect there is some error in mORMot2, not in FPC.

#199 Re: mORMot 2 » adopting mORMot2 to use glibc memory manager (POSIX) » 2023-01-28 11:13:47

mpv

Got several different stacks from different core dumps:

#1  0x00007f175d642db5 in abort () from /lib64/libc.so.6
#2  0x00007f175d69b4e7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f175d6a25ec in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f175d6a439c in _int_free () from /lib64/libc.so.6
#5  0x000000000042bb93 in CMEM_$$_CFREEMEM$POINTER$$QWORD ()
#6  0x00007f175786f9f0 in ?? ()
#7  0x00000000004a3f57 in _DYNARRAY_DECR_REF_FREE (P=0x2, INFO=0x7f175786f540) at libs/mORMot/src/core/mormot.core.rtti.fpc.inc:845
#8  0x0000000001c13468 in ?? ()
#9  0x0000000000415c23 in fpc_dynarray_setlength ()
#10 0x00000000007cf348 in WAITFORMODIFIED (this=0x7f1730001b48, RESULTS=..., TIMEOUTMS=1566933887) at libs/mORMot/src/net/mormot.net.sock.posix.inc:1156
#11 ......

Another one:

#0  0x00007f50e42e237f in raise () from /lib64/libc.so.6
#1  0x00007f50e42ccdb5 in abort () from /lib64/libc.so.6
#2  0x00007f50e43254e7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f50e432c5ec in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f50e432fa55 in _int_malloc () from /lib64/libc.so.6
#5  0x00007f50e4330c72 in malloc () from /lib64/libc.so.6
#6  0x000000000042bb6d in CMEM_$$_CGETMEM$QWORD$$POINTER ()
#7  0x00007f4f5b5eb9b0 in ?? ()
#8  0x000000000041cf6d in SYSTEM_$$_GETMEM$POINTER$QWORD ()
#9  0x0000000001c1ad48 in ?? ()
#10 0x0000000000415bc6 in fpc_dynarray_setlength ()
#11 0x00000000007cf413 in WAITFORMODIFIED (this=0x680, RESULTS=..., TIMEOUTMS=-466738305) at libs/mORMot/src/net/mormot.net.sock.posix.inc:1170
#12 0x00000000000000ce in ?? ()
#13 0x00007ffc49c50f20 in ?? ()
#14 0x00007ffc49c50e9f in ?? ()
#15 0x00007ffc49c50e9e in ?? ()
#16 0x0000000001ca44b8 in ?? ()
#17 0x00000000007d2d10 in POLLFORPENDINGEVENTS (this=0x1ca4488, TIMEOUTMS=1100) at libs/mORMot/src/net/mormot.net.sock.pas:3002
#18 0x000000000067bfd3 in EXECUTE (this=0x1ca5158) at libs/mORMot/src/net/mormot.net.async.pas:1795
#19 0x000000000046da0f in CLASSES_$$_THREADFUNC$POINTER$$INT64 ()
#20.......

#200 Re: mORMot 2 » adopting mORMot2 to use glibc memory manager (POSIX) » 2023-01-27 21:28:50

mpv

No stack trace yet. Reproduced only on the server hardware, with a core dump. Will the core dump help?

P.S.
Sorry, my bad. After googling a little, I now know how to get a stack trace from a core dump. Will get it in the morning.
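
For reference, the standard gdb way (the binary and core file names are examples):

gdb ./raw core.12345         # load the binary together with its core dump
(gdb) bt                     # backtrace of the crashing thread
(gdb) thread apply all bt    # backtraces of all threads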
