Yes, I'll do it this weekend.
Nice find, for sure. Sometimes Pascal is unsafe...
mORMot links to `libc`, but the Alpine image uses `musl` instead (this is not an old libc, it is a different implementation of libc). So yes, this is not a Docker problem but a problem with running on the Alpine Docker image.
See https://wiki.alpinelinux.org/wiki/Runni … c_programs for how to run on Alpine if it is absolutely needed.
My opinion: do not use Alpine at all. It creates too many problems, and the gain of a few MB of disk space is not worth it.
mORMot2 runs perfectly in a container; we have done it in the TFB benchmark for a long time. Most likely the problem is the Alpine image, which is based on a truncated libc. Try Ubuntu or something else with a full libc.
No, only connections per thread (operating system thread ID). Technically, this is not limited to HTTP server threads - your application can create its own threads.
Official Round 22 is completed and will be published soon. We are #12 in composite scores. Unfortunately cached-queries did not complete (the server crashes with a double free), so some memory problem still exists. We are lucky that cached-queries is not counted in the composite section...
@ab - please, see https://github.com/TechEmpower/Framewor … ssues/8501 - maybe you have the inspiration to write some blog article?
Thanks! I back-ported it to mORMot1 - see https://github.com/synopse/mORMot/pull/446
While reading log files using Rust, I discovered that some of them are not valid UTF-8 text. This happens when we truncate long strings using the `TextTruncateAtLength` parameter and the writer breaks a string inside a multi-byte UTF-8 sequence (in my case I log an input JSON that contains non-Latin strings).
IMHO a good place to fix this is inside the logger, here: https://github.com/synopse/mORMot2/blob … .pas#L5531 (not inside the writer, for compatibility). A good solution is to step forward a little (at most 3 bytes, because the string can actually be binary) and find the actual character end.
Or should I fix this at my app level?
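The "step forward a little" idea can be sketched as follows. This is a minimal illustration in Python, not mORMot code: the function name `truncate_utf8` and the `max_step` parameter are hypothetical, and it assumes the truncation limit is counted in bytes.

```python
def truncate_utf8(data: bytes, max_len: int, max_step: int = 3) -> bytes:
    """Cut data near max_len bytes without splitting a multi-byte UTF-8 character.

    If the cut lands inside a character, step forward over at most max_step
    continuation bytes (bit pattern 10xxxxxx) to reach the character end.
    If continuation bytes still follow, the input is probably binary, so the
    cut is made at max_len as requested.
    """
    if len(data) <= max_len:
        return data
    end = max_len
    limit = min(len(data), max_len + max_step)
    # Skip the continuation bytes that belong to the character being cut.
    while end < limit and (data[end] & 0xC0) == 0x80:
        end += 1
    if end < len(data) and (data[end] & 0xC0) == 0x80:
        # More than max_step continuation bytes: not valid UTF-8 text.
        return data[:max_len]
    return data[:end]
```

With this approach the result can be up to 3 bytes longer than the requested limit, which is the price for keeping the log valid UTF-8.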
We have a similar "reporting" service.
We have a reverse proxy that routes all "fast" requests, such as logging in and editing records, to one instance and all "report" requests to another.
First we scaled our "report" instance by increasing threadPoolSize.
When this was not enough, we added hosts plus a load balancer at the reverse proxy level.
After the database became a bottleneck, we changed the user API: some "report" requests are still answered synchronously, but if we detect (using some kind of heuristic) that calculating the response MAY take a long time, we answer the client with a requestID and put the request into a durable queue (a database table with row locking for now, but we will migrate to Redis).
We have a pool of workers that pull from the queue, process requests using database replicas, and put responses back into the queue.
Clients use polling to detect when the "report" for their requestID is ready.
Currently our load is ~100 "reports" per second. Some reports are generated in 0.1 seconds, some take up to 20 minutes (hundreds of pages).
The simplest solution is to increase threadPoolSize to 32 or 64.
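The flow described above (answer fast reports synchronously, queue slow ones, let clients poll by requestID) can be sketched like this. It is a toy in-memory model in Python, not our production code: the class name, the page-count heuristic and the `deque` standing in for the durable queue are all assumptions made for illustration.

```python
import uuid
from collections import deque


class ReportService:
    def __init__(self, slow_threshold_pages=10):
        self.slow_threshold_pages = slow_threshold_pages  # hypothetical heuristic
        self.queue = deque()   # stand-in for the durable queue (DB table / Redis)
        self.results = {}      # requestID -> finished report

    def _build(self, pages):
        return f"report with {pages} pages"

    def submit(self, pages):
        """Answer small reports synchronously, queue the potentially slow ones."""
        if pages < self.slow_threshold_pages:
            return {"status": "done", "report": self._build(pages)}
        request_id = str(uuid.uuid4())
        self.queue.append((request_id, pages))
        return {"status": "queued", "requestID": request_id}

    def work_once(self):
        """One worker step: pull a request, build the report, store the response."""
        if not self.queue:
            return False
        request_id, pages = self.queue.popleft()
        self.results[request_id] = self._build(pages)
        return True

    def poll(self, request_id):
        """Called by the client until the report is ready."""
        if request_id in self.results:
            return {"status": "done", "report": self.results[request_id]}
        return {"status": "pending"}
```

In a real deployment `work_once` would run in the worker pool against database replicas, and `submit`/`poll` would be the HTTP endpoints behind the reverse proxy.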
Let's keep the sources as is for a while - in any case, they will not be merged until Round 22.
I expected that mormot [async] would show the same updates results as in the run from 2023-05-20 and that we'd be #7, because all conditions are the same on our side, but we have what we have.
At least there will be something to work on in Round 23
Modern Linux distributions come with OpenSSL 3.x (RHEL/OEL 9, CentOS Stream etc.). It is not even possible to install OpenSSL 1.1 (only via the compat-openssl11 package). So in most cases we will use 3.x. I hope the performance issues will be fixed in 3.0 as well.
https://tfb-status.techempower.com/ is up and running, but the current results are very strange - almost all frameworks (including mormot) lost up to 20% of RPS in /json and /plaintext, except two (with no changes in sources). It looks like "All animals are equal, but some are more equal than others" (hope I'm wrong)
For RabbitMQ there is the Habari STOMP Client - in the latest releases it is based on mORMot: https://forum.lazarus.freepascal.org/in … #msg470786
For Redis I wrote a simple TCrtSocket (mORMot1) based client to use in UnityBase - you can take it here (the classes TRedisConnectionsManager & TRedisClient do not depend on SpiderMonkey): https://git-pub.intecracy.com/unitybase … gredis.pas
Official TFB Round 22 is expected in August 2023. I expect we will be #7 in the composite score rating.
BTW, in the last run we are #1 in cached queries - see https://www.techempower.com/benchmarks/ … ched-query
The header delimiter is CRLF (#13#10), not LF (#10) as in your example.
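To make the CRLF point concrete, here is a minimal sketch in Python (for illustration only; the helper name `build_request` is hypothetical) that builds an HTTP/1.1 request with every line terminated by CRLF and the header block closed by an empty line:

```python
def build_request(method, path, headers):
    """Build a raw HTTP/1.1 request: each line ends with CRLF,
    and an empty line (a second CRLF) terminates the header block."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


raw = build_request("GET", "/plaintext", {"Host": "foo.com"})
```

Some servers tolerate a bare LF, but a client should not rely on that.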
It is a very "hot" summer in my country, something burns every day (the occupiers, thank God, have more).
I decided to set the [async] test back to `-s 28 -t 8 -p` mode - this gives us the best results for `/update` (~23k), even if we do not fully understand the reason.
See PR 2803
Perhaps I will return to optimization closer to September
Replace your OleDBConnectionProperties with TODBCConnectionProperties (mORMot1) / TSqlDBOdbcConnectionProperties (mORMot2) - see Create method docs for examples
Please note that OleDB is deprecated, see https://learn.microsoft.com/en-us/sql/r … rver-ver16
We have used ODBC on both *nix and Windows for the last year; it works OK.
The state of async, summarized from the last 2 runs:
                   db       queries  fortunes  updates
sync
  -s28 -t8 -p      362,787  31,436   353,182   16,750
async
  -s28 -t8 -p      381,633  32,725   290,512   23,022
  -s56 -t1 -p      398,619  32,892   312,836   20,154
  -s1 -t56 -nop    371,371  31,766   330,803   19,995
I made a new PR 8232 with [async] changed to -s28 -t4 -p, [async,nopin] kept the same, and binary array binding. After receiving the results, we will be able to choose the best parameters for async.
TFB state
Weights 1.000 1.737 21.745 4.077 68.363 0.163
# JSON 1-query 20-q Fortunes Updates Plaintext Scores
38 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 thread (28CPU*4)
44 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 thread (28CPU*4) CPU affinity
51 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining
15 963,953 394,036 33,366 393,209 18,353 6,973,762 6,368 2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17 915,202 376,813 30,659 350,490 17,051 6,824,917 5,943 2023-03-03 - 168 threads, minor improvements, Ubuntu 22.02
17 1,011,928 370,424 30,674 357,605 13,994 6,958,656 5,871 2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
11 1,039,306 362,739 29,363 354,564 15,748 6,959,479 5,964 2023-03-16 - 224 threads (8*28 eft, ts), update with unnest, binary binding
17 1,045,953 362,716 30,896 353,131 16,568 6,994,573 6,060 2023-04-13 - 224 threads (8*28 eft, ts), update using VALUES (),().., removed Connection: Keep-Alive resp header
13 1,109,267 363,671 31,652 352,706 16,897 6,956,038 6,156 2023-04-24 - 224 threads (-s 28 -t8 -p), each server (with all threads) are pinned to the different CPU
7 1,109,693 381,633 32,725 353,182 23,022 6,975,086 6,634 2023-05-13 - 224 threads, added async test in -s 28 -t8 -p mode: db, queries & updates is for async, fortunes for direct
We are #7 even with non-optimal thread/server count for async tests. And #2 in cached-queries
Tomorrow new results are expected - async tests will be executed in `-s 56 -t 1 -p` and `-s 1 -t 56 --nopin`. I am waiting for the results with bated breath..
The TFB MR is merged. Results are expected on 2023-05-29.
No, I'm a zero in Python.
The env. variable is now correctly passed into the app container - see the modified dockerfile.
I suggest running it once without a binary array binding to have a basis for comparison..
New MR 8207 based on latest sources and a new test case `-s 1 -t CPU*2 -nopin` is ready.
I use the `unnest` pattern for /asyncUpdates (as in the prev. MR), because your implementation fails on the ?queries=501 test (too many parameters). In /rawupdates we use `unnest` if count>20, else `select from values`, but unnest works well, IMHO.
I understand why updates are better for async - this is because of less concurrency. In fact, in my environment I also got ~23K for updates.
I will make a new PR today with the latest sources and a new async test case with `-s 1 -t CPU*4`.
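The difference between the two patterns is the number of bound parameters: `VALUES` needs two parameters per row, while `unnest` always needs two array parameters regardless of the row count. A minimal Python sketch that only generates the SQL text; the statements are illustrative and not copied from raw.pas, the `world`/`randomnumber` names are assumed from the TFB schema, and the threshold of 20 is the one mentioned above.

```python
def update_with_values(count):
    """One (id, randomNumber) pair of placeholders per row: 2 * count parameters."""
    rows = ", ".join(f"(${2 * i + 1}::int, ${2 * i + 2}::int)" for i in range(count))
    sql = ("UPDATE world AS t SET randomnumber = v.r "
           f"FROM (VALUES {rows}) AS v(id, r) WHERE t.id = v.id")
    return sql, 2 * count


def update_with_unnest():
    """Two array parameters, whatever the number of rows."""
    sql = ("UPDATE world AS t SET randomnumber = v.r "
           "FROM (SELECT unnest($1::int[]) AS id, unnest($2::int[]) AS r) AS v "
           "WHERE t.id = v.id")
    return sql, 2


def choose_update(count, threshold=20):
    """Mirror the rule above: unnest for large batches, VALUES for small ones."""
    return update_with_unnest() if count > threshold else update_with_values(count)
```

For ?queries=501 the `VALUES` form would need 1002 parameters, the `unnest` form still 2.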
And YES - we are in TOP10 now!!!! Congratulations!!!!
I also do not fully understand the numbers, but we have what we have. TFB results for the first async implementation should appear on 2023-05-11; after this I'll make a PR with the new implementation and one more test case for async, so we will verify both the `-s CPU*2 -t 1 -p` and `-s 1 -t CPU*4` cases.
Tested the new async implementation on 2x Xeon(R) Silver 4214R CPU @ 2.40GHz. Each component is limited by taskset (the first 16 CPUs for the app, the second 16 for the db and the third 16 for wrk) to emulate three TFB servers.
The result is better than the initial async implementation (first table row). The best values are still for -s 32 -t 1 -p mode.
See table - on google drive
TFB state
Weights 1.000 1.737 21.745 4.077 68.363 0.163
# JSON 1-query 20-q Fortunes Updates Plaintext Scores
38 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 thread (28CPU*4)
44 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 thread (28CPU*4) CPU affinity
51 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining
15 963,953 394,036 33,366 393,209 18,353 6,973,762 6,368 2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17 915,202 376,813 30,659 350,490 17,051 6,824,917 5,943 2023-03-03 - 168 threads, minor improvements, Ubuntu 22.02
17 1,011,928 370,424 30,674 357,605 13,994 6,958,656 5,871 2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
11 1,039,306 362,739 29,363 354,564 15,748 6,959,479 5,964 2023-03-16 - 224 threads (8*28 eft, ts), update with unnest, binary binding
17 1,045,953 362,716 30,896 353,131 16,568 6,994,573 6,060 2023-04-13 - 224 threads (8*28 eft, ts), update using VALUES (),().., removed Connection: Keep-Alive resp header
13 1,109,267 363,671 31,652 352,706 16,897 6,956,038 6,156 2023-04-24 - 224 threads (-s 28 -t8 -p), each server (with all threads) are pinned to the different CPU
Thanks to the CPU pinning, we are now #13 (above .NET). Today's round started without a merge.
We hope that our MR with *async* test suite and improved Int64 JSON serialization will be merged in the next round.
It is very likely that we will be in the top 10 (and #1 in cached queries) after that.
BTW, a modification to libpq similar to ours has been applied by the Postgres reviewers and should be included in Postgres v17 (in ~1 year). The next h2o test should also use a modified libpq that does not flush on every sync.
I've updated TFB PR 8182 with the current state of the sources (refactored PostgreSQL async DB); a new async test suite is added. They usually merge on Saturday night (today), so we may participate with async in Monday's run.
Just tested the current implementation - so far the best results are with `-s CPU*2 -t 1 -p`. /asyncdb and /asyncfortunes are faster (+25%) compared to rawdb/fortunes.
I can wait while you implement `ExecuteAsyncPrepared`, or update TFB PR with current implementation - what is your opinion?
So far the best /asyncfortunes result I was able to achieve is for servers=CPUCount*2, threads per server=1, pinned:
# taskset -c 0-15 ./raw12 -s 32 -t 1
....
num servers=32, threads per server=1, total threads=32, total CPU=48, accessible CPU=16, pinned=TRUE
taskset -c 31-47 ./wrk -H 'Host: 10.0.0.1' -H 'Accept: application/json,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 10 -c 512 --timeout 8 -t 16 "http://localhost:8080/asyncfortunes"
Requests/sec: 482751.02
This is +25%, which is very, VERY good, but I am almost sure I need to play more with the parameters...
And we do not need asoForceConnectionFlush option at all. Even on modified libpq PQGetResult will internally call flush first. In rawqueries pConn.Flush; can also be removed
Please, see this PR
What command line parameters do you use to test (threads/servers/pinning)? On my server hardware the async* results (with servers=CPUCount, threads=8, pinning) are a little worse compared to raw*:
num servers=16, threads per server=8, total threads=128, total CPU=48, accessible CPU=16, pinned=TRUE, db=PostgreSQL
taskset -c 31-47 ./wrk -H 'Host: 10.0.0.1' -H 'Accept: application/json,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 10 -c 512 --timeout 8 -t 16 "http://localhost:8080/asyncfortunes"
Requests/sec: 353990.97
taskset -c 31-47 ./wrk -H 'Host: 10.0.0.1' -H 'Accept: application/json,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 10 -c 512 --timeout 8 -t 16 "http://localhost:8080/rawfortunes"
Requests/sec: 393226.26
BTW - nice pictures
About /asyncupdates - from my POV it is not correct to pipeline updates. In a realistic /updates scenario we should do all the selects together with the update in one transaction (even if TFB does not require this). But in our "async" model we can't do transactions at all (actually we can, but from a consistency POV it is not correct) - only atomic select operations.
This is why I am considering doing /async* endpoints only for db queries and fortunes - I'll create a separate test case in benchmark_config.json with "approach": "Stripped" for such endpoints.
Am I right, or am I missing something?
I'll test it over the weekend or tonight - the server I'm testing on is busy (end of the month - people are creating reports, etc.)
There is a missing Connect call:
--- a/src/db/mormot.db.sql.postgres.pas
+++ b/src/db/mormot.db.sql.postgres.pas
@@ -1379,6 +1379,7 @@ begin
fProperties := Owner;
fStatements := TSynObjectListLightLocked.Create;
fConnection := fProperties.NewConnection as TSqlDBPostgresConnection;
+ fConnection.Connect;
fConnection.EnterPipelineMode;
end;
And it should be tested, because currently the result is always `{"id":0,"randomNumber":0}` and there are only 19 RPS per server.
Looking forward to it! We still have at least 7 days until the next merge request...
I know this cool article about async in .NET. In fact, the same steps were taken in JS. In browser client for UnityBase I started with callbacks 13 years ago, then moved to iterator-based Promises poly-fill, then to Promises and finally - to async/await.
In Pascal we need at least iterator support on compiler level, without this the only option is callbacks, but this is hell.... Callback-based implementation example is h2o
I like our current implementation - at the app level, everything is quite simple. Complicating it to the level of manual implementation of asynchronization is likely to alienate potential users.
I'm still confident that we can find a way to improve the current implementation (and I'm working on it periodically) - we only need +200 composite points to get into the top 10 TFB...
TFB PR 8182 is ready - it should improve /cached-queries and maybe /queries too.
We can also avoid tmp and MoveFast, can't we? At least in writer.Add.
While looking at cached-queries performance I found a *VERY* unexpected thing:
TTextWriter.Add(Value: PtrInt) uses a fast lookup table for values < 999.
I decided to increase it to 9999 (TFB IDs are 0..10000) and... performance got worse.
If I comment out the lookup code, performance increases.
For cached-queries?count=100:
- no lookup: 511k RPS
- 999 lookup size: 503k RPS
- 9999 lookup size: 466k RPS
@ab - do you have any idea why? The relative numbers do not depend on CPU pinning, server count, thread count...
About POrmCacheTable of course we could put it as a field. But I doubt it would make any performance change: it is a O(1) lookup process.
It is called 400k times per second. Caching can give us the +0.1% performance boost we need to be #1... At least in my environment, this is what happens.
Unfortunately, rawcached breaks the rules. There are already discussions in TFB issues that such implementations should be banned - I don't want to take risks.
About pipelining DB requests for /db and /fortunes - this is an interesting idea. Actually, the top-rated frameworks do exactly that.
In this case we need
- callback on HTTP server level and
- callback on DB level
Each server can use a single per-server DB connection and a new method on the DB layer: stmt.ExecutePipelining(maxCnt, timeout, callback);
stmt.ExecutePipelining can buffer up to maxCnt statements or until timeout, run them in a single pipeline and notify the callback for each caller.
And in the end we get callback hell (especially when handling exceptions) - I've seen this in old .NET and JavaScript before they implemented async/await at the runtime level.
But for benchmark purposes we can try.
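The buffering idea behind the proposed ExecutePipelining(maxCnt, timeout, callback) can be sketched as below. This is a single-threaded Python model for illustration only, not a mORMot API: the class name, the `tick` method and the injected `run_pipeline` function (which stands in for sending the whole batch over one DB connection) are all hypothetical.

```python
class PipelineBatcher:
    def __init__(self, run_pipeline, max_cnt, timeout):
        self.run_pipeline = run_pipeline  # callable: list of statements -> list of results
        self.max_cnt = max_cnt
        self.timeout = timeout
        self.pending = []      # (statement, callback) pairs waiting for a flush
        self.first_at = None   # time the oldest pending statement was queued

    def execute(self, statement, callback, now):
        """Buffer a statement; flush as soon as max_cnt statements are waiting."""
        if not self.pending:
            self.first_at = now
        self.pending.append((statement, callback))
        if len(self.pending) >= self.max_cnt:
            self.flush()

    def tick(self, now):
        """Called periodically: flush when the oldest statement waited >= timeout."""
        if self.pending and now - self.first_at >= self.timeout:
            self.flush()

    def flush(self):
        """Run all buffered statements as one pipeline and notify each caller."""
        batch, self.pending = self.pending, []
        self.first_at = None
        if not batch:
            return
        results = self.run_pipeline([stmt for stmt, _ in batch])
        for (_, callback), result in zip(batch, results):
            callback(result)
```

Even in this toy form the callbacks already leak into every caller, and error handling (one failed statement in the batch) is not covered at all, which is the "callback hell" concern mentioned above.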
In current round we moved above actix and .NETCore
The final results will be in 3 days, I expect we will be #15
@ab - is it correct to calculate POrmCacheTable once in the TRawAsyncServer constructor (instead of calculating it every time here)? This should give us the few extra requests we need to be #1 in cached queries...
Current TFB status
Weights 1.000 1.737 21.745 4.077 68.363 0.163
# JSON 1-query 20-q Fortunes Updates Plaintext Scores
38 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 thread (28CPU*4)
44 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 thread (28CPU*4) CPU affinity
51 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining
15 963,953 394,036 33,366 393,209 18,353 6,973,762 6,368 2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17 915,202 376,813 30,659 350,490 17,051 6,824,917 5,943 2023-03-03 - 168 threads, minor improvements, Ubuntu 22.02
17 1,011,928 370,424 30,674 357,605 13,994 6,958,656 5,871 2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
11 1,039,306 362,739 29,363 354,564 15,748 6,959,479 5,964 2023-03-16 - 224 threads (8*28 eft, ts), update with unnest, binary binding
17 1,045,953 362,716 30,896 353,131 16,568 6,994,573 6,060 2023-04-13 - 224 threads (8*28 eft, ts), update using VALUES (),().., removed Connection: Keep-Alive resp header
We are still #17, but the composite score improves with every new run. Also, we moved up from #7 to #3 in the cached-queries test.
Now we are trying CPU pinning - I expect a good improvement in /json and /cached-queries...
Today I ran the TFB tests 5 times (each run takes ~30 minutes) and the memory error did not occur (with the old sources, without GetAsText), so it's really a heisenbug.
It occurs NOT on server shutdown, but just after wrk command ends, I think - when sockets are closing... Will continue to investigate...
About command line parameters - nice code. Please - look at PR 176 - I made a more Unix-way formatting of help message
HTTP pipelining fixed - thanks! I made a TFB PR 8153 with CPU pinning - let's wait for results.
Memory problems still exist. Today I caught it twice (in 5-6 runs): once after /db and once after /rawqueries, while running
./tfb --test mormot mormot-postgres-raw --query-levels 20 -m benchmark
Still can't reproduce it in a more "debuggable" way.
Also synced my latest changes to TFB with ex/techempower-bench/raw.pas - see PR 175 for mORMot2
@ab - HTTP pipelining is currently broken. Introduced by feature "added Basic and Digest auth".
The last good commit is [1434d3e1] prepare HTTP server authentications - 2023-04-13 1:48. After that comes a series of commits that do not compile due to the new param aAuthorize for THttpServerRequestAbstract.Prepare, and the first commit that compiles responds only to the first pipelined request.
This can be verified using the console command below - it should return 2 Hello, World! responses:
(echo -en "GET /plaintext HTTP/1.1\nHost: foo.com\nConnection: keep-alive\n\nGET /plaintext HTTP/1.1\nHost: foo.com\n\n"; sleep 10) | telnet localhost 8080
@ttomas - thanks for idea - added CONN param into gist - a connection count for wrk, for plaintext 1024 is used (all fw shows best results for 1024)
@ab - I added a shebang to the gist (first line) - maybe your default shell is not bash. Also ensure you have the `bc` utility (apt install bc).
Nice to hear that our measurements with pinning match now... I do not understand why in your case json is better than plaintext - in my case plaintext is always better.
I will make a PR to TFB on Sunday (when the current run result for mormot appears) - then we can see what pinning gives us on real hardware. BTW, pinning is a common practice for async servers - even nginx has a worker affinity option in its config. In the TFB tests pinning is used at least by libreactor and H2O.
W/O pipelining (with cmem) the results are (note: count=100 for cached queries, as in the TFB test):
          /json      /plaintext  /cached-queries?count=100
pinning   1,281,204  1,301,311   493,913
default   1,088,939  1,168,009   471,235
I put the program I use to create load for smoke tests in this gist. CORES2USE and CORES2USE_COUNT should be edited to match the CPUs used by wrk.