Verified with 2.2.7351 - now escaping is OK. TFB MR #8883 is ready.
I don't bother with mimalloc either - we'll see how it goes with the current improvements.
@ab - it seems escaping is broken in the latest Mustache implementation
The valid fortune response should be escaped (&lt;script&gt;):
<tr><td>11</td><td>&lt;script&gt;alert(&quot;This should not be displayed in a browser alert box.&quot;);&lt;/script&gt;</td></tr>
but the current implementation does not escape the {{ }} template, and the result is
<tr><td>11</td><td><script>alert("This should not be displayed in a browser alert box.");</script></td></tr>
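For comparison, here is a minimal Python sketch of the escaping a `{{ }}` tag is expected to apply. It uses the standard `html.escape`; the `render_row` helper is hypothetical and only mirrors the fortune row layout, it is not the mORMot Mustache code.

```python
import html


def render_row(fortune_id: int, message: str) -> str:
    """Render one fortune row, HTML-escaping the message as {{ }} should."""
    return f"<tr><td>{fortune_id}</td><td>{html.escape(message)}</td></tr>"


row = render_row(
    11,
    '<script>alert("This should not be displayed in a browser alert box.");</script>',
)
print(row)
```

If the output still contains a literal `<script>`, the value was emitted unescaped (the `{{{ }}}` behaviour).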
Yes, disabling sleep should help the asynchronous server: according to dstat (it's running now) the processor is ~66% idle (only 40% is used). By the way, for raw the idle is ~44%, which is also too high; in h2o, for example, idle is ~20%.
In today's result based on mORMot2.2.stable, our score increased from 18,281 to 19,303, I hope we will be #5.
I'll test the fresh sources on my computer (my test server is in a region with power outages and is unavailable) and make an MR (hopefully today)
One more observation: may-minihttp, ntex, and xitca-web use mimalloc. I also plan to try it in one of the next rounds. I've already tried mimalloc on my server - performance is the same as with the libc allocator, but on the new TFB hardware it might change the numbers a bit.
P.S.
#6 is a good place IMHO
Currently, with 56 CPUs, mormot [async,nopin] (1 process, 112 threads, no pinning) is good for updates, but bad for other scenarios (compared to async with pinning and direct pinning).
Let's wait for the next launch with the latest PR (the current launch is based on the same sources as Round22).
The bad thing is that we don't have a test server that matches the TFB configuration now, so we can't test different thread/server/pin combinations to find the best one.
Thanks a lot! BTW new binaries can be released with SQLite3 3.45.1 (where JSON and read performance improvements are introduced)
To solve the problem described in #211 ( `GLIBC_2.34' not found ), I compile the production version of my product using a Docker image with fpc that is based on Ubuntu 20.04 (GLIBC 2.31)
I found this commit in the sqlite3 build, which replaces all symbols pthread->_pthread + dl->_dl, and issue #211
To solve my problem I do the backward replacement, and now the linker is happy.
The Linux script that does the replacement is here
I still do not understand why it compiles in mORMot2.
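The backward replacement could look roughly like the sketch below. This is not the script linked above, only an assumption of how such a rename can be done with GNU `objcopy --redefine-sym old=new`; the symbol list is taken from the linker errors quoted in this thread, and the snippet only prints the command instead of running it.

```python
import shlex

# Underscore-prefixed symbols reported as undefined by the linker.
PREFIXED = [
    "_pthread_create", "_pthread_join",
    "_pthread_mutex_init", "_pthread_mutex_destroy",
    "_pthread_mutex_lock", "_pthread_mutex_trylock", "_pthread_mutex_unlock",
    "_pthread_mutexattr_init", "_pthread_mutexattr_settype",
    "_pthread_mutexattr_destroy",
    "_dlopen", "_dlsym", "_dlclose", "_dlerror",
]


def objcopy_command(obj_path):
    """Build an objcopy call that renames _name back to name."""
    cmd = ["objcopy"]
    for sym in PREFIXED:
        cmd += ["--redefine-sym", f"{sym}={sym[1:]}"]
    cmd.append(obj_path)
    return cmd


# The object file path is a placeholder.
print(shlex.join(objcopy_command("sqlite3.o")))
```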
I'm a little confused, because the same sqlite3.o is used in mormot2.2_stable and the TFB example compiles fine. Any help is welcome
@ab - I'm trying to upgrade SQLite3 in my mORMot1 application from 3.31.0 to 3.44.2 (to the latest static version uploaded to the mORMot repository), but I get various linking errors related to pthread and dlopen (sorry for big fragment) under x86_64-linux (win64 is OK)
(9015) Linking bin/fpc-linux/ub
/usr/bin/ld: libs/Synopse/./static/x86_64-linux/sqlite3.o: in function `sqlite3_strlike':
sqlite3mc.c:(.text+0xc735): undefined reference to `_pthread_mutex_trylock'
/usr/bin/ld: libs/Synopse/./static/x86_64-linux/sqlite3.o: in function `sqlite3_free_filename':
sqlite3mc.c:(.text+0xe345): undefined reference to `_pthread_mutex_destroy'
/usr/bin/ld: libs/Synopse/./static/x86_64-linux/sqlite3.o: in function `sqlite3_create_filename':
sqlite3mc.c:(.text+0x20bbc): undefined reference to `_pthread_create'
/usr/bin/ld: libs/Synopse/./static/x86_64-linux/sqlite3.o: in function `sqlite3_uri_int64':
sqlite3mc.c:(.text+0x2bbc4): undefined reference to `_pthread_mutexattr_init'
/usr/bin/ld: sqlite3mc.c:(.text+0x2bbd1): undefined reference to `_pthread_mutexattr_settype'
/usr/bin/ld: sqlite3mc.c:(.text+0x2bbdc): undefined reference to `_pthread_mutex_init'
/usr/bin/ld: sqlite3mc.c:(.text+0x2bbe4): undefined reference to `_pthread_mutexattr_destroy'
/usr/bin/ld: sqlite3mc.c:(.text+0x2bc30): undefined reference to `_pthread_mutex_init'
/usr/bin/ld: sqlite3mc.c:(.text+0x2c7ce): undefined reference to `_pthread_join'
/usr/bin/ld: libs/Synopse/./static/x86_64-linux/sqlite3.o: in function `sqlite3_load_extension':
sqlite3mc.c:(.text+0x4246e): undefined reference to `_dlerror'
/usr/bin/ld: libs/Synopse/./static/x86_64-linux/sqlite3.o: in function `sqlite3_compileoption_used':
sqlite3mc.c:(.text+0xb804): undefined reference to `_dlclose'
/usr/bin/ld: sqlite3mc.c:(.text+0xb817): undefined reference to `_dlsym'
/usr/bin/ld: sqlite3mc.c:(.text+0xb829): undefined reference to `_dlopen'
/usr/bin/ld: libs/Synopse/./static/x86_64-linux/sqlite3.o: in function `sqlite3_strlike':
sqlite3mc.c:(.text+0xc721): undefined reference to `_pthread_mutex_unlock'
/usr/bin/ld: sqlite3mc.c:(.text+0xc751): undefined reference to `_pthread_mutex_lock'
UB.lpr(56,1) Error: (9013) Error while linking
I tried under Ubuntu 22.04 (GLIBC 2.35) and Ubuntu 20.04 (GLIBC 2.31) with the same results.
I assume that you compile Linux static under newer OSes (newer libc) and this is the reason?
There is an update about the new TFB environment: https://github.com/TechEmpower/Framewor … 1973835104
56 logical CPU and 40Gb network.
Still waiting for test to be resumed....
I doubt what is being measured - a 1024-iteration loop is too small.
I increased the loop being measured:
for (i = 0; i < 1024*1024*1024; i++)
and created 2 programs - one that calls the library from C, and one from Pascal (as you provided) - the results are the same
$ ./forloop_c.out
space 129340 clock_t value -536870912
$ ./forloop_pas
space 129318 clock_t value -536870912
After almost a year of discussions, our idea for improving the PostgreSQL pipelining mode has been merged into PostgreSQL upstream. After the Postgres 17 release in September 2024 we will switch to it - for now I do not want to increase the container build time by adding a libpq compilation from sources
It is not easy here, in Ukraine, but this winter is so far easier than the previous one - we were ready for the russian terror.
As for TFB, let's look at the new HW. It's a shame that our last commit didn't play on the old HW - we don't have a base for comparison. After one run on the new HW, I plan to switch our test to mormot@2.2 (and possibly change threads/servers). Life goes on..
Good idea
For my part, I'll try to use `mimalloc` in the next run - the top rust frameworks `ntex` (made by MS, by the way), `may-minihttp`, `xitca-web` use mimalloc. On my computer, mimalloc works a little worse, but on TFB the situation may change
PR 8612 is ready.
/cached-queries memory issues occur too rarely (once per 10 TFB runs) - I also can't reproduce them in my environment
As for the performance of single select, I'm afraid that the only way to improve it is to switch the Postgres connection to non-blocking mode (PQsetnonblocking).
But IMHO this requires a huge rework of the framework to become event driven:
- we need a .NET-like pool of database connections (with minimum and maximum size) that is not based on the thread ID (threadSafeConnection), but manages a list of all connections to one database and can provide the user with an unoccupied connection (or block if all connections are busy)
- Ideally, the database connection sockets waiting for the result should be in the same epoll as the HTTP sockets, so we need a callback-based event loop backed by epoll (like in libuv). This is a complete redesign of the current HTTP server architecture
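To make the first point concrete, here is a minimal Python sketch of a pool that is not keyed by thread ID: it keeps a list of idle connections, grows up to a maximum, and blocks when everything is busy. The `connect` factory and the class name are hypothetical; this is an illustration of the idea, not a proposal for the mORMot API.

```python
import queue
import threading


class ConnectionPool:
    """Hands out idle connections; blocks when max_size are all busy."""

    def __init__(self, connect, min_size=2, max_size=8):
        self._connect = connect        # hypothetical factory, e.g. a libpq wrapper
        self._idle = queue.LifoQueue()
        self._lock = threading.Lock()
        self._created = 0
        self._max_size = max_size
        for _ in range(min_size):      # pre-open the minimum number
            self._idle.put(self._new())

    def _new(self):
        self._created += 1
        return self._connect()

    def acquire(self, timeout=None):
        try:
            return self._idle.get_nowait()      # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:  # grow up to the maximum
                return self._new()
        # All connections are busy: wait until someone calls release().
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)
```

Any thread can take any free connection, so the number of database connections no longer has to match the number of HTTP threads.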
Another IMHO:
An event-driven, callback-based architecture is the only choice we have with FPC. But it's a road to callback hell. I know very well what this means because I worked with callbacks in early JS for 5 years, up until Promises. Any complex logic based on callbacks is hell.
So my suggestion is to stay where we are. After all, we are now in the top 3 for ORM, and our code is production-ready unlike many others on TOP
Yes, I'll do it this weekend
Nice finding, for sure. Sometimes Pascal is unsafe....
mORMot links to `libc`, but alpine image uses `musl` instead (this is not an old libc - this is another implementation of libc), so yes - this is not a Docker problem, but problem with running on alpine docker image.
See https://wiki.alpinelinux.org/wiki/Runni … c_programs about how to run on alpine if it is absolutely needed.
My opinion - do not use alpine at all - it creates too many problems, the gain of a few MB of disk space is not worth it.
mormot2 runs perfectly in a container; we have done it in the TFB benchmark for a long time. Most likely the problem is caused by the alpine image, which is based on a truncated libc. Try ubuntu or something else with a full libc
No, only connections per thread (operating system thread ID). Technically, this is not limited to HTTP server threads - your application can create its own threads.
Official Round 22 is completed and will be published soon. We are #12 in composite scores. Unfortunately cached-queries did not complete (the server crashes with a double free), so some memory problem still exists. We are lucky that cached-queries is not counted in the composite section...
@ab - please, see https://github.com/TechEmpower/Framewor … ssues/8501 - maybe you have the inspiration to write some blog article?
Thanks! I back-port it to mORMot1 - see https://github.com/synopse/mORMot/pull/446
While reading log files using Rust, I discovered that some of them are not valid UTF-8 text. This happens when we truncate long strings using the `TextTruncateAtLength` parameter and the writer breaks a string inside a multi-byte UTF-8 sequence (in my case I log an input JSON that contains non-Latin strings).
IMHO a good place to fix this is inside the logger, here https://github.com/synopse/mORMot2/blob … .pas#L5531 (not inside the writer, for compatibility). A good solution is to step forward a little (by at most 3 chars, because the string can actually be binary) and find the actual character end.
Or should I fix this at my app level?
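The "step forward a little" idea can be sketched in Python as follows. This only illustrates the proposed logic on raw bytes, it is not the mORMot logger code, and the function name is made up.

```python
def truncate_utf8(data: bytes, max_len: int) -> bytes:
    """Cut data near max_len without splitting a UTF-8 sequence."""
    if len(data) <= max_len:
        return data
    cut = max_len
    # UTF-8 continuation bytes look like 0b10xxxxxx; one character has at
    # most 3 of them, so never step forward more than 3 bytes. The bound
    # keeps the loop short when the input is really binary.
    for _ in range(3):
        if cut < len(data) and (data[cut] & 0xC0) == 0x80:
            cut += 1
        else:
            break
    return data[:cut]


text = "Привіт".encode("utf-8")           # 2 bytes per character
print(truncate_utf8(text, 3).decode("utf-8"))
```

The result may be up to 3 bytes longer than the requested length, which seems acceptable for a log line.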
We have a similar "reporting" service.
We have a reverse proxy that routes all "fast" requests, such as logging in and editing records, to one instance, and all "report" requests to another.
First we scaled our "report" instance by increasing threadPoolSize.
When this was not enough, we added hosts + a load balancer at the reverse proxy level.
After the database became a bottleneck, we changed the user API: some "report" requests are still answered synchronously, but if we detect (using some kind of heuristic) that calculating the response MAY take a long time, we answer the client with a requestID and put the request into a durable queue (for now a database table with row locking, but we will migrate to redis).
We have a pool of workers that pull from the queue, process requests using database replicas, and put responses back into the queue.
The client uses polling to detect that the "report" for its requestID is ready.
Currently our load is ~100 "report" per second. Some reports are generated in 0.1 seconds, some - up to 20 minutes (hundreds of pages)
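The flow is roughly the following, shown as a Python sketch with an in-memory stand-in for the durable queue. All names here are hypothetical; in production the queue is a database table (later redis), and the workers run in separate processes.

```python
import uuid
from collections import deque


class ReportQueue:
    """In-memory stand-in for the durable queue (DB table or redis)."""

    def __init__(self):
        self._pending = deque()
        self._results = {}

    def submit(self, params) -> str:
        """Called by the API when a report looks slow; returns the requestID."""
        request_id = uuid.uuid4().hex
        self._pending.append((request_id, params))
        return request_id

    def work_one(self, build_report) -> bool:
        """Called by a worker; returns False when there is nothing to do."""
        if not self._pending:
            return False
        request_id, params = self._pending.popleft()
        self._results[request_id] = build_report(params)
        return True

    def poll(self, request_id):
        """Called by the client; None means the report is not ready yet."""
        return self._results.get(request_id)
```

The client keeps calling `poll` with its requestID until it gets something other than `None`.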
The simplest option is to increase threadPoolSize to 32 or 64
Let's keep source as is for a while - in any case, they will not be merged until Round 22.
I expected that mormot [async] would demonstrate the same updates results as in the run from 2023-05-20 and that we'd be #7, because all conditions are the same on our side, but we have what we have
At least there will be something to work on in Round 23
Modern Linux distributions come with OpenSSL 3.x (RHEL/OEL 9, CentOS Stream etc.). It's not even possible to install OpenSSL 1.1 (only via the compat-openssl11 package). So in most cases we will use 3.x. Hopefully the performance issues will be fixed in 3.0 too.
https://tfb-status.techempower.com/ is up and running, but the current results are very strange - almost all frameworks (including mormot) lost up to 20% of RPS in /json and /plaintext, except two (with no changes in sources). It looks like "All animals are equal, but some are more equal than others" (hope I'm wrong)
For RabbitMQ there is the Habari STOMP Client - in the latest releases it is based on mORMot: https://forum.lazarus.freepascal.org/in … #msg470786
For Redis I wrote a simple TCrtSocket (mORMot1) based client to use in UnityBase - you can take it here (the classes TRedisConnectionsManager & TRedisClient do not depend on SpiderMonkey): https://git-pub.intecracy.com/unitybase … gredis.pas
Official TFB Round 22 is expected in August 2023. I expect we will be #7 in the composite score rating.
BTW in the last run we were #1 in Cached queries - see https://www.techempower.com/benchmarks/ … ched-query
The headers delimiter is CRLF (#13#10), not LF (#10) as in your example.
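Assuming this is about HTTP headers, a tiny Python illustration: every header line ends with CRLF, and an empty line (one more CRLF) closes the header block. The helper name is made up for the example.

```python
def build_headers(headers: dict) -> bytes:
    """Join header lines with CRLF and terminate the block with an empty line."""
    lines = [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


raw = build_headers({"Host": "10.0.0.1", "Connection": "keep-alive"})
print(raw)
```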
It is a very "hot" summer in my country, something burns every day (the occupiers, thank God, have more).
I decided to set the [async] test back to `-s 28 -t 8 -p` mode - this gives us the best results for `/update` (~23k), even if we do not fully understand the reason.
See PR 2803
Perhaps I will return to optimization closer to September
Replace your OleDBConnectionProperties with TODBCConnectionProperties (mORMot1) / TSqlDBOdbcConnectionProperties (mORMot2) - see Create method docs for examples
Please note that OleDB is deprecated, see https://learn.microsoft.com/en-us/sql/r … rver-ver16
We have used ODBC on both *nix and Windows for the last year; it works OK
The state of async, summarized from the last 2 runs:
               db       queries  fortunes  updates
sync
-s28 -t8 -p    362,787  31,436   353,182   16,750
async
-s28 -t8 -p    381,633  32,725   290,512   23,022
-s56 -t1 -p    398,619  32,892   312,836   20,154
-s1 -t56 -nop  371,371  31,766   330,803   19,995
I made a new PR 8232 with [async] changed to -s28 -t4 -p, the same [async,nopin] and binary array binding. After receiving the results, we will be able to choose the best params for async
TFB state
Weights  1.000      1.737    21.745  4.077     68.363   0.163
#        JSON       1-query  20-q    Fortunes  Updates  Plaintext  Scores  Date - notes
38       731,119    308,233  19,074  288,432   3,431    2,423,283  3,486   2022-10-26 - 64 thread limitation
43       320,078    354,421  19,460  322,786   2,757    2,333,124  3,243   2022-11-13 - 112 thread (28CPU*4)
44       317,009    359,874  19,303  324,360   1,443    2,180,582  3,138   2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51       563,506    235,378  19,145  246,719   1,440    2,219,248  2,854   2022-12-01 - 112 thread (28CPU*4) CPU affinity
51       394,333    285,352  18,688  205,305   1,345    2,216,469  2,586   2022-12-22 - 112 threads CPU affinity + pthread_mutex
34       859,539    376,786  18,542  349,999   1,434    2,611,307  3,867   2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28       948,354    373,531  18,496  366,488   11,256   2,759,065  4,712   2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16       957,252    392,683  49,339  393,643   22,446   2,709,301  6,293   2023-02-14 - 168 threads, cmem, improved PG pipelining
15       963,953    394,036  33,366  393,209   18,353   6,973,762  6,368   2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17       915,202    376,813  30,659  350,490   17,051   6,824,917  5,943   2023-03-03 - 168 threads, minor improvements, Ubuntu 22.02
17       1,011,928  370,424  30,674  357,605   13,994   6,958,656  5,871   2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
11       1,039,306  362,739  29,363  354,564   15,748   6,959,479  5,964   2023-03-16 - 224 threads (8*28 eft, ts), update with unnest, binary binding
17       1,045,953  362,716  30,896  353,131   16,568   6,994,573  6,060   2023-04-13 - 224 threads (8*28 eft, ts), update using VALUES (),().., removed Connection: Keep-Alive resp header
13       1,109,267  363,671  31,652  352,706   16,897   6,956,038  6,156   2023-04-24 - 224 threads (-s 28 -t8 -p), each server (with all threads) is pinned to a different CPU
7        1,109,693  381,633  32,725  353,182   23,022   6,975,086  6,634   2023-05-13 - 224 threads, added async test in -s 28 -t8 -p mode: db, queries & updates are for async, fortunes for direct
We are #7 even with non-optimal thread/server count for async tests. And #2 in cached-queries
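The Scores column appears to be the weighted sum of the six results divided by 1000. This formula is inferred from the numbers in the table, not taken from TFB documentation, but it does reproduce the last two rows. A small Python check (the dictionary keys are my own labels):

```python
# Weights as listed in the "Weights" row of the table.
WEIGHTS = {
    "json": 1.000, "query1": 1.737, "query20": 21.745,
    "fortunes": 4.077, "updates": 68.363, "plaintext": 0.163,
}


def composite_score(rps: dict) -> float:
    """Weighted sum of requests/sec, scaled down by 1000 (assumed formula)."""
    return sum(WEIGHTS[name] * rps[name] for name in WEIGHTS) / 1000.0


run_2023_05_13 = {
    "json": 1_109_693, "query1": 381_633, "query20": 32_725,
    "fortunes": 353_182, "updates": 23_022, "plaintext": 6_975_086,
}
print(int(composite_score(run_2023_05_13)))
```

With a weight of 68.363, the ~6k gain in updates alone accounts for most of the score increase between the last two rows.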
Tomorrow new results are expected - async tests will be executed in `-s 56 -t 1 -p` and `-s 1 -t 56 --nopin`. I am waiting for the results with bated breath..
TFB MR is merged. Results are expected on 2023-05-29
No, I'm 0 in python
The env. variable is now correctly passed into the app container - see the modified dockerfile
I suggest running it once without a binary array binding to have a basis for comparison..
New MR 8207 based on latest sources and a new test case `-s 1 -t CPU*2 -nopin` is ready.
I use the `unnest` pattern for /asyncUpdates (as in the prev. MR), because your implementation fails on the ?queries=501 test (too many parameters). In /rawupdates we use: if count>20 - use `unnest`, else use `select from values`; but unnest works well, IMHO.
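The difference in parameter count between the two patterns can be seen in this Python sketch that only builds the SQL text. The `world` table and its `id` / `randomnumber` columns are assumed from the TFB schema, and the statements are illustrations, not the exact SQL used in the PR.

```python
def update_with_values(n: int) -> str:
    """One (id, value) pair per row: 2*n bound parameters."""
    rows = ", ".join(f"(${2 * i + 1}::int, ${2 * i + 2}::int)" for i in range(n))
    return (
        "UPDATE world SET randomnumber = v.r "
        f"FROM (VALUES {rows}) AS v(id, r) WHERE world.id = v.id"
    )


def update_with_unnest() -> str:
    """Two array parameters, regardless of the number of rows."""
    return (
        "UPDATE world SET randomnumber = v.r "
        "FROM (SELECT unnest($1::int[]) AS id, unnest($2::int[]) AS r) AS v "
        "WHERE world.id = v.id"
    )


print(update_with_values(2))
print(update_with_unnest())
```

For 501 rows the VALUES form needs 1002 parameters, while the `unnest` form always needs 2.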
I understand why updates are better for async - this is because of less concurrency - in fact, in my environment I also got ~23K for updates..
I will make a new PR today with the latest sources and a new async test case with `-s 1 -t CPU*4`
And YES - we are in TOP10 now!!!! Congratulations!!!!
I also do not fully understand the numbers, but we have what we have. TFB results for the first async implementation should appear on 2023-05-11; after that I'll make a PR with the new implementation and one more test case for async, so we will verify both the `-s CPU*2 -t 1 -p` and `-s 1 -t CPU*4` cases
Tested the new async implementation on 2X Xeon(R) Silver 4214R CPU @ 2.40GHz. Each component is limited by taskset: the first 16 CPUs for the app, the second 16 CPUs for the db and the third 16 CPUs for wrk (to emulate the three TFB servers).
The result is better than the initial async implementation (first table row). The best values are still for the -s 32 -t 1 -p mode
See table - on google drive
TFB state
Weights  1.000      1.737    21.745  4.077     68.363   0.163
#        JSON       1-query  20-q    Fortunes  Updates  Plaintext  Scores  Date - notes
38       731,119    308,233  19,074  288,432   3,431    2,423,283  3,486   2022-10-26 - 64 thread limitation
43       320,078    354,421  19,460  322,786   2,757    2,333,124  3,243   2022-11-13 - 112 thread (28CPU*4)
44       317,009    359,874  19,303  324,360   1,443    2,180,582  3,138   2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51       563,506    235,378  19,145  246,719   1,440    2,219,248  2,854   2022-12-01 - 112 thread (28CPU*4) CPU affinity
51       394,333    285,352  18,688  205,305   1,345    2,216,469  2,586   2022-12-22 - 112 threads CPU affinity + pthread_mutex
34       859,539    376,786  18,542  349,999   1,434    2,611,307  3,867   2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28       948,354    373,531  18,496  366,488   11,256   2,759,065  4,712   2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16       957,252    392,683  49,339  393,643   22,446   2,709,301  6,293   2023-02-14 - 168 threads, cmem, improved PG pipelining
15       963,953    394,036  33,366  393,209   18,353   6,973,762  6,368   2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17       915,202    376,813  30,659  350,490   17,051   6,824,917  5,943   2023-03-03 - 168 threads, minor improvements, Ubuntu 22.02
17       1,011,928  370,424  30,674  357,605   13,994   6,958,656  5,871   2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
11       1,039,306  362,739  29,363  354,564   15,748   6,959,479  5,964   2023-03-16 - 224 threads (8*28 eft, ts), update with unnest, binary binding
17       1,045,953  362,716  30,896  353,131   16,568   6,994,573  6,060   2023-04-13 - 224 threads (8*28 eft, ts), update using VALUES (),().., removed Connection: Keep-Alive resp header
13       1,109,267  363,671  31,652  352,706   16,897   6,956,038  6,156   2023-04-24 - 224 threads (-s 28 -t8 -p), each server (with all threads) is pinned to a different CPU
Thanks to the CPU pinning, we are now #13 (above .NET). Today's round started without a merge.
We hope that our MR with *async* test suite and improved Int64 JSON serialization will be merged in the next round.
It is very likely that we will be in the top 10 (and #1 in cached queries) after that.
BTW - a modification to libpq similar to ours has been applied by the Postgres reviewers and should be included in Postgres v17 (in ~1 year). The next h2o test should also use a modified libpq that does not flush on every sync.
I've updated TFB PR 8182 with the current state of the sources (refactored PostgreSQL async DB) - a new async test suite is added. They usually merge on Saturday night (today) - so we may participate with async in Monday's run
Just tested the current implementation - for now the best results are with `-s CPU*2 -t 1 -p`. /asyncdb and /asyncfortunes are faster (+25%) compared to rawdb/fortunes.
I can wait while you implement `ExecuteAsyncPrepared`, or update TFB PR with current implementation - what is your opinion?
For a while the best /asyncfortunes result I was able to achieve is for servers=CPUCount*2, thread per server =1, pinned
# taskset -c 0-15 ./raw12 -s 32 -t 1
....
num servers=32, threads per server=1, total threads=32, total CPU=48, accessible CPU=16, pinned=TRUE
taskset -c 31-47 ./wrk -H 'Host: 10.0.0.1' -H 'Accept: application/json,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 10 -c 512 --timeout 8 -t 16 "http://localhost:8080/asyncfortunes"
Requests/sec: 482751.02
This is +25%, which is very VERY good, but I'm almost sure I need to play more with the parameters...
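For reference, the arithmetic behind the `-s` / `-t` / `-p` combinations can be written down as a small Python sketch. The round-robin mapping of servers to CPUs is my assumption for illustration (the summary line above only reports `pinned=TRUE`), and the function is not part of the benchmark program.

```python
def plan(servers: int, threads_per_server: int, accessible_cpus: list,
         pinned: bool = True) -> dict:
    """Return the total thread count and a server -> CPU map when pinned."""
    pinning = {}
    if pinned:
        for server in range(servers):
            # Assumed policy: spread servers over the accessible CPUs in turn.
            pinning[server] = accessible_cpus[server % len(accessible_cpus)]
    return {"total_threads": servers * threads_per_server, "pinning": pinning}


# -s 32 -t 1 -p under taskset -c 0-15: two single-thread servers per CPU.
result = plan(32, 1, list(range(16)))
print(result["total_threads"], result["pinning"][0], result["pinning"][16])
```

The same helper gives 224 threads for `-s 28 -t 8` and 56 unpinned threads for `-s 1 -t 56`.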