About POrmCacheTable - of course we could put it as a field. But I doubt it would make any performance difference: it is an O(1) lookup.
It is called 400k times per second. Caching could give us the +0.1% performance boost we need to be #1... At least in my environment, that is what happens.
Unfortunately, rawcached breaks the rules. There are already discussions in TFB issues that such implementations should be banned - I don't want to take the risk.
About pipelining DB requests for /db and /fortunes - this is an interesting idea. Actually, top-rated frameworks do exactly that.
In this case we need
- a callback at the HTTP server level and
- a callback at the DB level
Each server can use a single per-server DB connection and a new method on the DB layer: stmt.ExecutePipelining(maxCnt, timeout, callback);
stmt.ExecutePipelining would buffer up to maxCnt statements (or until the timeout elapses), run them in a single pipeline and notify the callback of each caller - see the sketch below.
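A rough sketch of the buffering idea in plain Free Pascal (every name here - TPipelineBuffer, TOnResult, Enqueue, Flush - is an illustrative assumption, not an existing mORMot API):

program pipeline_sketch;
{$mode objfpc}{$H+}
type
  // per-caller callback invoked once the pipelined statement has a result
  TOnResult = procedure(const aResult: string) of object;
  TPending = record
    Sql: string;
    OnDone: TOnResult;
  end;
  TPipelineBuffer = class
  private
    fPending: array of TPending;
    fMaxCnt: integer;
  public
    constructor Create(aMaxCnt: integer);
    procedure Enqueue(const aSql: string; const aOnDone: TOnResult);
    procedure Flush;
  end;

constructor TPipelineBuffer.Create(aMaxCnt: integer);
begin
  fMaxCnt := aMaxCnt;
end;

procedure TPipelineBuffer.Enqueue(const aSql: string; const aOnDone: TOnResult);
var
  n: integer;
begin
  // buffer the statement together with its caller's callback
  n := Length(fPending);
  SetLength(fPending, n + 1);
  fPending[n].Sql := aSql;
  fPending[n].OnDone := aOnDone;
  if n + 1 >= fMaxCnt then
    Flush; // a timer would also call Flush once timeoutMS elapses
end;

procedure TPipelineBuffer.Flush;
var
  i: integer;
begin
  // the real implementation would send every buffered statement in one PG
  // pipeline, read the results back in order, and hand one result per caller
  for i := 0 to High(fPending) do
    if Assigned(fPending[i].OnDone) then
      fPending[i].OnDone('result of: ' + fPending[i].Sql);
  fPending := nil;
end;

begin
end.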
And finally we get callback hell (especially while handling exceptions) - I saw this in old .NET and in JavaScript before they implemented async/await at the runtime level.
But for benchmark purposes we can try.
In the current round we moved above actix and .NET Core.
The final results will be in 3 days; I expect we will be #15.
@ab - is it correct to compute POrmCacheTable once in the TRawAsyncServer constructor (instead of computing it every time here)? This should give us the few extra requests we need to be #1 in cached queries...
Current TFB status
Weights 1.000 1.737 21.745 4.077 68.363 0.163
# JSON 1-query 20-q Fortunes Updates Plaintext Scores
38 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 thread (28CPU*4)
44 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 thread (28CPU*4) CPU affinity
51 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining
15 963,953 394,036 33,366 393,209 18,353 6,973,762 6,368 2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17 915,202 376,813 30,659 350,490 17,051 6,824,917 5,943 2023-03-03 - 168 threads, minor improvements, Ubuntu 22.04
17 1,011,928 370,424 30,674 357,605 13,994 6,958,656 5,871 2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
11 1,039,306 362,739 29,363 354,564 15,748 6,959,479 5,964 2023-03-16 - 224 threads (8*28 eft, ts), update with unnest, binary binding
17 1,045,953 362,716 30,896 353,131 16,568 6,994,573 6,060 2023-04-13 - 224 threads (8*28 eft, ts), update using VALUES (),().., removed Connection: Keep-Alive resp header
We are still #17, but the composite score improves with every new run. Also we moved up from #7 to #3 in the cached-queries test.
Now we are trying CPU pinning - I expect a good improvement in /json and /cached-queries...
Today I ran the TFB tests 5 times (each run takes ~30 minutes) and the memory error did not occur (with old sources, without GetAsText), so it's really a heisenbug.
It occurs NOT on server shutdown, but just after the wrk command ends - when sockets are closing, I think... Will continue to investigate...
About command line parameters - nice code. Please look at PR 176 - I made the help message formatting more Unix-like.
HTTP pipelining is fixed - thanks! I made TFB PR 8153 with CPU pinning - let's wait for results.
The memory problem still exists. Today I caught it twice (out of 5-6 runs) - once after /db and once after /rawqueries while running
./tfb --test mormot mormot-postgres-raw --query-levels 20 -m benchmark
Still can't reproduce it in a more debuggable way.
Also synced my latest changes to TFB with ex/techempower-bench/raw.pas - see PR 175 for mORMot2
@ab - HTTP pipelining is currently broken, introduced by the "added Basic and Digest auth" feature.
The last good commit is [1434d3e1] "prepare HTTP server authentications" - 2023-04-13 1:48. After that comes a series of commits which do not compile due to the new aAuthorize parameter of THttpServerRequestAbstract.Prepare, and the first commit which does compile responds only to the first pipelined request.
This can be verified using the console command below - it should return 2 "Hello, World!" responses:
(echo -en "GET /plaintext HTTP/1.1\nHost: foo.com\nConnection: keep-alive\n\nGET /plaintext HTTP/1.1\nHost: foo.com\n\n"; sleep 10) | telnet localhost 8080
@ttomas - thanks for the idea - I added a CONN param to the gist - a connection count for wrk; for plaintext 1024 is used (all frameworks show their best results with 1024).
@ab - I added a shebang to the gist (first line) - maybe your default shell is not bash. Also ensure you have the `bc` utility (apt install bc).
Nice to hear that our measurements with pinning match now... I do not understand why in your case json is better than plaintext - in my case plaintext is always better.
I will make a PR to TFB on Sunday (when the current run results for mormot appear) - then we can see what pinning gives us on real hardware. BTW pinning is a common practice for async servers - even nginx has a worker affinity option in its config. In TFB tests pinning is used at least by libreactor and H2O.
W/O pipelining (with cmem) the results are (note: count=100 for cached queries, as in the TFB test):
/json /plaintext /cached-queries?count=100
pinning 1,281,204 1,301,311 493,913
default 1,088,939 1,168,009 471,235
I put the program I use to create load for smoke tests in this gist. CORES2USE and CORES2USE_COUNT should be edited to match the CPUs used by wrk.
Actually json is not x4 slower, because plaintext is pipelined with 16 HTTP requests per packet, so there are only 7,000,000/16 ≈ 440,000 packets, and performance is limited by the 10G network.
I have analysed /json under valgrind many times and currently do not see any possible improvements, except minimizing cpu-migrations and context-switches using CPU pinning.
Your results look strange to me. Did you try to use the first 10 CPUs for the app and the second 10 for wrk? And please check that you use cmem.
TFB hardware is a single-socket CPU...
I run tests on a 48-core server (2 sockets * 24 cores each) using
taskset -c 0-15 ./raw
num thread=8, total CPU=48, accessible CPU=16, num servers=16, pinned=TRUE, total workers=128, db=PostgreSQL
Postgres is limited to cores 15-31 by adding a systemd drop-in /etc/systemd/system.control/postgresql.service.d/50-AllowedCPUs.conf with the content
[Service]
AllowedCPUs=15-31
and wrk is limited to the last 16 cores:
taskset -c 31-47 ./wrk
In this case the results are
json 1,207,744
rawdb 412,057
rawfortunes 352,382
rawqueries?queries=20 48,465
cached-queries?count=100 483,290
db 376,684
queries?queries=20 32,878
updates?queries=20 22,016
fortunes 300,411
plaintext 3,847,097
while the same without pinning is
json 1,076,755
rawdb 409,145
rawfortunes 359,764
rawqueries?queries=20 47,887
cached-queries?count=100 456,215
db 395,335
queries?queries=20 33,542
updates?queries=20 22,148
fortunes 306,237
plaintext 3,838,749
There is a small degradation in the db-related tests, but the composite score is better. I plan to check pinning on TFB hardware and decide what to do depending on the results. We can, for example, create a separate docker file with pinning for non-db endpoints and without pinning for the db-related ones (as @ttomas proposes).
Added a CPU pinning feature to the TFB example - see mORMot2 PR #172. I will do the same PR for TFB after getting the results of the next run (it should start on 2023-04-13, with the new update algo and the removed keep-alive header).
In this PR I added accessible-CPU analysis - for testing purposes, when we limit CPUs using `taskset`.
About the memory error - unfortunately this is all I currently have. If I enable logging it is not reproduced; currently it is reproduced ONLY during `./tfb --test mormot --query-levels 20 -m benchmark`, and not on every run.
I found that we do not initialize the global flags variable in raw.pas - maybe an unexpected flag gets added and this is the reason for our memory problems... Will fix it in the next MR (to both TFB and mORMot).
Also I verified my new idea - we create 28 servers with 8 threads each, and I bind all threads of each server to the same CPU; on my hardware this gives a 1,002K -> 1,200K boost for /json. Please give me access to TAsyncConnections.fThreads - MR #171 - it allows me to set the affinity mask from the TFB test program.
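For reference, a minimal standalone sketch of pinning one thread to one logical CPU on Linux/x86-64 - it calls pthread_setaffinity_np directly rather than any mORMot helper, and PinCurrentThreadToCpu is a made-up name:

program pin_sketch;
{$mode objfpc}{$H+}
{$linklib pthread}
uses
  ctypes;

const
  CPUSET_BYTES = 128; // 1024 CPUs, same size as glibc's cpu_set_t

type
  TCpuSet = array[0..CPUSET_BYTES - 1] of byte;

function pthread_self: pointer; cdecl; external;
function pthread_setaffinity_np(thread: pointer; cpusetsize: csize_t;
  cpuset: pointer): cint; cdecl; external;

// bind the calling thread to a single logical CPU
procedure PinCurrentThreadToCpu(aCpu: integer);
var
  mask: TCpuSet;
begin
  FillChar(mask, SizeOf(mask), 0);
  mask[aCpu shr 3] := mask[aCpu shr 3] or (1 shl (aCpu and 7)); // set bit aCpu
  pthread_setaffinity_np(pthread_self, SizeOf(mask), @mask);
end;

begin
  PinCurrentThreadToCpu(0); // e.g. every thread of server #0 -> CPU #0
end.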
The memory problem is reproduced when I run `./tfb --test mormot --query-levels 20 -m benchmark` - not on every run, randomly, and in random places.
I hadn't seen it before 2023-03-08 [46f5360a66]; it first appeared when I checked out commit [2ae346fe11b] (2023-03-14), so it was introduced somewhere between 08 and 14 March.
I'll try to get closer to the faulty commit using the bisect technique, but this is a long process...
Thanks! I updated the TFB MR. Will sync all changes back into mORMot after the new update algo is verified.
Starting from commit [2ae346fe11b91fbe6fa1945cf535abed3de99d37] (Mar 14, 2023) I observe memory problems.
They occur randomly (sometimes after /db, sometimes after /json), but always after the wrk session has finished (on socket closing?).
I can't reproduce it in normal execution, only during tfb -m benchmark runs.
It was also reproduced once on 2023-03-30 in the TFB environment - this is why this round does not contain cached-queries results.
glibc MM messages are:
- corrupted size vs. prev_size while consolidating
- double free or corruption (!prev)
I modified my previous post after the round finished - we are #16 (all frameworks returned to the rating).
@ab - I found that here we add a `Connection: Keep-Alive` header for HTTP 1.1. This is not necessary - HTTP 1.1 is keep-alive by default.
So I propose to replace
result^.AppendShort('Connection: Keep-Alive'#13#10#13#10);
by
result^.AppendCRLF;
Or, if you prefer, add an option for this.
I checked - such a replacement works correctly and improves plaintext performance (maybe we will even get a beautiful 7M req/sec on TFB hardware).
Current TFB results
Weights 1.000 1.737 21.745 4.077 68.363 0.163
# JSON 1-query 20-q Fortunes Updates Plaintext Scores
38 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 thread (28CPU*4)
44 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 thread (28CPU*4) CPU affinity
51 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining
15 963,953 394,036 33,366 393,209 18,353 6,973,762 6,368 2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17 915,202 376,813 30,659 350,490 17,051 6,824,917 5,943 2023-03-03 - 168 threads, minor improvements, Ubuntu 22.04
17 1,011,928 370,424 30,674 357,605 13,994 6,958,656 5,871 2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
11 1,039,306 362,739 29,363 354,564 15,748 6,959,479 5,964 2023-03-16 - 224 threads (8*28 eft, ts), update with unnest, binary binding
16 1,046,044 360,576 30,919 352,592 16,509 6,982,578 6,048 2023-03-30 - 224 threads (8*28 eft, ts), modified libpq, header `Server: M`
- tiny (<1%) improvement for plaintext and json (shortened Server header value)
- +1.5K (~5%) improvement for rawqueries (and rawupdates as a side effect), thanks to the modified libpq
I tried the 'update table set .. from values (), ()' pattern for rawupdates in MR 8128. In my environment it works better than the CASE and UNNEST patterns.
I also periodically run some tests with a directly modified libpq to improve db performance, so far without success.
@radexpol - it's not correct to compare commercial and open source projects. IMHO @ab provides the best support I have ever seen in the open source world. Thousands of questions have been answered in this forum (for free)
@claudneysessa - there is already a link to a JavaScript auth example in this answer.
Yes, I also saw the just-js code - it's good. But it is just a proof-of-concept, as the author notes, and the repository has not been maintained for a long time. In the last round just-js (like many others who implement the PG protocol by hand) failed because the TFB team changed the PG auth algorithm from MD5 to something else.
So my suggestion is to use libpq as much as possible, and to implement only a subset of methods, and only for the raw* tests.
About having a separate pool of DB connections: IMHO this will complicate everything, and I am not sure it gives better results. .NET, for example, has a separate DB thread pool, but their results are not better than our current implementation.
I am almost sure that removing the unneeded `poll` call in libpq will give us a very valuable boost.
P.S.
The PG auth problem is described here - https://github.com/TechEmpower/Framewor … ssues/8061
I attached the test zip file to https://github.com/synopse/mORMot/pull/444
The current round has ended.
Weights 1.000 1.737 21.745 4.077 68.363 0.163
# JSON 1-query 20-q Fortunes Updates Plaintext Scores
38 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 thread (28CPU*4)
44 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 thread (28CPU*4) CPU affinity
51 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining
15 963,953 394,036 33,366 393,209 18,353 6,973,762 6,368 2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17 915,202 376,813 30,659 350,490 17,051 6,824,917 5,943 2023-03-03 - 168 threads, minor improvements, Ubuntu 22.04
17 1,011,928 370,424 30,674 357,605 13,994 6,958,656 5,871 2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
11 1,039,306 362,739 29,363 354,564 15,748 6,959,479 5,964 2023-03-16 - 224 threads (8*28 eft, ts), update with unnest, binary binding
We are #11, mostly because many top-rated frameworks failed in this round. The good news is that we are VERY close to .NET now.
It looks like the next round will be without our latest changes, which caused a lot of discussion.
I found how to improve db-related performance, but such a change requires rewriting a part of libpq in Pascal: currently, to get a result, libpq calls poll and then recv. The poll call can be avoided - it is used only to implement a timeout. On Linux we can use SO_RCVTIMEO for this. Such a change should improve the db round-trip by 10-30%.
So my idea is to use libpq for connection establishment, and then operate directly on the PQsocket socket. I will do it little by little...
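A minimal sketch of the SO_RCVTIMEO part (assuming the descriptor returned by PQsocket(); SetRecvTimeout is a made-up helper name):

program rcvtimeo_sketch;
{$mode objfpc}{$H+}
uses
  ctypes, BaseUnix, Sockets;

// make a blocking recv() give up after aMilliseconds instead of guarding
// every read with a preliminary poll() call
procedure SetRecvTimeout(aSocket: cint; aMilliseconds: integer);
var
  tv: TTimeVal;
begin
  tv.tv_sec := aMilliseconds div 1000;
  tv.tv_usec := (aMilliseconds mod 1000) * 1000;
  // once the timeout elapses recv() fails with EAGAIN/EWOULDBLOCK, which the
  // caller can treat exactly like the poll() timeout libpq implements today
  fpsetsockopt(aSocket, SOL_SOCKET, SO_RCVTIMEO, @tv, SizeOf(tv));
end;

begin
end.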
Another patch for non-English file names in zip - see mORMot1 MR 444. We discovered that the built-in zip of the latest Windows 10 Home puts such names into the Unicode Path Extra Field.
It would be good to port it to mORMot2 as well.
About libldap, it is a long story - first we used Synapse, but there were TLS problems there, then libcurl, but it also has known LDAP issues. So we switched to libldap. At least the ldapsearch utility (which is built on top of libldap) is well documented, and our customers can use it to verify their problems. libldap has worked for us for a long time.
Our latest TFB changes in PR 8057 generated some discussion (as expected).
About "with more threads per core, e.g. 16 instead of 8" - we can. But I am almost sure this will not help, because currently our server uses 100% CPU on rawdb. Let's wait for the next round and then try with more threads.
About LDAP - I have encountered different MS implementations with different Windows Server versions. Also Azure AD (ADFS) has its own nuances.
When you get tired of fighting it - here is how I use libldap (mORMot1 compatible). I only need ldapbind (I use it to verify user passwords), but I am sure it works in many scenarios. Here is a URL example and some troubleshooting.
And a small hack - set the server name to 'M' instead of the default "mORMot (linux)". Over 7 million responses it matters.
Just saw that .NET calls its server 'K' instead of 'Kestrel' for the TFB tests.
About using one connection for several threads - I don't like this idea (even if it may improve performance), because it's not "realistic" in terms of transactions. In the TFB bench we don't need transactions, but in real life we do. Several threads may want to commit/rollback their own transactions (in parallel), and this is impossible with a single connection.
In fact, I still don't understand why our /rawdb result is half that of the top "async" frameworks. This is abnormal. The DB server CPU load is ~70% for our test in this case, so the bottleneck is on our side. But where? In libpq select calls? Or because all our threads are busy? No answer yet, but solving this problem is for sure a way into the top 10. I am sure we can solve it without going "async".
TFB's requirement to call Sync() after each pipeline step is highly debatable. Moreover - this discussion was started by the .net team, and they may have their reasons for doing so (after all, MS sponsors citrine hardware, so they can do it).
My opinion is that we do not need sync() at all; the Postgres test authors share the same opinion with their test_nosync.
So I decided not to do a PR to libpq but to use a modified version (I placed it on GitHub). The new TFB #8057 is ready, based on the latest mORMot sources.
About the current state - the mORMot results for round 2023-03-16 are ready, but it seems all frameworks' results are lower in this round, so we can't be sure the binary bindings help. We moved one place up in composite score.
Good news - I found a way to improve PG pipelining performance (rawqueries, rawupdates)
The libpq PQpipelineSync function flushes the socket on each call (doing a write syscall).
I traced the justjs implementation and observed that they send all pipeline commands in one write syscall. After this I rebuilt libpq with the flush commented out, and
everything works correctly; performance increased by +4k (10%) RPS for rawqueries and +1k for rawupdates (I'm using a local Postgres, it should increase more over the network). This should add ~150 composite score points.
Now I will either implement PQpipelineSync in Pascal (this needs access to internal libpq structures) or, if I can't, add a modified libpq into the docker file.
Our test is used by the TFB team to verify their CI, just because we build quickly.
See https://github.com/TechEmpower/Framewor … 1467058544
It's strange. Another 3 runs on the latest sources, and everything is OK. Maybe it was a problem with my PC...
I made a TFB PR 8031:
- for /updates with count <= 15, use the 'case .. when .. then' pattern; for count > 15, the 'unnest' pattern (see the illustration below)
- use the binary parameter binding format for Int4/Int8 parameter types - should be faster than textual
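Roughly, the two SQL shapes being compared look like this (simplified to two rows, PG-style $n placeholders; the really generated text differs):

program update_patterns;
{$mode objfpc}{$H+}
const
  // 'case .. when .. then' pattern: one parameter pair per updated row
  SQL_CASE_WHEN =
    'update World set randomNumber = case id' +
    ' when $1 then $2 when $3 then $4 end' +
    ' where id in ($1,$3)';
  // 'unnest' pattern: two array parameters, whatever the row count
  SQL_UNNEST =
    'update World set randomNumber = v.r' +
    ' from (select unnest($1::int[]) as id, unnest($2::int[]) as r) as v' +
    ' where World.id = v.id';
begin
  writeln(SQL_CASE_WHEN);
  writeln(SQL_UNNEST);
end.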
Oops. @ab - with the latest sources the TFB tests fail (the server crashes) with the message "mormot: double free or corruption (!prev)".
Do your regression tests pass?
Update - it crashes on commit "fixed TSqlDBPostgresStatement.GetPipelineResult logging".
@mpv
Please try https://github.com/synopse/mORMot2/commit/2ae346fe about TSqlDBPostgresStatement.GetPipelineResult logging.
It's a little strange. Was it the intention to have an empty q= on the second result retrieval?
17:40:07.957 28 DB mormot.db.sql.postgres.TSqlDBPostgresStatement(7fb9c80013c0) Prepare t=2.98ms c=01 q=select id,randomNumber from World where id=?
...
17:40:07.958 28 SQL mormot.db.sql.postgres.TSqlDBPostgresStatement(7fb9c80013c0) Execute t=2.99ms c=01 q=select id,randomNumber from World where id=7991
17:40:07.958 28 SQL mormot.db.sql.postgres.TSqlDBPostgresStatement(7fb9c80013c0) Execute t=2.99ms c=01 q=select id,randomNumber from World where id=4057
17:40:07.958 28 Result mormot.db.sql.postgres.TSqlDBPostgresStatement(7fb9c80013c0) Execute t=3.15ms c=01 r=1 q=select id,randomNumber from World where id=4057
17:40:07.958 28 Result mormot.db.sql.postgres.TSqlDBPostgresStatement(7fb9c80013c0) Execute t=3.24ms c=01 r=1 q=
17:40:07.958 28 Result mormot.db.sql.postgres.TSqlDBPostgresStatement(7fb9c80013c0) Execute t=3.25ms c=01 r=1 q=
...
About /rawupdates - in the last round with when..then we got only 4k RPS (13K is the ORM /updates result). I have no idea why the /updates results on TFB hardware are so strange and do not match my tests, but in fact unnest is better for mORMot.
@ttomas - I really, really did verify different thread counts. We work with PG in blocking mode, so we need at least connections = CPU x 3 (yes, some of them will be idle periodically). Take a look at the first 3 rows of the results table 3 posts above: for /db the 3rd row with 140 connections is better than the 2nd with 112 connections, and the 2nd is better than the 1st with 64.
@ab - the logging problem is only for /rawqueries - see my comment on GitHub.
I added binary parameter binding - please see https://github.com/synopse/mORMot2/pull/159
This MR breaks parameter logging, since I rewrite the p^.VInt64 value into htonl (big-endian) byte order, as required by the PG binary protocol.
Can we extend TSqlDBParam by adding VInt64BE? Because the comment below says not to change it...
// - don't change this structure, since it will be serialized as binary
// for TSqlDBProxyConnectionCommandExecute
TSqlDBParam = packed record
This speeds up a little all endpoints that retrieve one row (at both the ORM and raw level). Maybe in the TFB environment the increase will be more significant.
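For reference, a tiny sketch of the byte-order conversion involved - FPC's NtoBE does the swap, and Int64ToPgBinary is a made-up helper name:

program bigendian_sketch;
{$mode objfpc}{$H+}

// the PG binary protocol expects int4/int8 parameter values in network
// (big-endian) byte order, so the little-endian x86 value must be swapped
function Int64ToPgBinary(aValue: Int64): Int64;
begin
  result := NtoBE(aValue); // no-op on a big-endian CPU, byte swap on x86
end;

begin
  writeln(Int64ToPgBinary(7991)); // prints the swapped representation
end.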
For /rawupdates on my server there is almost no difference between unnest / when..then / when..then+bin_bind (I always get +-25k). So I can't explain why TFB /rawupdates is so poor.
The structure for binding an array in binary format is undocumented, quite complex, and uses x4 more traffic than the string representation we currently use, so I do not see any reason to implement it for binding binary arrays into unnest (see the array representation here: https://stackoverflow.com/a/66499392).
My proposal is to use when..then for count <= 10 and fall back to unnest otherwise - this gives the best speed for any parameter count.
Weights 1.000 1.737 21.745 4.077 68.363 0.163
# JSON 1-query 20-q Fortunes Updates Plaintext Scores
38 731,119 308,233 19,074 288,432 3,431 2,423,283 3,486 2022-10-26 - 64 thread limitation
43 320,078 354,421 19,460 322,786 2,757 2,333,124 3,243 2022-11-13 - 112 thread (28CPU*4)
44 317,009 359,874 19,303 324,360 1,443 2,180,582 3,138 2022-11-25 - 140 thread (28CPU*5) SQL pipelining
51 563,506 235,378 19,145 246,719 1,440 2,219,248 2,854 2022-12-01 - 112 thread (28CPU*4) CPU affinity
51 394,333 285,352 18,688 205,305 1,345 2,216,469 2,586 2022-12-22 - 112 threads CPU affinity + pthread_mutex
34 859,539 376,786 18,542 349,999 1,434 2,611,307 3,867 2023-01-10 - 168 threads (28 thread * 6 instances) no affinity
28 948,354 373,531 18,496 366,488 11,256 2,759,065 4,712 2023-01-27 - 168 threads (28 thread * 6 instances) no hsoThreadSmooting, improved ORM batch updates
16 957,252 392,683 49,339 393,643 22,446 2,709,301 6,293 2023-02-14 - 168 threads, cmem, improved PG pipelining
15 963,953 394,036 33,366 393,209 18,353 6,973,762 6,368 2023-02-21 - 168 threads, improved HTTP pipelining, PG pipelining uses Sync() as required, -O4 optimization
17 915,202 376,813 30,659 350,490 17,051 6,824,917 5,943 2023-03-03 - 168 threads, minor improvements, Ubuntu 22.04
17 1,011,928 370,424 30,674 357,605 13,994 6,958,656 5,871 2023-03-10 - 224 threads (8 thread * 28 instances) eventfd, ThreadSmooting, update use when..then
Conclusions:
- because of ThreadSmooting the scores changed: json +96, db -6, fortunes +28 = +118
- rawupdates with the new when..then algo: -4k RPS (-272 score). But ORM updates improved by +3k. The good news here is that I am 95% sure this is because we bind Int params as strings. I will implement binary binding today (retrieving textual results is OK - I verified it several times)
- cached-queries improved 148K -> 349K, mostly because of 8*28 threads + a ready-to-be-serialized TOrm instance
As noted in rule vii of the cached-queries requirements: implementations should not create a plain key-value map of objects.
I saw that other frameworks do this, but IMHO it breaks the rule, doesn't it?
Current TFB status:
In the round completed on 2023-03-10 we moved down in composite score, #16 -> #17 (a new rust (tokio) viz framework was added), but in the endpoint tests we improved our results:
- /json #67 -> #65
- /db #31 -> #25
- /queries #30 -> #29
- /cached #46 -> #38 (we expect huge improvement in next results)
- /fortunes #19 -> #20 (unfortunately)
- /updates #29 -> #25 (some improvements expected in next round with updates w/o unnest)
- /plain #23 -> #21
Some of the results improved because of our changes, some because other frameworks are more affected by the new kernel mitigation patches.
Next results expected on 2023-03-15
BTW - for /cached-queries TFB uses count=100; my previous measurements were made with count=20. In the future I will also measure /cached-queries with 100 values.
@ab - I see you are adding an LDAP client. Just a note - I switched to libldap (on both Win/Lin) ~5 years ago - it is the only implementation I found that works correctly with ADFS. And most enterprises use ADFS. I do not remember exactly what has to be done to support LDAP in federation mode - maybe additional steps for selecting the active LDAP server from a pool, etc. - libldap solves this out of the box.
Yes, they only updated the OS (Ubuntu 18.04 -> Ubuntu 22.04). The numbers are ~10-12% lower for all frameworks; I think they did not turn off all the new mitigation patches.
In any case we decreased the gap with .NET.
Let's wait for the next round with the improved thread pool and the new updates query. I hope my thread pool size investigations are correct...
TFB MR 7994 is ready - based on commit 46f5360a:
- thread pool auto-tuning: use 1 listening socket per logical CPU and 8 working threads per listener socket
- for /updates with count <=20 use 'case .. when .. then' pattern
- [mORMot] update to mORMot 2.0.stable
- [mORMot] improved cached queries performance
Our db layer does not support a parameter count > 999, so I added a fallback to UNNEST for /rawupdates (the TFB tests use count=500 for verification = 1500 parameters in the query) - will do an MR to mORMot ex/ later...
Cached queries performance increased 788k -> 923k in 8 threads * 16 servers mode.
Your changes are adopted for the TFB PR, see the small fixes in pull/153. Will make a pull request to TFB (based on mORMot 2.0 stable) after we get the results of the current run (they can stop it to tune the server; I hope our results will be ready before that happens). Many thanks for statics.tgz!
I found a thread-safe lock-free multiple-producer multiple-consumer queue implementation by @BeRo1985.
@ab - have you seen it? Maybe it would be better for waking threads than eventfd/RTLEvent?
Congratulations on the release and BIG thanks for your hard work!
Results for the 16-16-16 CPU split are on Google Drive.
The best mode is the 8-16 mode - 8 threads per server * 1 server per CPU;
The mode I selected for TFB (16 threads * 4 servers in this case, 28 * 6 for their HW) is one of the worst :(
ThreadSmooting availability and/or eventfd vs RTLEvent results are very close;
In all tests the app server consumes ~98% of the CPUs.
@ab - take a look at the x2 difference in cached-queries for many threads per server.
I rethought the way I run tests on server hardware (2 sockets * 12 cores * 2 threads = 48 logical CPUs).
All my previous tests were executed on the same server with:
- the app server limited to the first 28 CPUs (taskset -c 0-28 ./raw ...) to match the TFB hardware logical CPU count
- wrk limited to the last 20 cores
- Postgres w/o CPU limits
This is totally wrong for several reasons:
- the good: the network is local, but we can't do anything about this. TFB uses a 10G switch, so we can expect it to be nearly the same as a local network
- the bad: Postgres is not limited to CPUs - this is bad, and we get unexpected results for db-related tests (our numbers do not match TFB; for example I got better /rawqueries)
- the ugly: the app server uses CPUs from different sockets - this is ugly, for sure
So I decided to set a CPU limit of 16 cores for each part - app 0-15, Postgres 16-31 (systemctl set-property postgresql.service AllowedCPUs=15-31 + restart) and wrk 32-48 - and to repeat all tests. The numbers are lower, but proportionally they should be closer to the TFB HW.
Will publish numbers soon..
A REPLACE INTO clause is not applicable to Postgres; this is MySQL syntax. The Postgres statement should look like INSERT .. ON CONFLICT .. DO ..
Try removing the boInsertOrReplace option.
For /rawupdates - there is really no benefit, at least on my server.
I think our problem is not in the SQL clause, but in concurrency. Other, better-ranking frameworks are async and create a connection count = CPU count. We need at least connections = CPU*3. I hope that with eventfd I can decrease connections from the current CPU*6 to CPU*3 without performance loss. And also apply the new rawupdate - just to verify.
I will verify the latest changes this evening (the server I use for tests is someone's production box, and there is additional load during the workday). But the code is much cleaner now, for sure.
About io_uring - let's wait until some framework implements it and then look at the numbers...
Fixed /rawupdates - https://github.com/synopse/mORMot2/pull/152 - but since performance is the same, I propose to revert to the sub-query as more realistic.
Configuring the host for such heavy benchmarking as TFB is a separate issue. I recommend:
- kernel: turn off mitigation patches at the kernel level (by adding mitigations=off to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub)
- docker: disable the userland proxy (by adding "userland-proxy": false into /etc/docker/daemon.json); @ttomas - try this instead of --net
- do not use any VM - on a VM the results are always unpredictable
- increase the max open files limit
I tried with hsoThreadSmooting - for /json the result is better - 1450K vs 1300K without, for /plaintext 4220K vs 4200K without, for /rawdb and /rawfortunes nearly the same (but not slower); queries and updates are a little (<1K) slower.
About the updates SQL - your version does not work for Postgres (the syndb layer expects ? as the parameter placeholder). When I fixed it to generate "when ? then ? ... where id in (?....)" and bind 60 parameters instead of 40, performance did not change (+-100) compared to the array-bound version. Sorting the words array does not change anything.