Good idea
For my part, I'll try to use `mimalloc` in the next run - the top Rust frameworks `ntex` (made by MS, by the way), `may-minihttp` and `xitca-web` all use mimalloc. On my computer, mimalloc works a little worse, but on TFB the situation may change.
Offline
From https://tfb-status.techempower.com :
Update 1.10.2024: We have 3 new servers on the way (and will be sending back the current servers). Will have the upgraded specs posted shortly. It may take some time to set up the new environment. We appreciate your patience!
This is good news!
We will have some new HW to play with.
@pavel Hope you are not too bad. We think daily about you, your family and your country.
Offline
@pavel Hope you are not too bad. We think daily about you, your family and your country.
True, hope everything is as OK as it just can be...
Offline
It is not easy here, in Ukraine, but this winter is so far easier than the previous one - we were ready for the russian terror.
As for TFB, let's look at the new HW. It's a shame that our last commit never ran on the old HW - we have no baseline for comparison. After one run on the new HW, I plan to switch our test to mormot@2.2 (and possibly change the threads/servers settings). Life goes on..
Offline
After almost a year of discussions, our idea for a PostgreSQL pipelining-mode improvement has been merged into the PostgreSQL upstream. After the Postgres 17 release in September 2024 we will switch to it - for now, I do not want to increase the container build time by compiling libpq from sources.
Offline
There is an update about the new TFB environment: https://github.com/TechEmpower/Framewor … 1973835104
56 logical CPUs and a 40 Gb network.
Still waiting for the tests to resume...
Offline
Not my aunt Suzy's computer configuration...
Even for a corporation, it is a pretty huge and unusual setup, especially the network part.
Only the SSD is a weird choice: a SATA drive for benchmarking a database process? In 2004, why not, but in 2024? Really?
Offline
Has mORMot 2 not taken part in the tests recently?
Offline
The TFB benchmarks are not running properly.
After 'martini' (no joke), all tests were aborted by their script.
https://tfb-status.techempower.com/resu … 09bab0d538
They have configuration issues on their side.
The people behind TFB do not seem very experienced with this kind of ops work, and they are having difficulties with their setup.
They received the new HW, but still seem to be fighting with it.
Offline
The TFB run on the new hardware just reached mORMot.
And we are not badly ranked:
https://www.techempower.com/benchmarks/ … =composite
25,000,000 pipelined requests per second for the /plaintext test, and 3,000,000 RPS for /json - nice!
Offline
Congratulations!
Offline
Currently, with 56 CPUs, mormot [async,nopin] (1 process, 112 threads, no pinning) is good for updates, but bad for the other scenarios (compared to [async] with pinning and to [direct] with pinning).
Let's wait for the next launch with the latest PR (the current launch is based on the same sources as Round 22).
The bad thing is that we don't have a test server that matches the TFB configuration now, so we can't test different thread/server/pin combinations to find the best one.
Offline
Hello,
I've been wondering how close the mORMot used in the TFB is to the dev version.
Is it synced all the time, or feature by feature? Or are some functionality and changes developed over there and merged back, etc.?
-Tee-
Offline
It sounds like we will be in the top 10 with this high-end hardware, with the exact same software (rank #6 eventually, I guess).
It's nice to see that we went from #12 to #6 just by upgrading the hardware.
@TPrami We use the very same source code version as last year's official Round 22. It is an old version, and the current trunk may be slightly faster.
What we can observe is that Java did not scale as well as Rust and Pascal on this new hardware: the best Java frameworks fell behind mORMot.
I find it interesting. Especially for long-term running servers, I would not consider Java as a viable solution for leveraging resources (mainly memory). On high-end hardware, Java has structural bottlenecks which prevent it from scaling.
Just.js is more like an experiment - a very clever one, very fast for JavaScript, but not usable in production. We already saw that.
libh2o claims to be the fastest C library for HTTP networking, and the mORMot numbers are very close to it (even better for the pipelined /plaintext).
For its DB access, it is not a framework, but raw access to the libpq pipelining API, interleaved with the h2o socket layer. Still, it is good reference code to read - https://github.com/TechEmpower/Framewor … database.c - perhaps if we want to reuse our async socket layer over the PostgreSQL connection.
It shines on the /fortunes endpoint, but at the expense of very complex low-level code, with manual variable and section name lookups. Imagine the work for a realistic template.
So, performance aside, the libh2o entry is not comparable to mORMot: it is also an experimental solution.
On the contrary, ntex / may-minihttp / xitca-web Rust frameworks are not so experimental.
They leverage the async nature of Rust (its .await syntax), and the code is still readable - especially for Rust.
The common foundation behind most of those frameworks seems to be https://github.com/tokio-rs/tokio
It sounds like their HTTP server uses io_uring with a modern Linux kernel, but with no dramatic speed improvement for the /json endpoint. Perhaps we won't need to support io_uring in mORMot immediately, if it makes no difference on such high-end HW compared with our regular epoll-based server.
Offline
One more observation is that may-minihttp, ntex, and xitca-web use mimalloc. I also plan to try it in one of the next rounds. I've already tried mimalloc on my server - performance is the same as with the libc allocator, but on the new TFB hardware it might change the numbers a bit.
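(For what it's worth, mimalloc can be tried without recompiling anything, by preloading it over the libc allocator; the library path below is an assumption and depends on the distribution:)

```shell
# build the TFB "raw" server as usual, then run it with mimalloc preloaded;
# /usr/lib/libmimalloc.so is an assumed path - adjust for your distro
LD_PRELOAD=/usr/lib/libmimalloc.so ./raw

# MIMALLOC_VERBOSE=1 makes mimalloc print its settings at startup,
# confirming the override is active
MIMALLOC_VERBOSE=1 LD_PRELOAD=/usr/lib/libmimalloc.so ./raw
```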
P.S.
#6 is a good place IMHO
Last edited by mpv (2024-04-10 15:21:17)
Offline
Yes, #6 is already great!
Perhaps mimalloc could help a little.
I have just tried to optimize the TFB /rawfortunes and /asyncfortunes:
- it is where we are the furthest behind the other frameworks (only 66% of the best);
- we won't allocate a TFortune.Message string, but use a PUtf8Char instead (avoiding some locked refcount processing);
- we reuse a per-thread Mustache rendering context (stored within the per-thread connection instance).
https://github.com/synopse/mORMot2/commit/1a71fbf0
Please try it on your side with PostgreSQL.
But I suspect it won't be enough.
The problem seems to be in the DB request itself.
We are also at around 66% of the best frameworks with the single-query /rawdb endpoint.
Perhaps we could improve both entries, if we can fix the bottleneck of a single SELECT.
It is weird that [mormot-async] is slower than [mormot-direct] and [mormot-orm] for these single SELECT requests.
We may be able to do better. My guess is that TSqlDBPostgresAsyncThread.Execute could be done in a better way.
Offline
Edit:
Maybe https://github.com/synopse/mORMot2/commit/c4c43f03 could help a little.
The TSqlDBPostgresAsyncThread.Execute method was perhaps calling sleep() too much.
I hope [mormot-async] could benefit from this.
Offline
Yes, disabling the sleep should help the asynchronous server, because according to dstat (a run is in progress right now) the processor is ~66% idle (only 40% is used). By the way, for raw the idle is ~44%, which is also too high; for example, with h2o the idle is ~20%.
In today's result, based on mORMot2.2.stable, our score increased from 18,281 to 19,303 - I hope we will be #5.
I'll test the fresh sources on my computer (my test server is in a region with power outages and is unavailable) and make an MR (hopefully today).
Offline
@ab - it seems escaping is broken in the latest Mustache implementation.
A valid fortune response row should be (with the <script> content HTML-escaped):
<tr><td>11</td><td>&lt;script&gt;alert(&quot;This should not be displayed in a browser alert box.&quot;);&lt;/script&gt;</td></tr>
but the current implementation does not escape the {{ }} template values, and the result is:
<tr><td>11</td><td><script>alert("This should not be displayed in a browser alert box.");</script></td></tr>
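For reference, the escaping contract that `{{ }}` values must follow can be sketched with the Python stdlib (this is not mORMot's Mustache implementation, just an illustration of the expected output; `render_fortune_cell` is an illustrative name):

```python
import html

def render_fortune_cell(fid: int, message: str) -> str:
    # {{ }} values in a Mustache template must be HTML-escaped;
    # html.escape() replaces &, <, > and (by default) " as well
    return "<tr><td>%d</td><td>%s</td></tr>" % (fid, html.escape(message))

row = render_fortune_cell(
    11, '<script>alert("This should not be displayed in a browser alert box.");</script>')
# no raw <script> tag must survive in the rendered row
assert "<script>" not in row
assert "&lt;script&gt;" in row
```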
Last edited by mpv (2024-04-15 18:25:19)
Offline
Oops...
The logic was inverted in https://github.com/synopse/mORMot2/comm … 39b7438856
Please try with https://github.com/synopse/mORMot2/commit/2ada2b4b
BTW I am not sure that mimalloc could make a real difference with the latest version of /rawfortunes, because there is almost no memory allocation any more with my latest https://github.com/synopse/mORMot2/commit/1a71fbf01a770
Offline
Verified with 2.2.7351 - the escaping is OK now. TFB MR #8883 is ready.
I won't bother with mimalloc either - we'll see how it goes with the current improvements.
Offline
The last MR gives better results.
It does not change the rank, but the numbers are higher, especially for the single SELECT runs (we are now at 70% of the best instead of 60%).
https://www.techempower.com/benchmarks/ … =composite
Perhaps we could try another MR including some last commits:
https://github.com/synopse/mORMot2/commit/72934e6609 (avoid memory alloc of the TTextWriter)
and
https://github.com/synopse/mORMot2/commit/15e284e9b05 (small optimization of raw.dpr)
and perhaps also
https://github.com/synopse/mORMot2/commit/d312c00d (faster random world id)
If memory is still a bottleneck, the TTextWriter reuse could help a little more, for all entries but /fortunes.
And using our own TLecuyer instance avoids a somewhat slow access to a threadvar.
@mpv question #1
BTW, I have seen that some frameworks use a short parameter name for the queries, e.g. ?q= in https://github.com/TechEmpower/Framewor … ain.rs#L18
Perhaps we could also use this trick, if it is allowed, and always use a fixed search = 'Q=' parameter in GetQueriesParamValue(ctxt):
function GetQueriesParamValue(ctxt: THttpServerRequest): cardinal; inline;
begin
  if not ctxt.UrlParam('Q=', result) or
  ...
and change test URIs to use ?q= encoded parameter, instead of ?queries= and ?count=... with
"query_url": "/queries?q=",
"fortune_url": "/fortunes",
"update_url": "/updates?q=",
"plaintext_url": "/plaintext",
"cached_query_url": "/cached-queries?q=",
in the config file.
@mpv question #2
It is also interesting how connection pools are implemented in may-minihttp.
They allocate a connection pool of 1000 instances, and assign one to each connection, using a modulo of the connection ID (a sequence, I guess).
It may be a better approach than our per-thread connection pool for single queries... and it may help avoid forking the executable.
IIRC the max number of connections for the DB benchmarks is up to 512 concurrent clients, so each HTTP client connection would have its own DB access. Only /plaintext scales up to 16384 clients, but with no DB involved.
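The modulo scheme may be sketched like this (Python for illustration only; `DbPool` and its methods are hypothetical names, not may-minihttp or mORMot APIs):

```python
POOL_SIZE = 1000

class DbPool:
    """A fixed pool of long-lived DB connections, indexed by connection ID."""

    def __init__(self, size: int = POOL_SIZE):
        # stand-ins for real DB connections, created once at startup
        self.slots = [object() for _ in range(size)]

    def for_connection(self, conn_id: int) -> object:
        # connection IDs form a monotonic sequence, so a modulo spreads
        # the HTTP connections evenly over the pool
        return self.slots[conn_id % len(self.slots)]

pool = DbPool()
# with up to 512 benchmark clients and 1000 slots, no two clients collide;
# only above the pool size would e.g. clients 1 and 1001 share a slot
assert pool.for_connection(1) is not pool.for_connection(512)
assert pool.for_connection(1) is pool.for_connection(1001)
```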
Offline
I'll do an MR with the latest changes..
As for `q=`, I tested this case a year ago and it gained nothing. I'm afraid that even if we do this, we will get a ton of criticism, just like with Server-Name. I'll make it a separate MR (after the MR with the latest changes).
About `using a modulo of the connection ID` - what if we have 1001 clients? We can't use one connection for both client 1 and client 1001. As far as I remember, I tested an implementation with a per-worker connection (calling ThreadSafeConnection once and memorizing it in the worker context), and performance was nearly the same as with ThreadSafeConnection.
BTW, currently our raw server creates 448 DB connections (num servers=56, threads per server=8, total threads=448, total CPU=56, accessible CPU=56, pinned=TRUE, db=PostgreSQL). Maybe I'll increase `threads per server` to 10 to get 560 connections, so each concurrent client will have its own - that might work (after the MR with q=).
Last edited by mpv (2024-04-23 14:58:51)
Offline
@mpv
Makes sense!
So we will try with the latest changes in the next round, and I won't investigate the connection pool any further. Perhaps setting threads per server to 10 may help, since all tests use up to 512 connections.
Note that the /async* endpoints create an additional connection per existing connection: so it would create 1024 connections in all - I don't know whether that would be too much.
Offline
56*(8-1)=392 db connections
Offline
Looking at the Single query data table, the async servers have worse results than the ORM!
[async,nopin] mormot-postgres-async2 uses 1 server with 56*2 threads (56*2-1=111 db connections),
while all the other tests use 56*(8-1)=392 db connections.
Not fair for [async,nopin]!
Edited:
Looking at the Data updates data table, for multiple updates (10, 15, 20) [async,nopin] benefits from the lower number of db connections.
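These connection counts follow one small formula (the assumption that exactly one thread per server holds no DB connection is taken from the figures in this thread, sketched here just for clarity):

```python
def db_connections(servers: int, threads_per_server: int) -> int:
    # per the figures above, one thread per server (presumably the
    # accept/event-loop thread) holds no DB connection - hence threads-1
    return servers * (threads_per_server - 1)

# pinned raw servers: 56 servers x 8 threads each
assert db_connections(56, 8) == 392
# [async,nopin]: 1 server with 56*2 threads
assert db_connections(1, 112) == 111
```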
Last edited by ttomas (2024-04-24 12:11:09)
Offline
@ttomas - I'll increase the threads from 2 to 4 for [async,nopin].
@ab - Please fix raw.pas, because it does not compile (TRawAsyncServer.ComputeRandomWorld is not accessible from inside TAsyncWorld.DoUpdates / Queries) - I'll follow your fix and make an MR.
Offline
@mpv
Try with https://github.com/synopse/mORMot2/commit/7b76af41
Offline
Now it compiles and all tests except the updates pass; for /updates (and /rawupdates) the TFB validation fails:
FAIL for http://tfb-server:8080/updates?queries=501
Only 1 items were updated in the database out of roughly 500 expected.
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
PASS for http://tfb-server:8080/updates?queries=20
Executed queries: 10752/10752
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
PASS for http://tfb-server:8080/updates?queries=20
Rows read: 10635/10240
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
FAIL for http://tfb-server:8080/updates?queries=20
Only 506 rows updated in the database out of roughly 10240 expected
@ab - can you imagine what could have happened? The methods themselves have not changed... (Sorry, I'm not able to debug at the moment, only to run the tests in docker.)
Offline
I guess this is because of collisions when the TLecuyer random generator is shared between the threads.
Should be fine now with
https://github.com/synopse/mORMot2/commit/4f100299
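The per-thread generator idea can be sketched outside Pascal too (Python stdlib; `thread_rng` and `random_world_id` are illustrative names, not mORMot APIs):

```python
import random
import threading

_tls = threading.local()

def thread_rng() -> random.Random:
    # one PRNG instance per thread: no locking, and no shared-state races
    # that could corrupt the generator or duplicate ids
    # (similar in spirit to one TLecuyer per server thread)
    rng = getattr(_tls, "rng", None)
    if rng is None:
        rng = random.Random()  # each instance carries its own state
        _tls.rng = rng
    return rng

def random_world_id() -> int:
    # TFB World ids are in the range 1..10000
    return thread_rng().randint(1, 10000)
```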
Offline
Sorry for the delay. Now it crashes in TRawAsyncServer.rawqueries, because an uninitialized PLecuyer is passed to GetRawRandomWorlds.
@ab - maybe we should go back to https://github.com/synopse/mORMot2/commit/d312c00d ? These Lecuyer instances, which are now everywhere, have made the code less readable (and not necessarily faster).
Offline
The previous run has finished.
We went higher, so we are #6 now, above redkale.
https://www.techempower.com/benchmarks/ … =composite
Sadly, the pending pull/modification requests have not been integrated into the new run.
https://github.com/TechEmpower/Framewor … arks/pulls
We will be able to see how stable their HW installation is.
The numbers should stay the same with no software update.
Offline