You are not logged in.
Pages: 1
Follwing this post about the thread-safe of TRestServerDB, I did some test about multi-threaded BatchAdd.
The database is PostgreSQL, and I use TSQLDBZEOSConnectionProperties
Test 1: I use TRestServerDB.BatchAdd to insert 150,000 records into one table in a single thread
Test 2: I use TRestServerDB.BatchAdd to insert 50,000 records in the the same table, but from three different concurrent threads
Both Test 1, and Test 2 uses the SAME instance of TRestServerDB.
Test1: is quite fast, achieving a rate of 100,000 records/second
Test2: very slow, the overall rate is 5000 records/second
Is this expected? Any tricks to improve the multi-threaded Batch insert?
Note - If I split the data into three different tables (i.e., the same table structure but diff table names) - the results are still similar. Multi-threaded insert is very slow.
Last edited by wxinix (2021-09-23 03:58:12)
Offline
The ORM DB access is protected with a mutex, so it is serialized.
The three threads won't work in parallel, but one after the other.
1. What if you insert 150,000 records from three concurrent threads?
2. What if you insert 50,000 records from one thread, but 3 times in a row?
3. What if you use our direct PostgreSQL layer instead of Zeos?
4. What if you use SQLite3 instead of PostgreSQL?
The main point is test 2. My guess is that when the 2nd or 3rd Batchs of 50,000 are executed, they are waiting for the data of the first batch to be fetch to the disk, so PostgreSQL delays the write.
It may have nothing to do with multi-threading, but with sending a lot of data to PostgreSQL.
Please enable the logs and find out the proper timing. It will give you clues about what happens.
As a trick, for best performance what I usually do is use a shared (and secured with a lock) TSQLRestBatch instance between threads, then use a background thread to push all the pending data at once, e.g. every few seconds.
The inserts will be delayed, but the insertion will be instant from the requests.
Don't forget to flush the bending data on server shutdown. Obvious drawbacks is that the DB does not instantly contain the new data - but in most use-cases of batch, it doesn't matter much (e.g. if you use it the ORM class as a journal).
Online
Note - all the insertion cases are targeting ONE single table. For each test case, the table will be emptied in advance.
----------------------------
1. What if you insert 150,000 records from three concurrent threads?
It doesn't change the overall average insert rate
2. What if you insert 50,000 records from one thread, but 3 times in a row?
Similar (slightly slower 1~2%) insertion rate, compared to one-time 150,000 records.
3. What if you use our direct PostgreSQL layer instead of Zeos?
mORMot PostgreSQL layer IMPROVES the multi-threaded (3 threads) insertion rate by x6 times, increasing from Zeo's 5000 records/sec to a much higher 30000 records/sec
But, PostgreSQL layer DROPS the single-threaded insertion rate from Zeo's 100,000 record/sec to about 50,000 records/sec
4. What if you use SQLite3 instead of PostgreSQL?
If I use mORMOt internal in-mem SQLite3, the multi-threaded insertions is about 4~6% FASTER then the single-thread insertion, around 220,000 records/sec
If I use mORMot static object list, the multi-threaded insertions is about 4~6% FASTER then the singled-thread insertion, around 500,000 records/sec
---------------------------
IN SUMMARY:
- mORMot in-mem SQLite, and static object list: multi-threaded batch insertion rate is FASTER than single-threaded insertion
- for PostgreSQL, Zeos has the best single-threaded batch insertion rate (100,000 records/sec), mORMot Postgresql Layer has the best multi-threaded batch insertion rate (50,000 records/sec).
Are the above making sense?
Last edited by wxinix (2021-09-23 14:50:17)
Offline
Yes, they make sense.
So the performance drop is clearly specific to the PostgreSQL database link.
Probably the way the data is queued on both client and server DB sides.
Note that 30,000 or 50,000 inserts per second is a very good number for a regular SQL database.
What is weird is point 2. Because from one thread, it uses the same connection.
Perhaps you may try to use a single connection for PostgreSQL instead of the default, which is one connection per pool IIRC.
If you really need consistent write to PostgreSQL, I guess my shared TSQLRestBatch trick which is sent in a background thread may be the only solution.
Online
But, PostgreSQL layer DROPS the single-threaded insertion rate from Zeo's 100,000 record/sec to about 50,000 records/sec
I think this depends on data you insert. ZEOS interally uses binary pg protocol, so if inserted data is mostly numbers it can be faster compared to mORMot Postgres layer, where text pg protocol is used
Last edited by mpv (2021-09-23 17:27:17)
Online
wxinix wrote:But, PostgreSQL layer DROPS the single-threaded insertion rate from Zeo's 100,000 record/sec to about 50,000 records/sec
I think this depends on data you insert. ZEOS interally uses binary pg protocol, so if inserted data is mostly numbers it can be faster compared to mORMot Postgres layer, where text pg protocol is used
Thank you for the advice. I'll keep this in mind. But for now - I'll stick to ZEOS, which seems has faster performance for my use case.
Offline
If you really need consistent write to PostgreSQL, I guess my shared TSQLRestBatch trick which is sent in a background thread may be the only solution.
Thank you ab, for the advice.
I implemented as advised - using a background worker thread to do the BatchAdd work, and rerun the test.
Note - I am using ZEOS for PostgreSQL connection.
The results are very satisfying.
- mORMot static object list - 760,000 records/sec
- mORMot SQLite in-memory - 290,000 records/sec
- PostgreSQL - 110,000 records/sec
Should you be interested - the record type is defined as follows:
TSQLRecordSample = class(TSQLRecord)
strict private
FAmount: Currency;
FBirthDate: TDateTime;
FCreatedAt: TCreateTime;
FLastChange: TModTime;
FName: RawUTF8;
published
property Amount: Currency read FAmount write FAmount;
property BirthDate: TDateTime read FBirthDate write FBirthDate;
property CreatedAt: TCreateTime read FCreatedAt write FCreatedAt;
property LastChange: TModTime read FLastChange;
property Name: RawUTF8 index 40 read FName write FName stored AS_UNIQUE;
end;
Last edited by wxinix (2021-09-24 00:23:20)
Offline
I continued with my performance testing - this time, I compared mORMot2 SQLite in memory vs mORMot2 External SQLite File DB. I am very surprised that the External SQLite File DB is much faster than the SQLite in MEM DB - 25% faster!
Note:
- For External SQLite File DB, I used TSQLDBSQLite3ConnectionProperties, and SynchronousMode := Off, and LockingMode := Exclusive
- Delphi 10.4.2, Win64 Build
Details below - but does that make sense? I expect the In-MEM DB faster than External File DB.
- Single Table, Multi-Threaded, BatchAdd via a background worker thread, the SQLite In MEM DB Results:
20210928 18564227 " debug Test_BatchAdd_STable_MThread_MemDB::thr_1 - Orm.Server.BatchSend 6000 records takes 22.54ms (266098 rec/sec), code 200
20210928 18564229 " debug Test_BatchAdd_STable_MThread_MemDB::thr_2 - Orm.Server.BatchSend 6000 records takes 23.09ms (259830 rec/sec), code 200
20210928 18564230 " debug Test_BatchAdd_STable_MThread_MemDB::thr_3 - Orm.Server.BatchSend 6000 records takes 23.42ms (256191 rec/sec), code 200
- Single Table, Multi-Threaded, BatchAdd via a background worker thread, External SQLite File DB, SynchronousMode := Off, LockingModel := Exclusive
20210928 18564257 " debug Test_BatchAdd_STable_MThread_SQLite::thr_1 - Orm.Server.BatchSend 6000 records takes 17.34ms (345861 rec/sec), code 200
20210928 18564258 " debug Test_BatchAdd_STable_MThread_SQLite::thr_3 - Orm.Server.BatchSend 6000 records takes 18.15ms (330578 rec/sec), code 200
20210928 18564259 " debug Test_BatchAdd_STable_MThread_SQLite::thr_2 - Orm.Server.BatchSend 6000 records takes 17.71ms (338676 rec/sec), code 200
Last edited by wxinix (2021-09-28 19:20:48)
Offline
This is not consistent with my previous finding about mORMot BatchAdd and previous benchmarks.
I don't have any idea why you find this.
Perhaps the Delphi Win64 memory manager is not very efficient? What if you try on Win32?
Online
This is not consistent with my previous finding about mORMot BatchAdd and previous benchmarks.
I don't have any idea why you find this.
Perhaps the Delphi Win64 memory manager is not very efficient? What if you try on Win32?
HI ab, I uploaded the testing project to my github gist , you can try and see yourself. In BOTH x86 and Win64 cases, mORMot2 SQLite external file DB is much faster than the SQLite in memory DB.
Delphi 10.4.2, Win64 build, results:
LServer_SQLiteMem.BatchSend 6001 records takes 21.60ms (277785 records/second).
LServer_SQLiteExtFile.BatchSend 6001 records takes 16.84ms (356248 records/second).
Delphi 10.4.2, x86 build, results:
LServer_SQLiteMem.BatchSend 6001 records takes 28.33ms (211809 records/second).
LServer_SQLiteExtFile.BatchSend 6001 records takes 21.48ms (279363 records/second).
Offline
In my pull request https://github.com/synopse/mORMot2/pull/58
I convert sample 15 and results are:
Running tests using Synopse mORMot framework 2.0.1, compiled with Free Pascal 3.2 64 bit, against SQLite 3.36.0, on Manjaro Linux - Linux 5.4.148-1-MANJARO, at 2021-09-30 01:44:36.
Insertion speed (rows/second)
Direct Batch Trans Batch Trans
Sqlite file full 304 15413 29067 82673
Sqlite file off 4027 69256 35187 90535
Sqlite file off exc 12478 82022 35143 90420
Sqlite file off exc aes 8005 73736 35028 88832
Sqlite in memory 28583 90438 35340 91195
In memory static 82611 183708 85679 183769
In memory virtual 80202 170404 84334 170010
External sqlite file full 305 7638 31593 99072
External sqlite file off 5352 67390 39135 106364
External sqlite file off exc 17633 92354 38797 105929
External sqlite in memory 32937 107940 38307 106935
Remote sqlite socket 5587 49019 5781 42242
Read speed (rows/second)
By one All Virtual All Direct
Sqlite file full 24912 281563 275360
Sqlite file off 25301 289184 290528
Sqlite file off exc 42320 292620 293083
Sqlite file off exc aes 42133 296155 298169
Sqlite in memory 42353 289502 300300
In memory static 91441 551876 554692
In memory virtual 90920 211855 212539
External sqlite file full 50652 130589 286057
External sqlite file off 50972 131665 286368
External sqlite file off exc 51019 131585 286763
External sqlite in memory 49759 130938 286944
Remote sqlite socket 6518 117181 227489
Offline
@wxinix
In fact, you didn't compare plain SQLite in-memory and SQlite on file performance.
You did compare direct SQlite in-memory and external SQlite3 file performance.
And the external SQL Batch process is more optimized about the generated SQL, IIRC.
This explain the difference, as clearly shown by the Sample 15 numbers reported by @ttomas.
Note that numbers are consistent with mORMot 1 as can be seen on https://gist.github.com/synopse/13c2cdd … 2cb9d9f316
Edit: I checked TSqlDBConnectionProperties.MultipleValuesInsert and TRestOrmServerDB.InternalBatchStop.
And in fact they generate the same multi-insert SQL as " INSERT .. VALUES (..),(..),(..),..... ".
So I am not sure where the performance difference lies.
I just committed some code refactoring which enhanced performance on my side. Please check and see if it is better on your side.
Online
@wxinix
I just committed some code refactoring which enhanced performance on my side. Please check and see if it is better on your side.
HI @ab, @ttomas
Thank you for the advice and feedback.
I built with the latest commit - still it seems SQLite External File provides the best BatchAdd performance (~350k records/second).
There got to be some (nice) reason for such a stunning performance for an SQL DB. Regardless, my goal is to find out which settings provides the best WRITE performance- given the testing results, the answer appears to be SQLite External File .
Note - SQLite External File also gives very good READ performance (slightly worse than SQLite in Memory). So overall, SQLite External File appears to be the best option?
Any comments? Insights or advice?
Last edited by wxinix (2021-09-30 13:31:48)
Offline
For what I just found, perhaps some hints about the performance difference:
1) Direct SQLite3 uses inlined parameters, so a bit more processing.
2) Direct SQLite3 features a JSON cache.
I have just by-passed the parameters inlining for TRestOrmServerDB.MainEngineRetrieve.
So 1) should affect less the "By one" performance.
The 2) JSON cache reduces a bit the performance for our benchmarks.
But in real live, on real code, it is very efficient. In practice, I see a lot of "from cache" entries in production logs. So performance would be higher with direct SQLite3 process.
For the "By one", I have just by-passed the cache, since we have a per-ID cache at higher level.
In production, I would use the internal Sqlite3 engine for sure.
Especially with my latest commits.
Online
In production, I would use the internal Sqlite3 engine for sure.
Especially with my latest commits.
Hi @ab thank you for the latest commit, and advice.
In case of a real-time time-series database where WRITES (BatchAdd, every couple of seconds) are much more frequent than READS (every couple of minutes) - is the internal SQLite engine's cache still working? I think the cache would be less effective?
Thanks again.
Offline
I merged your sample.
But I am afraid it is broken in its use of SQLite3 modes.
So the numbers are not accurate, and pretty not relevant about external tables.
I will fix it.
Note that I would always use direct SQLite3 on production.
Online
I have fixed the sample.
And also rewritten some part of the ORM process for better performance.
Now the direct SQLite3 layer seems faster than the external SQLite3 layer for both reads and writes.
Running tests using Synopse mORMot framework 2.0.1, compiled with Free Pascal 3.2 64 bit, against SQLite 3.36.0, on Debian GNU/Linux 11 (bullseye) - Linux 5.10.0-8-amd64, at 2021-10-01 17:48:59.
Insertion speed (rows/second)
Direct Batch Trans Batch Trans
Sqlite file full 115 6319 64747 174794
Sqlite file off 12759 277631 117895 405383
Sqlite file off exc 37738 356862 122970 418900
Sqlite file off exc aes 24869 310375 120186 411742
Sqlite in memory 100126 426766 122667 430811
In memory static 253074 564620 273231 576435
In memory virtual 259591 558846 270679 545256
External sqlite file full 120 6816 74253 171485
External sqlite file off 15620 260851 122155 352522
External sqlite file off exc 51920 322424 121963 364219
External sqlite in memory 71029 384157 123580 425061
Remote sqlite socket 19477 183853 20045 166530
Read speed (rows/second)
By one All Virtual All Direct
Sqlite file full 93625 919371 921319
Sqlite file off 96372 898149 899847
Sqlite file off exc 179746 881756 897746
Sqlite file off exc aes 175734 887075 893734
Sqlite in memory 181662 912575 919202
In memory static 282023 1648532 1660026
In memory virtual 284212 619732 643956
External sqlite file full 152685 407199 885425
External sqlite file off 153855 400849 871459
External sqlite file off exc 151356 406785 639427
External sqlite in memory 172878 852006 885425
Remote sqlite socket 20477 352683 695410
Note that "Sqlite file off exc" is the right mode to use on production: both safe and fast.
The "exclusive" lock is indeed a feature on server side: it ensures no one tempers with the database while the mORMot service is using it.
Online
I have fixed the sample.
And also rewritten some part of the ORM process for better performance.
Now the direct SQLite3 layer seems faster than the external SQLite3 layer for both reads and writes.
[
I rerun my test with the latest commit - quite contrary, it seems external SQLite3 layer is still much faster than direct SQLite3 layer for writes. I am using Delphi 10.4.2, win64.
I noticed that in your sample code, https://github.com/synopse/mORMot2/blob … tCases.pas, TRestClientDB class is used, like below:
TRestClientDB.Client.BatchStart
....
TRestClientDB.Client.BatchAdd
...
TRestClientDB.Client.BatchSend
But in my testing code, I use TRestServerDB class directly, together with TSQLRestBatch class:
LBatch := TSQLRestBatch.Create
...
LBatch.Add
...
TRestServerDB.BatchSend(LBatch...
My question is - is the latest performance improvements only for TRestClientDB? It seems TRestServerDB.BatchSend performance is not improved with the latest commits.
Last edited by wxinix (2021-10-01 20:40:24)
Offline
Both are the same: a local LBatch or the TRestBatch as part of TRestClient.
About the numbers:
Delphi Win32 numbers: https://gist.github.com/synopse/3803d3c … 031b38c84a
Delphi Win64 numbers: https://gist.github.com/synopse/ed99718 … 3a4e0566f4
FPC Win64 numbers: https://gist.github.com/synopse/82d353e … 017d12c94d
The fastest is FPC. Especially for the AES encryption which is better on FPC due to more tuned asm (Delphi asm is behind). And the absolute best numbers are on FPC + Linux x86_64 (see above numbers).
But with all compilers, I found out the internal SQLite3 to be (slightly) faster than external SQlite3.
I don't know why your Delphi Win64 is less good for the internal SQLite3 on your computer.
The external DB layer is very thin: most of the time is spent inside the SQlite3 engine itself. Our ORM is able to bypass the whole "external" layer. So it makes sense to have close numbers.
Online
It is not "Delphi Win64 is less good for internal SQLite 3 on my computer". Actually, I did get the similar results using Delphi 10.4.2, x64, when running the official test project https://github.com/synopse/mORMot2/blob … onsole.dpr . Such results indicate Direct (internal) SQLite layer is faster then External SQLite layer, as you noted.
However, my updated test project https://gist.github.com/wxinix/325c64a2 … 4ae02ff75f indicates otherwise - that is, External SQLite layer is faster than Direct SQLite layer.
I am happy with the good-enough performance either way. Just not sure what caused the difference. If you could take a quick peek, that would be great. But if you are not interested, that is also fine. Thanks again for you feedback and the continued performance improvements.
I don't know why your Delphi Win64 is less good for the internal SQLite3 on your computer.
The external DB layer is very thin: most of the time is spent inside the SQlite3 engine itself. Our ORM is able to bypass the whole "external" layer. So it makes sense to have close numbers.
Offline
I have introduced a new BATCH JSON layout with "SIMPLID" method and detection of redundant methods.
This new format would make the client not compatible any more with mORMot 1 but there is an option to disable it.
SIMPLE and SIMPLID are able to directly write into SQLite3 with almost no memory allocation.
The result is outstanding: it gives a huge performance boost - more than two times faster on direct SQLite3 Batch+Transaction insertions.
Running tests using Synopse mORMot framework 2.0.1, compiled with Free Pascal 3.2 64 bit, against SQLite 3.36.0, on Debian GNU/Linux 11 (bullseye) - Linux 5.10.0-8-amd64, at 2021-10-05 22:26:52.
Insertion speed (rows/second)
Direct Batch Trans Batch Trans
Sqlite file full 113 8101 73333 239320
Sqlite file off 12978 438365 127753 777302
Sqlite file off exc 38599 623441 126636 800064
Sqlite file off exc aes 25135 491207 127360 787215
Sqlite in memory 105102 852660 128297 856677
In memory static 250991 1189343 275945 1207583
In memory virtual 248348 1139471 263476 1150483
External sqlite file full 116 7263 75706 186313
External sqlite file off 15690 295368 122033 393886
External sqlite file off exc 51204 354660 124870 386922
External sqlite in memory 102617 833889 129157 833750
Remote sqlite socket 20076 158516 21872 192255
Read speed (rows/second)
By one All Virtual All Direct
Sqlite file full 94679 885896 887705
Sqlite file off 98441 900900 903179
Sqlite file off exc 186758 897988 905387
Sqlite file off exc aes 186327 908265 916254
Sqlite in memory 186880 913993 919878
In memory static 290334 1627074 1646361
In memory virtual 286500 644620 641889
External sqlite file full 137204 406867 880281
External sqlite file off 161438 412286 886446
External sqlite file off exc 158760 400304 878117
External sqlite in memory 185870 869111 888572
Remote sqlite socket 21410 363715 714234
Note that these numbers include some in-process client-like JSON marshaling and routing. I also optimized the client Batch generation process.
So here is the corresponding graph for the insertions:
And for the reading:
The performance numbers of mORMot 1 - which were already first in their class - are clearly outperformed with mORMot 2.
We handle more then 1 millions of records per second, for both reading and insertion.
Online
The performance numbers of mORMot 1 - which were already first in their class - are clearly outperformed with mORMot 2.
We handle more then 1 millions of records per second, for both reading and insertion.
Congratulations! @ab
Really, really neat, amazing, and fantastic work! It rocks!!!!
Wow wow wow!
Yes Yes Yes! - this time, it is indeed that the Direct SQLite File/InMem is much faster than External SQLite File/InMem!!!!!!
Super, super, super fasttttttttttttttttttttttttt!
Thank you!!!!!!!!!!!!!!!!!!!!!!!!
Last edited by wxinix (2021-10-05 21:39:16)
Offline
I have enhanced the Batch JSON layout to store the inserted fields as hexadecimal bitset instead of JSON object.
It reduces the transmitted JSON size, and also allow direct SQLite3 process from the JSON buffers.
It won't change our extdb-bench numbers, but it would help to leverage the multi-insert process when only some of the TOrm fields are written.
Online
Amazing work again, ab!
With all these mORMot2 improvements, I just hope it'll not that difficult when upgrading from mORMot1 in the future.
Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdelux on Windows with cross-compile for Linux 64bit.
Offline
Pages: 1