#1 2012-07-06 13:19:21

Bascy
Member
From: The Netherlands
Registered: 2012-06-22
Posts: 108

Highest performance when adding a lot of records to Oracle

We're in the process of comparing different OPFs, to see whether one of them could be a candidate for the next generation of our application.

I've made a test project based on sample3 that creates 100,000 records and stores them in the database as follows:

procedure TForm1.btWriteManyClick(Sender: TObject);
var
  i: Integer;
  Starttime, Endtime: cardinal;
  Rec: TSQLSampleRecord;
begin

  Rec := TSQLSampleRecord.Create;

  Rec.Question := QuestionMemo.Text;
  Rec.Address_ := 'Dit is het adres laan, 23';
  Rec.postalCode := '1234AA';
  Rec.City := 'Nijmegen';

  StartTime := GetTickCount;
  try
    for i := 0 to 100000-1 do
    begin
      Rec.Name := NameEdit.Text + IntToStr(i);
      Client.Add(Rec,true);
    end;
    EndTime := GetTickCount;
  finally
    Rec.Free;
  end;

  OutputDebugString(PChar(Format('Elapsed: %d ms', [ endtime - starttime ])));
  Beep;
end;

This way we're getting around 600 objects per second.

Is there a way to speed this up?


#2 2012-07-06 14:31:59

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

If you do not use transactions and batching, 600 inserts per second is a good result (try the same code with TQuery, for example).
If you want to speed up your test on Oracle:
1) use a transaction, and commit it every 1000 records inserted, for example (the number of records inserted in one transaction depends on the Oracle configuration);
2) use batching (BatchAdd instead of Add in mORMot).

This is not a mORMot limitation - such things must be done in any architecture.


#3 2012-07-06 15:33:21

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

MPV is right: in such a case, the BATCH mode is the solution for speed, since all data will be transmitted at once to the server.
The remote connection type will also make a difference (in speed order: GDI messages, named pipes, TCP/HTTP).
For best speed over HTTP, disable SynLZ compression on the client side, or disable your antivirus (packet inspection slows down the process a lot).
Then it will only depend on the Oracle speed.
If the Oracle server is on the same PC as the mORMot server, performance will be good.
As MPV advised, consider regrouping BATCHes per 1000 items, for instance. And note that BATCH mode does not create an implicit transaction: you'll have to define your own transaction on the client side.

I would like to add BATCH mode at the SynDB level too, i.e. also allow multiple inserts for OLEDB and Oracle.
It is now on our to-do list - I added it to the Project RoadMap:

Add BATCH mode also at SynDB level, i.e. also allow very fast multiple inserts for OLEDB and Oracle classes, using advanced features of both providers (SQLite3 is already as fast as possible, when inserted within a transaction)

In the meantime, feedback from you about performance would be very nice!
Perhaps BATCH mode alone will perform nicely, without further optimization.
The Oracle Client is very efficient, AFAIK.


#4 2012-07-09 09:43:27

Bascy
Member
From: The Netherlands
Registered: 2012-06-22
Posts: 108

Re: Highest performance when adding a lot of records to Oracle

OK, I've added the batch and transaction parts to the code:

procedure TForm1.btWrite1MilionClick(Sender: TObject);
var
  i: Integer;
  Starttime, Endtime: cardinal;
  Rec: TSQLSampleRecord;
  Ida: TIntegerDynArray;
begin

  Rec := TSQLSampleRecord.Create;

  Rec.Question := QuestionMemo.Text;
  Rec.Address_ := 'Dit is het adres laan, 23';
  Rec.postalCode := '1234AA';
  Rec.City := 'Nijmegen';

  Client.BatchStart(TSQLSampleRecord);
  Client.TransactionBegin(TSQLSampleRecord);

  StartTime := GetTickCount;
  try
    for i := 0 to 10000-1 do
    begin
      Rec.Name := NameEdit.Text + IntToStr(i);
      Client.BatchAdd(Rec,false,true);
      if (i mod 1000) = 0 then
        Client.BatchSend(Ida);
    end;
  finally
    Rec.Free;
    Client.BatchSend(Ida);
    Client.Commit;
  end;

  EndTime := GetTickCount;
  OutputDebugString(PChar(Format('Done. %f Objects/sec', [ 10000 / ((EndTime - StartTime)/1000) ])));
  Beep;
end;

The results:

Using regular Client.Add: 625 objects/sec
Using BatchAdd: 711 objects/sec
Using BatchAdd + transactions: 653 objects/sec

Last edited by Bascy (2012-07-09 11:38:34)


#5 2012-07-09 15:42:07

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

If it is slower with transactions, I guess this is because they may not be properly handled with external tables, or perhaps because SQLite3 is buffering the content within virtual tables.
Or perhaps because you use BatchSend without a new BatchStart.
In the meantime, it is IMHO better not to use transactions.

Which transmission protocol are you using?
I guess the bottleneck here is the DB layer itself, if the difference between Add and BatchAdd is only 13%.
If you use HTTP, did you try disabling the SynLZ compression?

What is the performance if you use a manual SQL INSERT via SynDBOracle? Or a regular TQuery?
What is the latency of the DB (i.e. the ping of the DB server computer from the client)?

What if you enable logging and check the generated logs?

The direct SQLite3 engine, run within a transaction, is IMHO much faster than 600 objects/s.
(SQLite3 without a transaction is very slow, but within a transaction it has very high throughput.)


#6 2012-07-11 09:14:09

Bascy
Member
From: The Netherlands
Registered: 2012-06-22
Posts: 108

Re: Highest performance when adding a lot of records to Oracle

The latency as reported by TNSPING is 10-20 ms.

I ran some more tests, executing 4 possible scenarios in one go, using the following code:

procedure TForm1.FormCreate(Sender: TObject);
begin
  Props := TSQLDBOracleConnectionProperties.Create('OPFTEST','OPF_MORMOT','secret', 871);
  Model := CreateSampleModel; // from SampleData unit
  VirtualTableExternalRegister(Model,TSQLSampleRecord,Props,'SampleRecord');
  Client := TSQLRestClientDB.Create(Model, nil, 'test.db3', TSQLRestServerDB);
  Client.Server.CreateMissingTables;
end;

function TForm1.Addmany(aUseBatch, aUseTransactions: boolean): double;
var
  i: Integer;
  Starttime, Endtime: cardinal;
  Rec: TSQLSampleRecord;
  Ida: TIntegerDynArray;

begin

  Rec := TSQLSampleRecord.Create;

  Rec.Question := QuestionMemo.Text;
  Rec.Address_ := 'Dit is het adres laan, 23';
  Rec.postalCode := '1234AA';
  Rec.City := 'Nijmegen';

  if aUseBatch then
    Client.BatchStart(TSQLSampleRecord);
  if aUseTransactions then
    Client.TransactionBegin(TSQLSampleRecord);

  StartTime := GetTickCount;
  try
    for i := 0 to cRecordCount-1 do
    begin
      Rec.Name := NameEdit.Text + IntToStr(i);
      if aUseBatch then
        begin
          Client.BatchAdd(Rec,false,true);
          if (i mod 1000) = 0 then
          begin
            Client.BatchSend(Ida);
            Client.BatchStart(TSQLSampleRecord);
          end;
        end
      else
        Client.Add(Rec,true);

    end;
  finally
    Rec.Free;
    if aUseBatch then
      Client.BatchSend(Ida);
    if aUseTransactions then
      Client.Commit;
  end;

  EndTime := GetTickCount;
  Result := cRecordCount /((endtime - starttime)/1000);
end;
  • No Batch, No Transactions: 600 obj/sec

  • Batch, No Transactions: 706 obj/sec

  • Batch, Transactions: 681 obj/sec

  • No Batch, Transactions: 624 obj/sec

As to your other questions and suggestions: I will be very happy to provide you with the answers and results, but as we are only investigating the possibilities of mORMot (we haven't reached a conclusion yet), I cannot invest too much time in finding out how these can be realized ;-). If you provide me with the code, or hints on how to do those things, I'll run them and post the results here.

Last edited by Bascy (2012-07-11 09:23:58)


#7 2012-07-11 10:12:06

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

Thanks for the feedback.

TNSPING has a 10 ms resolution, so it is not precise enough when you want to know the latency of multiple requests.
A good old PING makes more sense in such a case.
I suspect BATCH mode at the SynDBOracle level is the only way of increasing speed here.


#8 2012-07-11 10:39:09

Bascy
Member
From: The Netherlands
Registered: 2012-06-22
Posts: 108

Re: Highest performance when adding a lot of records to Oracle

ab wrote:

TNSPING has a 10 ms resolution, so it is not precise enough when you want to know the latency of multiple requests.
A good old PING makes more sense in such a case.

Good old PING says: 0 ms average


#9 2012-07-11 11:00:57

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

Thanks.

What are your expectations about speed, or more generally about your project architecture?


#10 2012-07-11 11:27:36

Bascy
Member
From: The Netherlands
Registered: 2012-06-22
Posts: 108

Re: Highest performance when adding a lot of records to Oracle

We are not sure yet. At the moment we are testing several ORMs to learn what realistic performance can be.
At the moment we have a single-client Windows application whose design principles date back more than 15 years. It uses an embedded NexusDB engine, which is (of course) lightning fast. A lot of very data-intensive calculations are needed in all kinds of analyses. The dataset can range from just a few MB to several dozen GB. Because of its design, all the analyses were done on a record-by-record basis in the client (Dataset.First - .Next - until .EOF).

For the newly designed application, client-server with multiple clients (Windows, HTML, scripting, ...), SOAP or REST, and multiple databases (Oracle and MSSQL required, others optional) are all must-have requirements. This demands a very different approach in the new design, and in our way of thinking (guess we need a bit of re-education ;-)).

The application also has to be able to convert external data into its own model, and this could again be a few MB or several GB that has to be converted on a regular basis...

Another important function for us developers (not so much from a user's point of view) is the possibility to update the model and its underlying database when the model changes with a new version of the application (new or dropped classes, new or changed published properties, new indexes, etc.). This is something we have not found in mORMot yet (while it is present e.g. in InstantObjects). Any plans for such a thing?


#11 2012-07-11 12:45:06

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

Thanks for the input.

In its current state, mORMot is able to add new tables or new columns as needed, when the TSQLRecord definition changes.
But it can't rename a column, or change data content on the fly (e.g. convert from integer to text, or change the column width): such cases should be handled manually when opening the table, by overriding the TSQLRestServerDB.CreateMissingTables method - this is the purpose of its user_version parameter. The same parameter is easy to use for adding an index.
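
A minimal sketch of such an override, for illustration - the schema-version logic and the migration step are placeholders, and I am assuming the CreateSQLIndex helper and the CreateMissingTables signature as used elsewhere in this thread:

type
  TMyRestServer = class(TSQLRestServerDB)
  public
    procedure CreateMissingTables(user_version: cardinal=0); override;
  end;

procedure TMyRestServer.CreateMissingTables(user_version: cardinal);
begin
  inherited; // let mORMot create any missing table or column first
  if user_version<2 then
  begin
    // hypothetical manual migration steps for schema version 2 would go here
    // (e.g. renaming a column or converting its content via explicit SQL)
    CreateSQLIndex(TSQLSampleRecord,'Name',false); // add a non-unique index
  end;
end;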

For your design, a SOA approach IMHO makes sense: the ORM could stay on the server side only, for data-demanding processes.
Client-Server ORM could also make sense for basic CRUD processes.
But when you have to deal with high-level, data-demanding processes, SOA sounds like a better-scaling approach.
I suspect the high-level part of the mORMot documentation (in the SAD doc) is worth taking a look at.


#12 2012-07-11 14:16:46

Bascy
Member
From: The Netherlands
Registered: 2012-06-22
Posts: 108

Re: Highest performance when adding a lot of records to Oracle

Thanks for the info

Your answer confuses me a bit. What do you mean by "Client-Server ORM", as opposed to the SOA mentioned later on?


I have read the SAD doc up to page 186, and it does appeal to me ;-)
But now I have to convince my boss when he gets back from holiday.


#13 2012-07-11 15:28:26

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

In short:
- ORM is for persistence with database agnosticism;
- Client-Server ORM is for persistence, but with remote access (no need to install the DB client libraries on PCs);
- SOA is to expose high-level business services, then access them from a client with no DB connection at all (which can be a Delphi app or even an AJAX client).

SOA fits scalable distributed applications very well.
You write all your business logic on the server, then access this intelligence remotely from thin clients.

Our little mORMot is both a Client-Server ORM (that was its initial goal) and now a full SOA framework (like WCF) - even more advanced than DataSnap and others for Delphi.
And it includes some nice features worth having in a Client-Server SOA, like reporting, logging and security.

As a first step with mORMot, you can keep your existing Delphi code on the server side, then expose it via SOA methods.
Then migrate to a full ORM/SOA approach on the server side, one business unit at a time.
With the strengths of Delphi, i.e. speed, easy deployment, scalability and maintainability.


#14 2012-07-11 16:30:14

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

Btw, make sure your TSQLRecord uses e.g. index 50 for its RawUTF8 properties.

If a property is defined just as RawUTF8 (or string), it will be stored as a CLOB, so database access will be much slower.

You should add an "index n" attribute, with n being the number of characters of the column in the DB.
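
For instance, reusing the fields from the test record in this thread, a declaration along these lines is what is meant (the Oracle column types in the comments are what I would expect, not verified here):

type
  TSQLSampleRecord = class(TSQLRecord)
  private
    fName: RawUTF8;
    fPostalCode: RawUTF8;
    fQuestion: RawUTF8;
  published
    property Name: RawUTF8 index 50 read fName write fName;                   // e.g. VARCHAR2(50)
    property PostalCode: RawUTF8 index 10 read fPostalCode write fPostalCode; // e.g. VARCHAR2(10)
    property Question: RawUTF8 read fQuestion write fQuestion;                // no "index n" -> CLOB
  end;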


#15 2012-07-12 07:16:42

Bascy
Member
From: The Netherlands
Registered: 2012-06-22
Posts: 108

Re: Highest performance when adding a lot of records to Oracle

Again, thanks for the advice!

One of the things I don't like so much about your approach to the ORM is the misuse of existing Delphi constructs, like the "index n" attribute for the maximum length of a string property. Other ORMs solve this e.g. with official class attributes, like TMS Aurelius:

type
  [Entity]
  [Table('ARTISTS')]
  [Sequence('SEQ_ARTISTS')]
  [Id('FId', TIdGenerator.IdentityOrSequence)]
  TArtist = class
  private
    [Column('ID', [TColumnProp.Unique, TColumnProp.Required, TColumnProp.NoUpdate])]
    FId: Integer;
    FArtistName: string;
    FGenre: Nullable<string>;
    function GetArtistName: string;
    procedure SetArtistName(const Value: string);
  public
    property Id: integer read FId;

    [Column('ARTIST_NAME', [TColumnProp.Required], 100)]
    property ArtistName: string read GetArtistName write SetArtistName;

    [Column('GENRE', [], 100)]
    property Genre: Nullable<string> read FGenre write FGenre;
  end;

IMHO this approach has a few advantages:

  • Every class can be persisted this way, without any specific demands on the class hierarchy

  • Cleaner, much more readable code, because no standard language construct means something else in this ORM context

What are your thoughts on this?

Last edited by Bascy (2012-07-12 07:21:21)


#16 2012-07-12 08:36:43

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

About Oracle:
1) look at your Oracle performance first. Try this example on your Oracle: http://www.sqlines.com/sql-server/inser … erformance. The example's running time is the best result you can hope for (in real client-server life your code will be 2-5 times slower because of remote execution);
2) in your example you should commit the transaction every 1000 inserts for best performance (call Client.Commit and then start a new transaction, as in the sketch after this list) - Oracle doesn't like long transactions;
3) to load a big amount of data into Oracle, the only way is to use Oracle bulk operations (supported by ODAC, as far as I know) - 10-100 times faster compared to inserting step by step. Not supported by mORMot yet.
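
Roughly, point 2 could look like the following, reusing the identifiers from the code posted above (chunk size and error handling kept deliberately simple):

  Client.TransactionBegin(TSQLSampleRecord);
  try
    for i := 0 to cRecordCount-1 do
    begin
      Rec.Name := NameEdit.Text + IntToStr(i);
      Client.Add(Rec,true);
      if (i mod 1000) = 999 then
      begin
        Client.Commit;                             // close the current transaction...
        Client.TransactionBegin(TSQLSampleRecord); // ...and start the next 1000-record chunk
      end;
    end;
  finally
    Client.Commit; // commit the last (possibly partial) chunk
  end;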

About attributes - this is a good idea, but there are two problems:
1) mORMot works with Delphi versions (Delphi 5 and up) that do not support attributes;
2) the new RTTI in Delphi (which supports attributes) is VERY slow, in my experience.

Last edited by mpv (2012-07-12 08:53:46)


#17 2012-07-12 10:22:35

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

@Bascy I understand your concern very well.

See this blog article for my thoughts on this very interesting matter:
http://blog.synopse.info/post/2012/07/1 … e-them-all

@mpv Thanks for the very good material provided, as always.
It sounds as if that guy is not an Oracle expert, though; other sources would not implement things like this.

I have plans to add a BATCH mode to the SynDB layer (i.e. BULK insert for MS SQL, and array binding for Oracle).
It should reach even much better speed than individual inserts.

For most common DB operations, no BATCH mode is needed: an atomic commit is to be used.
It only makes sense for some dedicated processes (like consolidation or initialization).


#18 2012-07-12 12:49:51

Bascy
Member
From: The Netherlands
Registered: 2012-06-22
Posts: 108

Re: Highest performance when adding a lot of records to Oracle

Thanks for the blog entry! It made a lot of things clearer to me.

One correction: the link at the bottom of the article to this forum thread is not correct.

Last edited by Bascy (2012-07-12 12:54:38)


#19 2012-07-12 14:23:33

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

The blog article has been corrected.
Thanks for the feedback.

I'm happy it helped. :)


#20 2012-07-13 06:42:41

TPrami
Member
Registered: 2010-07-06
Posts: 105

Re: Highest performance when adding a lot of records to Oracle

Just to give some feedback... (and some more-or-less off-topic stuff :) )

- "You have to inherit from TSQLRecord, and can't persist any plain class";

This is only a problem (IMHO) if you are trying to fit mORMot into an old program architecture. In a new one you can live with it, I think, and most likely this decision makes many things better (at least I think it makes things easier at the framework level).

So why complain? I see no big reason to do so...

That is why other kinds of ORMs provide a way of mapping classes to tables using external files (some ORMs provide both ways of definition).

Files are evil. It's nice to give a non-programmer (DB admin or program architect) the job of doing the mapping with an XML file etc., but this makes debugging a real pain in the *rse... If the mapping is done in code, it is very easy to debug.

....


And lastly, some of my own ideas/rant...

I've tried to use RemObjects Data Abstract a little bit, and I think it is a good product for what it is designed for, but it falls short in many aspects.

Even DB experts (whom I have been lucky enough to work with) get into more or less trouble with pretty simple SQL (by simple I mean what it does, not what it looks like). With SQL you can do whatever you like or know how to do, but very soon you have such complex SQL on your hands that it makes your life miserable.

My dream would be to define the SQL sent to the DB server with debuggable code. Best would be a way to chop it into pieces somehow. Doing some queries with RO-DA, the code was starting to be more or less as complex as the SQL, which is not good.

There should be a tool that would simplify the process significantly.

Like defining the query in logical parts, not just a massive fluent-interface mess which stays very close to the produced SQL. All the complexities of SQL should be hidden (I have no very good idea of how this could be done), so that you can say something like:

1. fetch these columns from these tables;
2. the relations between these tables are the ones I give you here;
3. limit the result like this.

If you write the SQL for this, maybe a join of two tables, it already starts to be unreadable.

A while back I was pondering some kind of query builder which would have two modes: a development mode would do most of the processing on the client side, where the programmer could analyze the resulting data more or less visually, but also by writing some kind of unit tests or helper methods (if the data is known, it is easy to write code that loops through the dataset and runs some tests on it).

And after you've debugged the client-side part and it works, it could be set to build the final SQL, which does the complex work and lets the DB server return just the data that is needed.

Because in some cases I've been thinking that, given the time it takes to build these SQLs, it would be cheaper (with not-so-good performance) to just load the bulk data (if it's not way too big) and do the filtering in Pascal code at the server and/or client level. Mainly because it is very, very difficult to find the reason why this or that behaves oddly in the SQL, but very easy to look through any kind of dataset in Pascal code, write a few if clauses, and put the needed rows in another set of data. It might not perform super well, but at least you would feel that you know what is going on :D

Actually, we did something like this: for static data, we made hash tables of objects (of DB rows) keyed on certain parameters, and it was really easy to figure out why something was not working as it should.

In a few words, what I'm saying is: the easier it is to define what data is going to be fetched from the server, the better. I think even making some cached tables on the server side, built using pure Pascal code, is better than having monster queries that take a lot of time to change, develop and/or fix...

-Tee-


#21 2012-07-13 10:09:44

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

About the difficulty of debugging complex SQL statements, I have found several tools that make things easier (at least in the Oracle world I work in):
- If you have your database already mapped with Business Objects, you can create complex queries visually, then get the generated SQL - with respect to SQL generation, this expensive tool is amazing;
- If you are working with Oracle, you should try the Oracle SQL Developer tool - http://www.oracle.com/technetwork/devel … 46113.html - which allows you to trace a query and profile it;
- Even SQLite3 has an "explain" mode, which can help if you take the time to understand its VM - see http://www.sqlite.org/eqp.html;
- A lot of visual tools are available, but none of them is perfect, I think.

What you describe as a solution is very similar to the Map/Reduce pattern.
See http://en.wikipedia.org/wiki/Mapreduce

It is sometimes implemented with a stored procedure in the RDBMS world.
But its main purpose is in the NoSQL world, with huge amounts of data (e.g. Google farms heavily use such algorithms for processing all their data).

With mORMot, you would be able to create some "cached tables" (as you name them) from dedicated requests, then use the SQLite3 virtual table mechanism to access this consolidated data via easy-to-use SQL.
mORMot is able to join both worlds: MapReduce and SQL.
You could even create a virtual table which is the result of a JOINed query: that is, no temporary table, but the result of a sub-query.


#22 2012-07-18 15:58:21

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

Hello,

I've just added BATCH support to the SynDBOracle unit.
See http://synopse.info/fossil/info/d472f2ea3d
That is, it is able to use the high-performance array-binding DML mode available in OCI.

Here are some benchmark results:

{
    "Engine": "Oracle",
    "CreateTableTime": "782.98ms",
    "NumberOfElements": 5000,
    "InsertTime": "11.51s",
    "InsertRate": 434,
    "InsertBatchTime": "107.78ms",
    "InsertBatchRate": 46386,
    "InsertTransactionTime": "7.50s",
    "InsertTransactionRate": 666,
    "InsertBatchTransactionTime": "98.71ms",
    "InsertBatchTransactionRate": 50651,
    "ReadOneByOneTime": "6.43s",
    "ReadOneByOneRate": 777,
    "ReadAllTime": "87.69ms",
    "ReadAllRate": 57015,
    "ClientCloseTime": "45.58ms"
}

In short:
- 434 inserts per second without BATCH mode;
- 50,651 inserts per second with BATCH mode.
Reading is also much faster when all items are read at once (57,015 per second) instead of one by one (777 per second).

For comparison, here are the SQLite3 results:

{
    "Engine": "SQLite3",
    "CreateTableTime": "130.59ms",
    "NumberOfElements": 5000,
    "InsertTime": "3.07s",
    "InsertRate": 9,
    "InsertBatchTime": "3.39s",
    "InsertBatchRate": 8,
    "InsertTransactionTime": "341.70ms",
    "InsertTransactionRate": 14632,
    "InsertBatchTransactionTime": "335.21ms",
    "InsertBatchTransactionRate": 14915,
    "ReadOneByOneTime": "110.03ms",
    "ReadOneByOneRate": 45440,
    "ReadAllTime": "51.21ms",
    "ReadAllRate": 97621,
    "ClientCloseTime": "62us"
}

It is known that, due to its ACID behavior, SQLite3 is very slow at inserting without a transaction.
But within a transaction it has very good speed, comparable to Oracle, and it is even faster for reading.
Not bad for a free library!

I've added a new sample program in http://synopse.info/fossil/dir?name=SQL … erformance

I suppose it will be difficult to find a faster ORM around. ;)

I'll soon update the benchmark sample to also test the communication layers (the results above are for in-process execution, but we'll need named pipes, GDI messages and HTTP communication as well).


#23 2012-07-19 13:45:44

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

I've added a lot of fixes and speed enhancements for external virtual tables in mORMot.

I've included in-memory SQLite3 in the benchmark, and also an OleDB connection (using the JET engine).

Here are some benchmarks.

{
    "Engine": "Oracle",
    "CreateTableTime": "1.25s",
    "NumberOfElements": 5000,
    "InsertTime": "11.22s",
    "InsertRate": 445,
    "InsertBatchTime": "96.09ms",
    "InsertBatchRate": 52034,
    "InsertTransactionTime": "7.83s",
    "InsertTransactionRate": 638,
    "InsertBatchTransactionTime": "100.71ms",
    "InsertBatchTransactionRate": 49645,
    "ReadOneByOneTime": "6.59s",
    "ReadOneByOneRate": 757,
    "ReadAllVirtualTime": "97.97ms",
    "ReadAllVirtualRate": 51033,
    "ReadAllDirectTime": "62.70ms",
    "ReadAllDirectRate": 79735,
    "ClientCloseTime": "44.29ms"
}
{
    "Engine": "SQLite3",
    "CreateTableTime": "104.49ms",
    "NumberOfElements": 5000,
    "InsertTime": "2.90s",
    "InsertRate": 10,
    "InsertBatchTime": "3.34s",
    "InsertBatchRate": 8,
    "InsertTransactionTime": "392.43ms",
    "InsertTransactionRate": 12741,
    "InsertBatchTransactionTime": "331.82ms",
    "InsertBatchTransactionRate": 15068,
    "ReadOneByOneTime": "106.39ms",
    "ReadOneByOneRate": 46994,
    "ReadAllVirtualTime": "55.09ms",
    "ReadAllVirtualRate": 90755,
    "ReadAllDirectTime": "32.05ms",
    "ReadAllDirectRate": 155996,
    "ClientCloseTime": "89us"
}
{
    "Engine": "SQLite3 (memory)",
    "CreateTableTime": "711us",
    "NumberOfElements": 5000,
    "InsertTime": "1.18ms",
    "InsertRate": 25252,
    "InsertBatchTime": "1.00ms",
    "InsertBatchRate": 29821,
    "InsertTransactionTime": "158.17ms",
    "InsertTransactionRate": 31609,
    "InsertBatchTransactionTime": "128.84ms",
    "InsertBatchTransactionRate": 38806,
    "ReadOneByOneTime": "106.30ms",
    "ReadOneByOneRate": 47033,
    "ReadAllVirtualTime": "56.53ms",
    "ReadAllVirtualRate": 88439,
    "ReadAllDirectTime": "31.39ms",
    "ReadAllDirectRate": 159271,
    "ClientCloseTime": "67us"
}
{
    "Engine": "Jet",
    "CreateTableTime": "68.09ms",
    "NumberOfElements": 500,
    "InsertTime": "6.91s",
    "InsertRate": 72,
    "InsertBatchTime": "6.47s",
    "InsertBatchRate": 77,
    "InsertTransactionTime": "6.48s",
    "InsertTransactionRate": 77,
    "InsertBatchTransactionTime": "6.47s",
    "InsertBatchTransactionRate": 77,
    "ReadOneByOneTime": "5.40s",
    "ReadOneByOneRate": 92,
    "ReadAllVirtualTime": "24.20ms",
    "ReadAllVirtualRate": 20659,
    "ReadAllDirectTime": "13.52ms",
    "ReadAllDirectRate": 36971,
    "ClientCloseTime": "68us"
}

You can note that Jet is slow compared to SQLite3.
And it consumes much more memory and disk space than SQLite3.

It is also not surprising that SQLite3 in-memory tables (identified as "SQLite3 (memory)" above) are pretty fast.

We have also added a direct request (by-passing the virtual table module) for a SELECT * statement: it is up to 30-50% faster (see the ReadAllDirectRate / ReadAllVirtualRate values).
If you want to use a map/reduce algorithm, it may speed up your process a lot.

Sounds like we have gone a long way into optimizing external tables here.
Feedback is welcome!


#24 2012-07-20 09:26:03

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

The BATCH mode for Oracle is a VERY GOOD FEATURE. AB, you are cool!

These are the results for MS SQL (remote server on a 100 Mb network): first for the SQLOLEDB provider, second for the SQLNCLI10 provider.
SQLOLEDB
{
    "Engine": "MSSQL",
    "CreateTableTime": "490.09ms",
    "NumberOfElements": 5000,
    "InsertTime": "14.84s",
    "InsertRate": 336,
    "InsertBatchTime": "17.75s",
    "InsertBatchRate": 281,
    "InsertTransactionTime": "12.81s",
    "InsertTransactionRate": 390,
    "InsertBatchTransactionTime": "16.26s",
    "InsertBatchTransactionRate": 307,
    "ReadOneByOneTime": "7.87s",
    "ReadOneByOneRate": 635,
    "ReadAllVirtualTime": "51.40ms",
    "ReadAllVirtualRate": 97264,
    "ReadAllDirectTime": "44.45ms",
    "ReadAllDirectRate": 112483,
    "ClientCloseTime": "7.75ms"
}
SQLNCLI10
{
    "Engine": "MSSQL",
    "CreateTableTime": "649.59ms",
    "NumberOfElements": 5000,
    "InsertTime": "14.69s",
    "InsertRate": 340,
    "InsertBatchTime": "14.82s",
    "InsertBatchRate": 337,
    "InsertTransactionTime": "15.67s",
    "InsertTransactionRate": 318,
    "InsertBatchTransactionTime": "14.43s",
    "InsertBatchTransactionRate": 346,
    "ReadOneByOneTime": "7.79s",
    "ReadOneByOneRate": 641,
    "ReadAllVirtualTime": "52.35ms",
    "ReadAllVirtualRate": 95510,
    "ReadAllDirectTime": "54.36ms",
    "ReadAllDirectRate": 91972,
    "ClientCloseTime": "6.51ms"
}

Maybe using ODBC would give better results (though a few years ago OleDB was faster - I don't know about now).


#25 2012-07-20 12:43:54

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

@mpv
Yes, in order to achieve comparable insertion speed, we would have to include BULK INSERT in the OleDB layer as well.
If you need it, you may share some code here!

Read time is impressive in all cases.
I'm not sure we would achieve better responsiveness with ODBC. SynDB still needs its ODBC unit to be finished - the http://synopse.info/fossil/finfo?name=SynDBODBC.pas unit is pending.


#26 2012-07-20 15:55:46

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

@ab
For now I don't really need bulk insert - there are some architectural problems with bulk in my system... (I need per-record logic).
About ODBC - it will become a real need for mORMot when we migrate to x64 - there are no x64 OleDB drivers for the latest MS SQL. But for now OleDB is enough.

But the BULK operation in mORMot is cool! When I used Oracle Loader (a tool for importing data using bulk insert), the best result I reached was about 100,000 rec/sec, and that was on a production server (UNIX, 12 GB, 8 processors, very fast storage, Oracle optimization) - so 50,000 on a simple computer is a VERY GOOD result! Congratulations!


#27 2012-07-23 08:35:50

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

About the blog entry http://blog.synopse.info/post/2012/07/1 … erformance
Using BULK operations in Oracle (the same is true for MS SQL, I think) gives a huge speed-up not only because of network latency, but in most cases because of internal Oracle optimizations for such queries: in BULK mode all checks (unique indexes, constraints and so on) are disabled at insert time, and Oracle performs them after all records are inserted. Index rebuilding and transaction logging are done the same way (after all the inserts). All needed extents are allocated at once. And so on. It's easy to show that network latency is not the main problem: a PL/SQL stored procedure also gains about a 100x speed-up from using BULK.


#28 2012-07-23 09:26:38

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

You are right: I guess that rebuilding the index only at the end of the process speeds up the task a lot.
It makes perfect sense.

So, as you state, a remote BULK insert could even be faster than a stored procedure, in the most common cases.
Thanks for pointing this out!


#29 2012-07-24 07:04:20

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

mpv wrote:

About ODBC - it will become a real need for mORMot when we migrate to x64 - there are no x64 OleDB drivers for the latest MS SQL. But for now OleDB is enough.

When did you find this information?
AFAIK the SQL Native Client library is available in both 32-bit and 64-bit, and provides OleDB providers for MS SQL in both.

I know that OleDB is officially deprecated - see http://blog.synopse.info/post/2012/02/2 … oy-ODBC%21 - but there are still x64 OleDB drivers for MS SQL.


#30 2012-07-25 11:17:29

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

I had future versions of MS SQL in mind (because it is officially deprecated). Translation problems...


#31 2012-07-25 11:30:26

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

AB, I tested my system with large queries (20,000 - 500,000 rows) and found a way to increase performance:
in TSQLTableJSON.ParseAndConvert, when we parse the "not expanded" format, we call the GetRowCountNotExpanded function to calculate rowCount - this call takes about 5% of the total time. My proposal is to add a rowCount property to the "not expanded" format: {"FieldCount":9,"rowCount":50,"Values": [..]}
We always know how many rows we fetch from the data source, so it's simple. In my tests I got a 5% performance increase when removing the GetRowCountNotExpanded call (for a 400,000-row dataset, from 4.09 to 3.88 sec round trip for MS SQL).
Is it possible?

Last edited by mpv (2012-07-25 11:37:44)


#32 2012-07-25 11:47:42

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

It is a very good idea, and may also be useful in other areas of the framework.
I'll implement it for sure, since the "not expanded" format is proprietary.

See http://blog.synopse.info/post/2012/07/2 … -benchmark for a general performance benchmark of the available data engines in mORMot.


#33 2012-07-25 13:57:00

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

One more remark: if we deal with JavaScript clients (I mean browsers), it's good to start property names in JSON with a small letter, for example {"fieldCount":9,"rowCount":50,"values": [..]}. The naming convention in the JS world is: properties start with a small letter, constructors start with an upper-case letter - someObject.Create() but someObject.height. When I evaluate JSON in JS I get an object, and the naming convention is not mandatory, but it is desirable. (There is no difference in JS between a constructor property and a method - everything is a property; a method is just a property of function type. So the only way to keep code readable is the naming convention. JS is case sensitive.)


#34 2012-07-25 15:47:50

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

Good point.

Perhaps a code review of the framework is needed here, to be consistent with the JavaScript conventions: not only for the non-expanded format, but wherever JSON is generated.
Should it be done for property names as well (e.g. should "ID"=1 become "id"=1, or "Name"="Arnaud" become "name"="Arnaud")? Object Pascal and JavaScript conventions differ here.


#35 2012-07-26 07:39:21

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

In my project I write "name"="Arnaud" and "fullName"="AB", but "ID"=1, because ID is an abbreviation.
If you plan to review the framework for better JS support, I propose changing the "not expanded" format as listed below, or adding a new format type (I did it this way in my own descendant of TSQLTableJSON):
{"fields":[{"name":"ID","type":"int"},{"name":"Code","type":"string", "size": 32},{"name":"Caption","type":"string", "size": 128},{"name":"mi_owner","type":"float"},{"name":"mi_createdate","type":"date","dateFormat":"c"},{"name":"mi_createuser","type":"float"},{"name":"mi_modifydate","type":"date","dateFormat":"c"},{"name":"mi_modifyuser","type":"float"}],
"rowCount": 3,
"data":[[1,"code1","Caption code1",1,"2012-05-23T17:47:01",1,"2012-05-23T17:47:13",1],
          [2,"code2","Caption code2",1,"2012-05-23T17:47:01",1,"2012-05-23T17:47:13",1],
          [3,"code3","Caption code3",1,"2012-05-23T17:47:01",1,"2012-05-23T17:47:13",1]]
}
I add a "fields" array property with the field definitions (as I got them from the provider), a rowCount property for quick parsing, and frame each row in its own array. With such a format, working with jQuery or ExtJS or another JS framework becomes pretty easy.
The field definitions are optional.


#36 2012-07-26 08:47:50

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

Sometimes the non-expanded format is used for just one record (e.g. in TSQLRecord.FillFrom), so having the whole field definitions does not make sense there.

This is my concern about this format.

But the data[] nested array format is indeed nice.


#37 2012-07-26 09:37:25

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

Yes, I understand about the one-record case. And I made the field definition optional: in some cases we need it, in others not. Let the default be FALSE - no field definitions passed.


#38 2012-07-26 09:47:02

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

I've modified the fieldCount/values names, and associated regression tests.
See http://synopse.info/fossil/info/f641049835

But I'm afraid it won't be feasible to add the "rowCount": property to the non-expanded JSON format.
In fact, in most TJSONWriter uses, the number of rows is not yet available: the header is created before the rows are retrieved.
There is no generic way of adding this property.
I could add a parameter to TJSONWriter.AddColumn, then use it in the cases where the row count is known (only TSQLTable.GetJSONValues, from my understanding).
Therefore, I guess adding "rowCount" is possible in your own case, but not in the generic case of the mORMot design.


#39 2012-07-26 11:30:26

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

If we write "rowCount": xx in TSQLTable.GetJSONValues, and in TSQLTableJSON.ParseAndConvert check whether it exists - and, if it does, use the rowCount value instead of calculating the row count - that would be enough for me.


#40 2012-07-27 15:06:12

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

I've added an optional "rowCount": field in TSQLRestServerStaticInMemory.GetJSONValues, TSQLTable.GetJSONValues and TSQLTableJSON.ParseAndConvert, for about 5% faster processing of huge content.
See http://synopse.info/fossil/info/1ecf509a4b

Hope it works as expected for you.

Since I've been able to use it for TSQLRestServerStaticInMemory.GetJSONValues, it should also slightly improve the TSQLRestServerStaticInMemory / TObjectList "Read all" benchmark, I suppose.

Now, I think, we'll need to implement the nested-array format.

Thanks for the proposal!


#41 2012-07-28 13:42:12

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

AB, I wrote a ticket about the new rowCount feature: http://synopse.info/fossil/tktview?name=0b142f4f28 - please take a look at it. I think that if we close this ticket, "rowCount" becomes useful in the generic case. But I do not know how to do this in the current implementation; the only way I see is to write the columns after the rows, and in the reader to parse the result from the end of the JSON. Maybe you have a better idea...

Last edited by mpv (2012-07-28 13:42:57)


#42 2012-07-28 17:56:41

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

As I already stated in the forum, in a TSQLDBStatement, we do not know how many rows there will be when this method starts.

For FetchAllToJSON(), data is retrieved on the fly, so we only know how many rows there will be at the end.

For FetchAllAsJSON(), we may be able to add the rowcount at the beginning of the JSON content before returning the JSON string.
Since this method is used in SQLite3DB, it does make sense to add it for a result bigger than 64 KB for instance (not worth it for one or several small objects, IMHO).
We may try this:

function TSQLDBStatement.FetchAllAsJSON(Expanded: boolean; ReturnedRowCount: PPtrInt=nil): RawUTF8;
var Stream: TRawByteStringStream;
    RowCount: PtrInt;
    i: Integer;
begin
  Stream := TRawByteStringStream.Create;
  try
    RowCount := FetchAllToJSON(Stream,Expanded);
    if ReturnedRowCount<>nil then
      ReturnedRowCount^ := RowCount;
    result := Stream.DataString;
    if (not Expanded) and (length(result)>65536) then begin
      i := Pos(',"values',result);
      if i>0 then // FastMM4 will realloc everything in place -> should be fast
        Insert(',"rowCount":'+Int32ToUtf8(RowCount),Result,i);
    end;
  finally
    Stream.Free;
  end;
end;

But it is not worth the 5% (at best) speed-up on the client side, since it will slow down the server-side process: such an Insert() into the result string could be quite slow, as all the content will be copied in memory.
IMHO such an implementation would slow down the server instead of the client.
Not worth it.

Another possibility could be to always insert rowCount=00000 at the beginning, then replace it on request.
For small requests, we'll have to remove it.
I still don't think it is worth it.


#43 2012-07-30 07:27:48

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: Highest performance when adding a lot of records to Oracle

About inserting - I agree with you, this is not good on the server side.
Always inserting rowCount=000000 (6 digits) is the way. I'll code this feature, run performance tests, and post the results here. Once we get the test results we can decide whether to add this feature or not.
About removing rowCount for small requests: IMHO 15 additional bytes don't matter. But if we don't remove rowCount, we get identical behavior in all cases - and that is more important.
For example, in the JS client part I can simply write

 if (result.rowCount>0) {...} 

If rowCount were optional, I would have to write something like this:

 if (result.rowCount?(result.rowCount>0):(result.data.length>0)) {...} 


#44 2012-07-30 07:30:52

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

You could just use result.data.length instead of result.rowCount in this case, I suspect.

Since the result is already parsed and deserialized, result.rowCount is just duplicated information here.


#45 2013-03-19 10:45:05

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Highest performance when adding a lot of records to Oracle

I've added a new optional "rowCount":... field in the TSQLRestServerStaticInMemory.GetJSONValues, TSQLTable.GetJSONValues, TSQLTableJSON.ParseAndConvert, TSQLRequest.Execute and TSQLDBStatement.FetchAllToJSON methods, at the end of the non-expanded JSON content.

It will improve client-side JSON parsing performance, especially for huge content.
Enabling "rowCount":... on the Delphi client side sped up our regression tests (about 4 KB of content per request) from 40,000 requests per second to 60,000 requests per second!
A great speed enhancement, just as you suggested!
Thanks for the idea!

See http://synopse.info/fossil/info/760c6e358d
and http://synopse.info/fossil/info/85c06e2619

I've just ensured that the "rowCount":... field will now be generated even for 0 and 1 values, to ensure perfect consistency when consumed from an AJAX request, as you stated above.
See http://synopse.info/fossil/info/900703f728
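
For illustration only (field names and values invented here), a small non-expanded resultset now looks like this:

{"fieldCount":2,"values":["ID","Name",1,"Alice",2,"Bob"],"rowCount":2}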

