#1 2011-01-07 17:53:17

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,548
Website

TSynBigTableRecord class to store data, with fields handling

In our SynBigTable unit, the TSynBigTableRecord class is now available to store data, with field handling.

That is, you get the incredible speed of our SynTable, with the possibility of having fields in the records: you are no longer limited to key/value pairs.

On an old Pentium 4m:
- creating 1,000,000 records, each with one 8-char text field and one integer field: 1700 ms;
- reading all records, with per-field data retrieval: 500 ms;
- searching for 100 distinct text/integer values, without any index (i.e. with full database iteration): 4300 ms;
- searching for 100 distinct text/integer values, with an index: 0.1 ms;
- index creation (for the text and integer fields together): 600 ms;
- adding a new field and refreshing the whole data (since the layout on disk changed): 1040 ms;
- opening the file: 30 ms.

The file size is only 19 MB.

You read right. 1,000,000 records...

Here is some sample code:

Creating the database, with two fields: one text field named "text" and one integer field named "Int" (both using an index):

  TR := TSynBigTableRecord.Create(FN,'test');
  try
    fText := TR.AddField('text',tftWinAnsi,[tfoIndex]);
    fInt := TR.AddField('Int',tftInt32,[tfoIndex]);
    TR.AddFieldUpdate; // apply the new field layout to the table

Adding 1,000,000 records (rec is a dedicated TSynTableData object, used for fast access to the field content):

    rec.Init(TR.Table);
    start(format('Add %.0n records',[n*1.0])); // start() is the timing helper of the test program
    for i := 0 to n-1 do begin
      rec.SetFieldValue(fText,By8[i]); // same as rec.Field['text'] := By8[i];
      rec.SetFieldValue(fInt,i);
      TR.RecordAdd(rec);
    end;

Reading all records, accessing fields via a dedicated variant type - note that you access each field via its name, in true late binding :)
Variant access looks smarter than TSynTableData, but it is also slower (because of the overhead of the current variant implementation).

  for i := 1 to n do begin
    vari := TR.VariantGet(i);
    if (vari.text<>By8[i-1]) or
       (vari.Int<>i-1) then
      exit;
  end;

Searching for 101 text items, using the index if available:

  for i := 0 to 100 do
    if not TR.Search(fText,By8[i],ID,IDCount) or
       (IDCount<>1) or (ID[0]<>i+1) then
      exit
    else
      IDCount := 0; // reset the output count before the next search
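
Putting it together, here is a minimal sketch of how the snippets above fit in one routine - not the exact test program: the variable declarations, the final UpdateToFile call and the try..finally block are only shown for illustration:

  // FN, n and By8[] are defined as in the snippets above
  var
    TR: TSynBigTableRecord;
    fText, fInt: TSynTableFieldProperties;
    rec: TSynTableData;
    i: integer;
  begin
    TR := TSynBigTableRecord.Create(FN,'test');
    try
      // declare the field layout, then apply it to the table
      fText := TR.AddField('text',tftWinAnsi,[tfoIndex]);
      fInt := TR.AddField('Int',tftInt32,[tfoIndex]);
      TR.AddFieldUpdate;
      // fill the table via a TSynTableData instance
      rec.Init(TR.Table);
      for i := 0 to n-1 do begin
        rec.SetFieldValue(fText,By8[i]);
        rec.SetFieldValue(fInt,i);
        TR.RecordAdd(rec);
      end;
      TR.UpdateToFile; // flush everything to disk
    finally
      TR.Free;
    end;
  end;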

Stay tuned!


#2 2011-01-08 09:37:59

TPrami
Member
Registered: 2010-07-06
Posts: 116

Re: TSynBigTableRecord class to store data, with fields handling

Hello,

This seems very cool.

Would it (later) be possible to combine data storing and fields?

That is, could I store metadata fields for each stored item? Think of it this way: I want to store files, and add some metadata fields for each file I've stored.

So I could have only one file, and still quickly get some metadata about each stored file, like the last change time, etc. :)

I can work around it by using two or more files, but I was just pondering...

-Tee-


#3 2011-01-08 10:30:40

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,548
Website

Re: TSynBigTableRecord class to store data, with fields handling

Hello - at first sight, this sounds like a BLOB field to me.
It's not yet implemented, but it will be soon (there is already a tftBlob field type for TSynTable).
In fact, in the implementation scheme I want to follow, there will be two files: one for the "regular/simple" fields, and one per BLOB field.
A hidden TSynBigTable instance will be created for each BLOB file.

This will work if the metadata has a fixed number of fields.
Another possibility could be to use a TSynTable with some in-memory data, located inside the same TSynBigTable instance. There would be only one file in this case.
I'll try to write such a TSynBigTableMetaData class, because this file/metadata scheme is very useful in a lot of applications.
The TSynBigTableRecord implementation is a pure "record-oriented" storage engine. It makes sense when the number of records can be huge (that's why I tested it with 1 million records and more). All data is accessed via memory-mapping, straight from the file layout.
But if your table is only going to store a limited number (a few thousands) of "files", then storing the metadata in memory (in a dynamic array of RawByteString) makes sense.
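
To make this more concrete, here is a rough sketch of that "files on disk, metadata in memory" pattern, written against the plain TSynBigTable key/value API - the TFileMeta record and its array are only illustrative, and the sketch assumes sequential IDs (i.e. no deletion):

  type
    // illustrative only: per-file metadata kept in memory, in parallel
    // with the raw content stored in a plain TSynBigTable
    TFileMeta = record
      FileName: RawUTF8;
      LastChange: TDateTime;
    end;
  var
    BT: TSynBigTable;
    Meta: array of TFileMeta; // Meta[ID-1] describes record #ID
    ID: integer;
    Content: RawByteString;
  begin
    BT := TSynBigTable.Create('files.data');
    try
      Content := StringFromFile('somefile.bin'); // SynCommons helper
      ID := BT.Add(Content);            // the file content goes to disk
      SetLength(Meta,ID);
      Meta[ID-1].FileName := 'somefile.bin';
      Meta[ID-1].LastChange := Now;     // the metadata stays in memory
      BT.UpdateToFile;
    finally
      BT.Free;
    end;
  end;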

Here is my road map for this weekend (I won't finish all of it in two days, of course):
- implement the tfoUnique kind of index;
- implement a "LIMIT" parameter for searches (it could make them faster when only the first match is to be retrieved);
- implement the tftBlob field type as an "internal" field, that is, if the BLOB data is not of huge size, it can be stored inside the main record data, like a tftWinAnsi field;
- create a TSynBigTableMetaData class, for storing files plus some metadata - the field handling will reuse the already available TSynTable mechanism;
- implement a BEGIN TRANSACTION/COMMIT/ROLLBACK mechanism in TSynBigTableRecord (perhaps in the main TSynBigTable class: just keep the transaction records in memory, then transfer them to the main fInMemory[] array only in case of COMMIT);
- implement the tftBlobExternal field type;
- interface the TSynBigTableRecord class with our SQLite3 engine - perhaps as a "TSQLServerStatic" class, or as a pure SQLite3 virtual table. Or perhaps both...

I guess that SQLite3 virtual tables could be the best use of our new TSynBigTableRecord storage engine:
- Much faster than the default SQLite3 engine for database access and record writing;
- It will use the proven SQLite3 SQL parser and interpreter, so we won't have to reinvent the wheel;
- The virtual table interface - see http://www.sqlite.org/vtab.html - is very well designed: in particular, it makes very clever use of indexes and of SQL overloaded functions;
- It will allow mixing "normal" SQLite3 tables and our faster TSynBigTableRecord engine within the same SQL statements;
- And a good idea could be to mix both TSQLServerStatic and virtual table use in the engine.

A typical usage of a TSynBigTableRecord-based virtual table could be a log table: not many updates, a huge number of records, rare deletions, and a need for fast access.

Thanks for your interest and feedback!


#4 2011-01-08 17:42:36

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,548
Website

Re: TSynBigTableRecord class to store data, with fields handling

The tfoUnique kind of index is now working as expected.

Adding or updating a field value is now forbidden if it would duplicate an existing value of a field with tfoUnique set in its Options.

Using this tfoUnique option only makes writing a bit slower, even though the index is updated on every record write.
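
For instance, making the integer field unique is just a matter of adding tfoUnique to its options when declaring it (same TR and fInt as in the first post):

    // tfoUnique added to the options set: duplicated values are now rejected
    fInt := TR.AddField('Int',tftInt32,[tfoIndex,tfoUnique]);
    TR.AddFieldUpdate;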


#5 2011-01-09 17:13:46

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,548
Website

Re: TSynBigTableRecord class to store data, with fields handling

I've added the LIMIT parameter to the searches.
The Limit parameter is similar to the SQL LIMIT clause: if greater than 0, it puts an upper bound on the number of rows returned (e.g. set Limit=1 to retrieve only the first match).
If you don't use any index and need to iterate through all the data, it can save you some time.
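
For example, a "first match only" search could look like this - a sketch only, reusing TR, fText, ID and IDCount from the first post, with SomeValue and FirstID standing for your own variables, and assuming Limit is the optional last argument of Search:

  // retrieve at most one matching ID, then stop the search
  if TR.Search(fText,SomeValue,ID,IDCount,1) and (IDCount>0) then
    FirstID := ID[0];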

I've also split the tftBlob kind of field into two types: tftBlobInternal and tftBlobExternal.
- tftBlobInternal is stored inside the record, and can be used for data which is not too huge (e.g. up to 1 MB);
- tftBlobExternal makes sense when the data is bigger: in this case, the storage engine is responsible for using a separate TSynTable, or plain files, for instance.

I'd like to add two items to my to-do list:
1) a tftCalculated field type: you provide a method, and the engine will call it to compute the field value from the current record content;
2) comparison operators (> >= < <=) for searches, and not only "equals" - this is necessary for speeding up all SQL statements via the SQLite3 virtual table mechanism.

I've also made a proof of concept and a draft version of using TSynBigTableRecord as a SQLite3 virtual table source.
It will be a very efficient use of it.
One new advantage I discovered is that it may be easier and much more efficient to add some custom data types (like MGRS coordinates, time periods, or an array of values).
That is, you can create more column/field types than the regular SQLite3 types (Integer/Text/Double/Blob) in a CREATE VIRTUAL TABLE SQL statement... :)
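
Just to give the flavor, such a statement could look like the following - purely illustrative, since nothing is registered yet: the "BigTableRecord" module name and the column type names are hypothetical:

  const
    // hypothetical CREATE VIRTUAL TABLE statement: a log table hosted by the
    // TSynBigTableRecord engine, with column types richer than the regular
    // SQLite3 Integer/Text/Double/Blob types
    SQL_CREATE_LOG: RawUTF8 =
      'CREATE VIRTUAL TABLE Log USING BigTableRecord('+
      'Time Int64, Level Int32, Text WinAnsi);';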


#6 2011-01-10 20:29:06

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,548
Website

Re: TSynBigTableRecord class to store data, with fields handling

That's it - tonight I finished the new TSynBigTableMetaData class, which stores data together with metadata fields handled in parallel.

It seems to work very well.
You can store files in the main big table records (the content stays on disk), and handle their metadata with fields kept in memory.

In practice, it's very fast:

Testing Synopse Big Table classes...

Create 1,000,000 records data in memory  61.3 ms

Create a TSynBigTableMetaData database  0.4 ms
Add 1,000,000 records  795.3 ms
Try to add 125000 records with not unique field  69.2 ms
Read as variant  1552.5 ms
Read as TSynTableData  375.4 ms
Read direct  245.4 ms
Search 50 Text iterating  1355.8 ms
Search 200 Text using index  531.1 ms
Search 50 Int iterating  753.5 ms
Search 200  Int using index  0.1 ms
UpdateToFile  77.4 ms
Close  20.8 ms
Open  50.0 ms
Read as variant  1535.2 ms
Read as TSynTableData  377.5 ms
Read direct  238.8 ms
Search 50 Text iterating  1366.7 ms
Search 200 Text using index  0.2 ms
Search 50 Int iterating  752.0 ms
Search 200  Int using index  0.1 ms
Add a field  0.0 ms
Recreate file with new field layout  89.3 ms
Read as variant  1523.7 ms
Read as TSynTableData  378.9 ms
Read direct  244.3 ms
Search 50 Text iterating  1346.2 ms
Search 200 Text using index  0.3 ms
Search 50 Int iterating  743.8 ms
Search 200  Int using index  0.1 ms
Values: 7.6 MB, file size: 25.7 MB  61.4 ms

Create a TSynBigTableRecord database  0.3 ms
Add 1,000,000 records  809.4 ms
Try to add 125000 records with not unique field  67.4 ms
Read as variant  1553.4 ms
Read as TSynTableData  385.0 ms
Read direct  254.5 ms
Search 50 Text iterating  1978.7 ms
Search 200 Text using index  533.4 ms
Search 50 Int iterating  1384.0 ms
Search 200  Int using index  0.1 ms
UpdateToFile  65.3 ms
Close  2.1 ms
Open  20.4 ms
Read as variant  1576.2 ms
Read as TSynTableData  409.7 ms
Read direct  271.7 ms
Search 50 Text iterating  2329.3 ms
Search 200 Text using index  0.3 ms
Search 50 Int iterating  1704.9 ms
Search 200  Int using index  0.1 ms
Add a field  0.0 ms
Recreate file with new field layout  578.2 ms
Read as variant  1586.1 ms
Read as TSynTableData  407.8 ms
Read direct  263.8 ms
Search 50 Text iterating  2345.0 ms
Search 200 Text using index  0.2 ms
Search 50 Int iterating  1699.5 ms
Search 200  Int using index  0.1 ms
Values: 13.3 MB, file size: 19.0 MB  6.2 ms

Store 1,000,000 records of 8 chars key / 8 chars values  349.9 ms
Read  113.7 ms
UpdateToFile  62.1 ms
Close  8.7 ms
Open  37.4 ms
Verify  178.2 ms
Iteration test in physical order  228.2 ms
Iteration test in ID order  227.3 ms
Iteration speed  24.5 ms
ID[] speed  4.4 ms
GetAllIDs physical order  27.1 ms
GetAllIDs ID order  22.8 ms
GetAllIDs faster order  1.3 ms
GetAllPhysicalIndexes  1.2 ms
Values: 7.6 MB, file size: 18.1 MB  18.5 ms

Creating a TSynBigTable with 3750 elements  98.4 ms
Verify  38.6 ms
Close and Open  13.6 ms
Verify Random  41.6 ms
Verify  37.7 ms
Adding 1000 elements  12.5 ms
Updating 1000 elements  9.5 ms
Deleting 44 elements  0.0 ms
Iteration test  28.6 ms
Packing  80.7 ms
Verify  50.3 ms
Updating 31 elements  0.7 ms
Verify  42.3 ms
AsStream  28.1 ms
Iteration test ID order  35.2 ms
Iteration test physical order  34.2 ms
Iteration speed  0.1 ms
ID[] speed  79.8 ms
GetAllIDs physical order  0.1 ms
GetAllIDs ID order  0.1 ms
GetAllPhysicalIndexes  0.0 ms
Close and Open  10.2 ms
Verify  54.4 ms
Deleting 55 elements  0.0 ms
Iteration test  36.9 ms
Close  9.5 ms
Open  0.2 ms
Verify  53.3 ms
Updating 91 elements  2.0 ms
Verify  44.9 ms
AsStream  30.1 ms
Iteration test ID order  37.1 ms
Iteration test physical order  35.8 ms
Iteration speed  0.1 ms
ID[] speed  79.6 ms
GetAllIDs physical order  0.1 ms
GetAllIDs ID order  0.1 ms
GetAllPhysicalIndexes  0.0 ms
Packing  74.7 ms
Iteration test physical order  46.2 ms
Iteration test ID order  36.4 ms
Iteration speed  0.1 ms
ID[] speed  0.0 ms
Values: 86.4 MB, file size: 86.5 MB  8.0 ms

Creating a GUID-indexed TSynBigTableString with 21 * 4MB elements  60.6 ms
Verify  26.9 ms
Close  55.3 ms
Open  0.2 ms
Verify  92.7 ms
Delete  0.0 ms
Close and Open  8.6 ms
Pack  979.2 ms
Add one & UpdateToFile  22.5 ms
Verify  91.9 ms
Clear  18.0 ms
Values: 88.0 MB, file size: 88.0 MB

Creating a string-indexed TSynBigTableString with 3750 elements  104.7 ms
Verify  76.6 ms
Close and Open  8.6 ms
Verify  76.9 ms
Verify Random  71.6 ms
Adding 1000 elements  13.1 ms
Verify  82.0 ms
Deleting 95 elements  0.4 ms
Iteration test  30.6 ms
Packing  77.5 ms
Verify  95.7 ms
Updating 62 elements  1.8 ms
Verify  44.4 ms
AsStream  29.0 ms
Iteration test ID order  35.4 ms
Iteration test physical order  35.2 ms
Iteration speed  0.1 ms
ID[] speed  0.0 ms
GetAllIDs physical order  0.1 ms
GetAllIDs ID order  0.2 ms
GetAllPhysicalIndexes  0.0 ms
Close and Open  10.7 ms
Deleting 14 elements  0.0 ms
Iteration test  44.7 ms
Close  9.8 ms
Open  0.4 ms
Verify  92.5 ms
Updating 70 elements  1.6 ms
Verify  43.6 ms
AsStream  30.5 ms
Iteration test ID order  37.4 ms
Iteration test physical order  38.0 ms
Iteration speed  0.2 ms
ID[] speed  0.0 ms
GetAllIDs physical order  0.1 ms
GetAllIDs ID order  0.2 ms
GetAllPhysicalIndexes  0.0 ms
Values: 90.1 MB, file size: 90.1 MB

Tests OK :)

You read that right - it's no trick. This benchmark was run on my laptop (under Windows 7, with a standard Core i7).
The class adds 1,000,000 records in 795 ms, with 8 bytes of main data per record and two metadata fields: one 8-char text field and one integer field. Both fields have an index for immediate retrieval of any data!
Try to find a faster database engine!!!!

