In our SynBigTable unit, the TSynBigTableRecord class is now available to store data, with field handling.
That is, you get the incredible speed of our SynTable, plus the possibility to have fields in the records.
In other words, you are no longer limited to key/value pairs.
On an old Pentium 4-M:
- creating 1,000,000 records, each with one 8-char text field and one integer field: 1,700 ms;
- reading all records, with per-field data retrieval: 500 ms;
- searching for 100 various text/integer values, without an index (i.e. with full database iteration): 4,300 ms;
- searching for 100 various text/integer values, with an index: 0.1 ms;
- index creation (for the text and integer fields altogether): 600 ms;
- adding a new field, then refreshing the whole data (since the on-disk layout changed): 1,040 ms;
- opening the file: 30 ms.
The file size is only 19 MB.
You read that right. 1,000,000 records...
Here is some sample code:
Creating the database, with two fields - one textual named "text" and one integer named "Int" (both using an index):
TR := TSynBigTableRecord.Create(FN,'test');
try
  fText := TR.AddField('text',tftWinAnsi,[tfoIndex]);
  fInt := TR.AddField('Int',tftInt32,[tfoIndex]);
  TR.AddFieldUpdate; // apply the pending field additions to the table layout
Adding 1,000,000 records (rec is a dedicated TSynTableData object, used for fast access to the field content):
rec.Init(TR.Table);
start(format('Add %.0n records',[n*1.0]));
for i := 0 to n-1 do begin
  rec.SetFieldValue(fText,By8[i]); // same as rec.Field['text'] := By8[i];
  rec.SetFieldValue(fInt,i);
  TR.RecordAdd(rec);
end;
TR.UpdateToFile; // flush the new records to the disk file
Reading all records, accessing fields via a dedicated variant type - note that you access each field via its name, with true late binding.
Variant access reads smarter than TSynTableData, but it is also slower (because of the overhead of the current variant implementation).
for i := 1 to n do begin
  vari := TR.VariantGet(i);
  if (vari.text<>By8[i-1]) or
     (vari.Int<>i-1) then
    exit;
end;
Searching for 101 text items, using the index if available:
for i := 0 to 100 do
  if not TR.Search(fText,By8[i],ID,IDCount) or
     (IDCount<>1) or (ID[0]<>i+1) then
    exit
  else
    IDCount := 0;
Stay tuned!
Hello,
This seems very cool.
Would it (later) be possible to combine data storing and fields?
So I could store metadata fields for each stored item. Think of it this way: I want to store files and add some metadata fields for each file I've stored.
So I could have only one file, and then quickly get some metadata about each file, like its last change time, etc.
I can work around it by using two or more files, but I was just pondering...
-Tee-
Hello, at first sight this sounds like a BLOB field to me.
It's not yet implemented, but it will be soon (there is already a tftBlob field type for TSynTable).
In fact, in the implementation scheme I want to follow, there will be two files: one for the "regular/simple" fields, and one per BLOB field.
A hidden TSynBigTable instance will be created for each BLOB field.
This will work if the metadata has a fixed number of fields.
Another possibility could be to use a TSynTable with some in-memory data, located inside the same TSynBigTable instance. There would be only one file in this case.
I'll try to write such a TSynBigTableMetaData, because this file/metadata scheme is very useful in a lot of applications.
The TSynBigTableRecord implementation is a pure "record-oriented" storage engine. It makes sense when the number of records can be huge (that's why I tested it with 1 million records and more). And all data is accessed via memory mapping, straight from the file layout.
But if your table will only store a limited number (some thousands) of "files", then storing the metadata in memory (in a dynamic array of RawByteString) makes sense.
Here is my road map for this weekend (of course I won't finish all of it in two days):
- implement the tfoUnique kind of index;
- implement a "LIMIT" parameter for the searches (this could make them faster if only the first match is to be retrieved);
- implement the tftBlob field type as an "internal" field, that is, if the BLOB data is not huge, it can be stored inside the main record data, like a tftWinAnsi field;
- create a TSynBigTableMetaData class, for storing files plus some metadata - the field handling will use the already available TSynTable mechanism;
- implement a BEGIN TRANSACTION/COMMIT/ROLLBACK mechanism in TSynBigTableRecord (perhaps in the main TSynBigTable class: just keep the transaction records in memory, then transfer them to the main fInMemory[] array only in case of COMMIT);
- implement the tftBlobExternal field type;
- interface the TSynBigTableRecord class with our SQLite3 engine - perhaps as a "TSQLServerStatic" class, or as a pure SQLite3 virtual table. Or perhaps both...
I guess that SQLite3 virtual tables could be the best use of our new TSynBigTableRecord storage engine:
- much faster than the default SQLite3 engine for database access and record writing;
- it will use the proven SQLite3 SQL parser and interpreter, so we won't have to reinvent the wheel;
- the virtual table interface - see http://www.sqlite.org/vtab.html - is very well designed: in particular, it makes very clever use of indexes and overloaded SQL functions;
- it will allow mixing, in a single SQL statement, "normal" SQLite3 tables and our faster TSynBigTableRecord engine;
- and a good idea could be to mix both TSQLServerStatic and virtual table use in the engine.
A typical use of a TSynBigTableRecord-based virtual table could be a log table: not many updates, a huge number of records, rows rarely deleted, and a need for fast access.
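The transaction item above is only a road map entry, so no such methods exist yet; as a sketch only, the intended usage might look like this (TransactionBegin/TransactionCommit/TransactionRollback are hypothetical method names, not part of the unit):

```pascal
// hypothetical sketch - no transaction methods exist yet in TSynBigTableRecord
TR.TransactionBegin;      // start buffering writes in memory
try
  rec.SetFieldValue(fText,'John');
  rec.SetFieldValue(fInt,1234);
  TR.RecordAdd(rec);      // kept in the transaction buffer, not yet in fInMemory[]
  TR.TransactionCommit;   // transfer the buffered records to the main fInMemory[] array
except
  TR.TransactionRollback; // on error, discard the buffered records
end;
```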
Thanks for your interest and feedback!
The tfoUnique kind of index is now working as expected.
A field update or addition is now forbidden if a value of a field with tfoUnique set in its Options would be duplicated.
Using this tfoUnique option makes writing only a bit slower, even though the index is updated on every record write.
I've added the LIMIT parameter to the searches.
The Limit parameter is similar to the SQL LIMIT clause: if greater than 0, it places an upper bound on the number of rows returned (e.g. set Limit=1 to retrieve only the first match).
If you don't use an index but iterate through all data, it can save you some time.
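Assuming Limit was added as an optional trailing parameter of the existing Search method (the exact signature is a guess from the description above), retrieving only the first match could look like:

```pascal
// assumption: Search() now accepts a trailing Limit parameter (0 = no limit)
if TR.Search(fText,By8[0],ID,IDCount,1) then
  // with Limit=1, iteration stops at the first match, so IDCount<=1
  DoSomethingWith(ID[0]); // DoSomethingWith() is a placeholder for your own code
```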
I've also split the tftBlob kind of field into two types: tftBlobInternal and tftBlobExternal.
- tftBlobInternal data is stored inside the record, and can be used for content that is not too big (up to 1 MB, e.g.);
- tftBlobExternal makes sense for bigger data: in this case, the storage engine is responsible for using a separate TSynTable, or plain files, for instance.
I'd like to add two items to my to-do list:
1) a tftCalculated field: given a supplied method, the engine will call it to compute the field value from the current record content;
2) comparison operators (> >= < <=) for searches, not only "equals" - this is necessary for speeding up all SQL statements via the SQLite3 virtual table mechanism.
I've also made a proof of concept and a draft version of using TSynBigTableRecord as an SQLite3 virtual table source.
It will be a very efficient use of it.
One new advantage I discovered is that it may be easier and much more efficient to add some custom data types (like MGRS coordinates, time periods, or arrays of values).
That is, you can declare more column/field types than the regular SQLite3 types (Integer/Text/Double/Blob) in a CREATE VIRTUAL TABLE SQL statement...
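As an illustration only - the module name 'BigTable' and the MGRS column type below are made up for this sketch, since the virtual table registration is still a draft:

```pascal
// hypothetical: once a TSynBigTableRecord-based module is registered as 'BigTable',
// a virtual table could declare a custom column type (here MGRS) that the regular
// SQLite3 engine does not know about
DB.Execute('CREATE VIRTUAL TABLE Geo USING BigTable(Name TEXT, Position MGRS);');
```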
That's it - tonight I finished the new TSynBigTableMetaData class, which stores data together with metadata fields in parallel.
It seems to work very well.
You can store files in the main big table records (they stay on disk), and handle the metadata fields in memory.
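By analogy with the TSynBigTableRecord sample above, using the new class could look like the sketch below (the RecordAdd signature taking both the raw data and the metadata record, and the GetFileAgeAsInt64 helper, are assumptions for illustration, not the published API):

```pascal
MD := TSynBigTableMetaData.Create(FN,'files');
try
  fName := MD.AddField('name',tftWinAnsi,[tfoIndex,tfoUnique]);
  fTime := MD.AddField('time',tftInt64,[tfoIndex]);
  MD.AddFieldUpdate; // apply the pending field additions
  rec.Init(MD.Table);
  rec.SetFieldValue(fName,'readme.txt');
  rec.SetFieldValue(fTime,GetFileAgeAsInt64('readme.txt')); // hypothetical helper
  MD.RecordAdd(FileContent,rec); // file content goes to disk, metadata fields stay in memory
finally
  MD.Free;
end;
```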
In practice, it's very fast:
Testing Synopse Big Table classes...
Create 1,000,000 records data in memory 61.3 ms
Create a TSynBigTableMetaData database 0.4 ms
Add 1,000,000 records 795.3 ms
Try to add 125000 records with not unique field 69.2 ms
Read as variant 1552.5 ms
Read as TSynTableData 375.4 ms
Read direct 245.4 ms
Search 50 Text iterating 1355.8 ms
Search 200 Text using index 531.1 ms
Search 50 Int iterating 753.5 ms
Search 200 Int using index 0.1 ms
UpdateToFile 77.4 ms
Close 20.8 ms
Open 50.0 ms
Read as variant 1535.2 ms
Read as TSynTableData 377.5 ms
Read direct 238.8 ms
Search 50 Text iterating 1366.7 ms
Search 200 Text using index 0.2 ms
Search 50 Int iterating 752.0 ms
Search 200 Int using index 0.1 ms
Add a field 0.0 ms
Recreate file with new field layout 89.3 ms
Read as variant 1523.7 ms
Read as TSynTableData 378.9 ms
Read direct 244.3 ms
Search 50 Text iterating 1346.2 ms
Search 200 Text using index 0.3 ms
Search 50 Int iterating 743.8 ms
Search 200 Int using index 0.1 ms
Values: 7.6 MB, file size: 25.7 MB 61.4 ms
Create a TSynBigTableRecord database 0.3 ms
Add 1,000,000 records 809.4 ms
Try to add 125000 records with not unique field 67.4 ms
Read as variant 1553.4 ms
Read as TSynTableData 385.0 ms
Read direct 254.5 ms
Search 50 Text iterating 1978.7 ms
Search 200 Text using index 533.4 ms
Search 50 Int iterating 1384.0 ms
Search 200 Int using index 0.1 ms
UpdateToFile 65.3 ms
Close 2.1 ms
Open 20.4 ms
Read as variant 1576.2 ms
Read as TSynTableData 409.7 ms
Read direct 271.7 ms
Search 50 Text iterating 2329.3 ms
Search 200 Text using index 0.3 ms
Search 50 Int iterating 1704.9 ms
Search 200 Int using index 0.1 ms
Add a field 0.0 ms
Recreate file with new field layout 578.2 ms
Read as variant 1586.1 ms
Read as TSynTableData 407.8 ms
Read direct 263.8 ms
Search 50 Text iterating 2345.0 ms
Search 200 Text using index 0.2 ms
Search 50 Int iterating 1699.5 ms
Search 200 Int using index 0.1 ms
Values: 13.3 MB, file size: 19.0 MB 6.2 ms
Store 1,000,000 records of 8 chars key / 8 chars values 349.9 ms
Read 113.7 ms
UpdateToFile 62.1 ms
Close 8.7 ms
Open 37.4 ms
Verify 178.2 ms
Iteration test in physical order 228.2 ms
Iteration test in ID order 227.3 ms
Iteration speed 24.5 ms
ID[] speed 4.4 ms
GetAllIDs physical order 27.1 ms
GetAllIDs ID order 22.8 ms
GetAllIDs faster order 1.3 ms
GetAllPhysicalIndexes 1.2 ms
Values: 7.6 MB, file size: 18.1 MB 18.5 ms
Creating a TSynBigTable with 3750 elements 98.4 ms
Verify 38.6 ms
Close and Open 13.6 ms
Verify Random 41.6 ms
Verify 37.7 ms
Adding 1000 elements 12.5 ms
Updating 1000 elements 9.5 ms
Deleting 44 elements 0.0 ms
Iteration test 28.6 ms
Packing 80.7 ms
Verify 50.3 ms
Updating 31 elements 0.7 ms
Verify 42.3 ms
AsStream 28.1 ms
Iteration test ID order 35.2 ms
Iteration test physical order 34.2 ms
Iteration speed 0.1 ms
ID[] speed 79.8 ms
GetAllIDs physical order 0.1 ms
GetAllIDs ID order 0.1 ms
GetAllPhysicalIndexes 0.0 ms
Close and Open 10.2 ms
Verify 54.4 ms
Deleting 55 elements 0.0 ms
Iteration test 36.9 ms
Close 9.5 ms
Open 0.2 ms
Verify 53.3 ms
Updating 91 elements 2.0 ms
Verify 44.9 ms
AsStream 30.1 ms
Iteration test ID order 37.1 ms
Iteration test physical order 35.8 ms
Iteration speed 0.1 ms
ID[] speed 79.6 ms
GetAllIDs physical order 0.1 ms
GetAllIDs ID order 0.1 ms
GetAllPhysicalIndexes 0.0 ms
Packing 74.7 ms
Iteration test physical order 46.2 ms
Iteration test ID order 36.4 ms
Iteration speed 0.1 ms
ID[] speed 0.0 ms
Values: 86.4 MB, file size: 86.5 MB 8.0 ms
Creating a GUID-indexed TSynBigTableString with 21 * 4MB elements 60.6 ms
Verify 26.9 ms
Close 55.3 ms
Open 0.2 ms
Verify 92.7 ms
Delete 0.0 ms
Close and Open 8.6 ms
Pack 979.2 ms
Add one & UpdateToFile 22.5 ms
Verify 91.9 ms
Clear 18.0 ms
Values: 88.0 MB, file size: 88.0 MB
Creating a string-indexed TSynBigTableString with 3750 elements 104.7 ms
Verify 76.6 ms
Close and Open 8.6 ms
Verify 76.9 ms
Verify Random 71.6 ms
Adding 1000 elements 13.1 ms
Verify 82.0 ms
Deleting 95 elements 0.4 ms
Iteration test 30.6 ms
Packing 77.5 ms
Verify 95.7 ms
Updating 62 elements 1.8 ms
Verify 44.4 ms
AsStream 29.0 ms
Iteration test ID order 35.4 ms
Iteration test physical order 35.2 ms
Iteration speed 0.1 ms
ID[] speed 0.0 ms
GetAllIDs physical order 0.1 ms
GetAllIDs ID order 0.2 ms
GetAllPhysicalIndexes 0.0 ms
Close and Open 10.7 ms
Deleting 14 elements 0.0 ms
Iteration test 44.7 ms
Close 9.8 ms
Open 0.4 ms
Verify 92.5 ms
Updating 70 elements 1.6 ms
Verify 43.6 ms
AsStream 30.5 ms
Iteration test ID order 37.4 ms
Iteration test physical order 38.0 ms
Iteration speed 0.2 ms
ID[] speed 0.0 ms
GetAllIDs physical order 0.1 ms
GetAllIDs ID order 0.2 ms
GetAllPhysicalIndexes 0.0 ms
Values: 90.1 MB, file size: 90.1 MB
Tests OK :)
You read that right - it's no trick. This benchmark was run on my laptop (under Windows Seven, with a standard Core i7).
The class adds 1,000,000 records in 795 ms, with 8 bytes of main data and two metadata fields: one 8-char text field and one integer field. Both fields have an index for immediate retrieval of any data!
Try to find a faster database engine!!!!