Thanks both! Now I know about creating the index later; I looked through the examples and other tests, and it works that way, which somehow makes sense (you already know this if you have coded a B-tree).
My idea is to import many files, some of them multi-gigabyte, others just a few KB. I don't want duplicate games, so I need the "gamedata" field to be unique.
But then I need the index to be created beforehand, so I can't create it later, right?
Gamedata is a string of text. I first tried hashing this text with SHA-1, for example, and making that SHA-1 field unique, but I found no improvement (SQLite does the job very well), so I reverted to plainly inserting the string with the field itself marked unique.
I did many insertion tests last year and finally found that batch inserting, as mORMot does it, was faster than inserting row by row inside a transaction.
But yesterday I read about OTA importing in a post by AB.
Still a draft
I am thinking of going back to my first attempt: simple inserts inside a transaction, maybe talking directly to the SQLite engine (I already have code for that).
Later I want to report many types of stats derived from the DB, so I'll need a lot of SELECTs with grouping, counting, etc.
Of course, there are thousands of ways of building a chess game collection, by hand (some Python scripting) or with already very good chess databases (TarraschChess, for example, is open source, and the author has explained a lot about his decisions and code changes in his blog; very interesting).
Sorry if this thing is pretty basic, but I'm just a guy doing some experiments and trying to learn a bit. Thanks!
BTW, I tested a lot of ways to find the fastest method to read text files, and I finally wrote my own readln(), faster than the other methods I tried. I can't guarantee it is bug free, but I did a lot of testing and it worked. Basically, it is a simple function that reads a block of bytes from the file, scans for #10 and returns a string; the key point is how you handle the (possibly) incomplete last string in your buffer: you have to read another chunk and fix it up. It could be sped up with vectorized scanning and other stuff like that, but that is totally out of my knowledge. Anyway, it was faster than streams, stringlists, memory-mapped files, readln with a bigger SetTextBuf, etc.
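It is roughly this idea (not my real code, just a quick sketch from memory with made-up names; the real one has a bit more error handling):

uses
  Classes, SysUtils;

type
  TBufferedLineReader = class
  private
    FStream: TFileStream;
    FBuffer: array of AnsiChar;
    FPos, FLen: integer;
    procedure Append(var s: AnsiString; p: PAnsiChar; n: integer);
  public
    constructor Create(const aFileName: string; aBufferSize: integer = 1 shl 20);
    destructor Destroy; override;
    function ReadLine(out aLine: AnsiString): boolean; // false = end of file
  end;

constructor TBufferedLineReader.Create(const aFileName: string; aBufferSize: integer);
begin
  FStream := TFileStream.Create(aFileName, fmOpenRead or fmShareDenyWrite);
  SetLength(FBuffer, aBufferSize);
end;

destructor TBufferedLineReader.Destroy;
begin
  FStream.Free;
  inherited;
end;

procedure TBufferedLineReader.Append(var s: AnsiString; p: PAnsiChar; n: integer);
var
  old: integer;
begin
  if n <= 0 then
    exit;
  old := Length(s);
  SetLength(s, old + n);
  Move(p^, s[old + 1], n);
end;

function TBufferedLineReader.ReadLine(out aLine: AnsiString): boolean;
var
  i: integer;
begin
  aLine := '';
  repeat
    // scan the unread part of the buffer for the end-of-line marker
    for i := FPos to FLen - 1 do
      if FBuffer[i] = #10 then
      begin
        Append(aLine, @FBuffer[FPos], i - FPos);
        FPos := i + 1;
        if (aLine <> '') and (aLine[Length(aLine)] = #13) then
          SetLength(aLine, Length(aLine) - 1); // strip the CR of a CRLF pair
        result := true;
        exit;
      end;
    // no #10 found: keep the incomplete tail and read the next chunk
    Append(aLine, @FBuffer[FPos], FLen - FPos);
    FPos := 0;
    FLen := FStream.Read(FBuffer[0], Length(FBuffer));
  until FLen = 0;
  result := aLine <> ''; // last line of the file without a trailing #10
end;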
Hi,
Some time ago I learned here the best way to batch insert millions of rows (thanks ab).
First I was doing classic inserts (I wanted to build a chess database of games), but after some learning and asking,
I finally did them in a batch, as you can see in the examples:
pgndb := TSQLRestServerDB.CreateWithOwnModel([TSQLpgnGame],DATABASE_FILE_NAME);
try
  pgndb.DB.LockingMode := lmExclusive;
  pgndb.DB.Synchronous := smOff;
  pgndb.CreateMissingTables;
  pgnbatch := TSQLRestBatch.Create(pgndb,TSQLpgnGame,INSERTS_PER_COMMIT);
  try
    pgnrec := TSQLpgnGame.Create;
    try
      // loop the batch here:
      //   do some stuff, parse the PGN file, fill the record fields
      pgnbatch.Add(pgnrec,true);
      pgnrec.ClearProperties;
      // end loop
    finally
      pgnrec.Free;
    end;
    pgndb.BatchSend(pgnbatch);
  finally
    pgnbatch.Free;
  end;
finally
  pgndb.Free;
  Writeln('Memory used finally = ', IntToThousandString(MemoryUsed));
end;
...
Now I'm trying to learn a bit more about the right way to do things with mORMot.
As I want games to be unique in the DB, I created an index:
pgndb.CreateMissingTables;
pgndb.CreateSQLIndex(TSQLgame,'Gamedata',true); //true=unique yes
But... I was thinking (and sorry, I learned DB and SQL the old-school way, when you did everything by hand with SQL)...
How do dupes work with this way of inserting? I mean, imagine I am inserting a batch of 1000 games and some of them are already in the DB (uniqueness here is only on gamedata, that is, the game itself, not players, tournament or whatever).
When I was doing classic inserts, I got an exception when I tried to insert a dupe row (dupe by the index, I mean), so I could count them and process them separately.
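With the batch I suppose I have to look at what BatchSend returns instead. Something like this is how I plan to test it (the Results overload and the plain 200 comparison are my own assumptions; I still have to verify this is the intended way to detect a failed/duplicate insert):

var
  ids: TIDDynArray;
  res: integer;
...
res := pgndb.BatchSend(pgnbatch, ids);  // ids[] should get one entry per Add()
if res <> 200 then                      // 200 = HTTP-like "success" status code
  Writeln('BatchSend returned ', res, ' - maybe a UNIQUE violation somewhere in the batch?')
else
  Writeln(length(ids), ' games sent in this batch');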
I examined the examples and did some google searches but found nothing. Thanks!
Thanks!
Is this expected?
procedure TMD5.Update(const buffer; len: Cardinal);
and the rest of update functions in the unit uses
procedure THMAC_SHA1.Update(msg: pointer; msglen: integer);
procedure TSHA1.Update(Buffer: pointer; Len: integer);
procedure TSHA256.Update(Buffer: pointer; Len: integer);
Sorry if my question is stupid
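Just to show why I ask: at the call site the two styles have to be used differently. A tiny illustration of how I understand it:

uses
  SynCommons, SynCrypto;

procedure ShowUpdateCallStyles;
var
  data: RawUTF8;
  md5: TMD5;
  sha1: TSHA1;
begin
  data := 'some bytes to hash';
  md5.Init;
  md5.Update(PAnsiChar(data)^, length(data)); // untyped const parameter: pass the dereferenced buffer
  sha1.Init;
  sha1.Update(pointer(data), length(data));   // pointer parameter: pass the pointer itself
  // ... then Final as usual for each of them
end;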
>2018-09-18 06:22:40
>Good news:
>https://www.sqlite.org/windowfunctions.html
>2018-09-18 06:25:28
>And mORMot now includes SQLite 3.25.0: https://github.com/synopse/mORMot/commi … 53bf38721d
man, only 2m48s later?
you are so slow... xD
OK, thanks for the information. I need the "classic" CRC32, so I think it will be easier to use my own CRC32 then; Indy uses TFileStream. Maybe I could use FileOpenSequentialRead, as you did in HashFile, together with your implementation of the classic CRC32.
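Something along these lines is what I have in mind (just a rough sketch; I am assuming SynZip exposes a zlib-style crc32(crc, buf, len) routine, I still have to check the exact declaration):

uses
  SysUtils, SynCommons, SynZip;

// sketch: hash a whole file with the zlib-style crc32, reading it in chunks
function CrcOfFile(const aFileName: TFileName): cardinal;
var
  h: THandle;
  buf: array[0..65535] of byte;
  n: integer;
begin
  result := 0;                              // zlib crc32 starts at 0
  h := FileOpenSequentialRead(aFileName);   // same open mode as HashFile uses
  if PtrInt(h) < 0 then
    exit;                                   // could not open the file
  try
    repeat
      n := FileRead(h, buf, SizeOf(buf));
      if n > 0 then
        result := crc32(result, @buf, n);   // ASSUMED signature: crc32(crc, buf, len)
    until n <= 0;
  finally
    FileClose(h);
  end;
end;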
Thanks again for such a great library.
I am hashing files, nothing related to crypto; I need to compare files against already calculated hashes stored in a text file (kind of like .sfv files).
I just need CRC32, SHA1 (and possibly MD5).
I was reading SynCommons because I found tons of CRC32 usage there, but I will look in SynZip and see if I can add it to TSynHasher.
Thanks.
Hi!
I am doing some hashing here, so I found the very nice function HashFile.
It seems TSynHasher can't do CRC32, and I need to use it. From what I've seen in the sources, if I want to add CRC32 support to TSynHasher, all I need to do
is add the algo to THashAlgos and update the Init, Update and Final functions of TSynHasher, right?
It seems easy, but I cannot find how to call the CRC32 init/update/final functions (in SynCommons).
I am currently using Indy10's hashing functions, but I found them slow, so I want to do some benchmarking here.
I am coding a little tool that calculates CRC32, MD5 and SHA1 hashes of files and adds them to an array; this is 99% of the CPU time, so I want it to be as fast as possible.
In my tests on a basic 400 MB set of files, with Indy10 I got around 21 s total time, and just switching the SHA1 calculation to mORMot reduced it to 7 s!
Possibly by using mORMot's CRC32 I can make it even faster (even though I know SHA1 is much more complex, hence the savings).
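In case it helps to show what I am timing, the mORMot side of my benchmark is basically just this (the path is only a placeholder, and I am assuming hfSHA1/hfMD5 are the right THashAlgo names):

uses
  SynCommons, SynCrypto;

procedure HashOneFile;
var
  sha1hex, md5hex: RawUTF8;
begin
  // 'd:\testfiles\games.pgn' is just a placeholder path
  sha1hex := HashFile('d:\testfiles\games.pgn', hfSHA1);
  md5hex  := HashFile('d:\testfiles\games.pgn', hfMD5);
  Writeln('SHA1: ', sha1hex);
  Writeln('MD5 : ', md5hex);
end;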
Thanks!
Yes, I've seen that. OK, then it's much more complex than I thought.
Thanks!
Excuse my ignorance: does this need to be implemented at the source level in SQLite? Or could it be implemented in Synopse?
I've seen other solutions for on-the-fly SQLite compression.
As I've seen that mORMot supports encryption, maybe it is easy to implement this in the code?
I know there are very fast compression/decompression libraries out there, LZ4 for example.
Any ideas about this? Sorry if my question is dumb, I am just a coder doing little personal projects in Delphi/mORMot.
Thanks AB for such great work!
Thanks for the answer, and for answering newbie questions! Sometimes it is hard to learn new things.
You were right (and my test was bad).
Batch.Reset is necessary to free memory after a BatchSend: if I do the combo, memory is stable at around 40-50 MB regardless of the number of lines inserted (I tried with an 8 GB text file); if I just use BatchSend, memory grows until it overflows.
I am curious: in the line batch := TSQLRestBatch.Create(db,TSQLstest,1000), what is the purpose of AutomaticTransactionPerRow (1000 here)? Is the data committed? If so, is it safe on disk so that the memory can be freed?
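For reference, the working combo in my loop now looks like this (I am using the parameterless Reset overload here and assuming it keeps the same table and AutomaticTransactionPerRow settings):

if (cline mod COMMIT_COUNT = 0) then
begin
  db.BatchSend(batch); // flush the rows added so far to SQLite
  batch.Reset;         // release the internal buffer -> memory stays at ~40-50 MB
end;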
Following a tip from master ab, I started trying to improve my insertion speed in a test I am making here.
First I started with the typical INSERTs inside a transaction, but finally I am using the ORM with batched inserts.
So I took a 20-million-line text file and made a simple test: read a line, insert it into a simple test table. But I started getting out-of-memory exceptions.
I've used this little function to check the memory used:
function MemoryUsed: cardinal;
var
  st: TMemoryManagerState;
  sb: TSmallBlockTypeState;
begin
  GetMemoryManagerState(st);
  result := st.TotalAllocatedMediumBlockSize + st.TotalAllocatedLargeBlockSize;
  for sb in st.SmallBlockTypeStates do
    result := result + sb.UseableBlockSize * sb.AllocatedBlockCount;
end;
I ran it again and watched memory going up until it hit 2 GB, then an exception and it stopped working.
So I thought I needed to call "db.BatchSend(batch)" every x lines to free the memory used by the batch process. I am building a 32-bit executable,
but I think I am doing something wrong, because 2 GB should be enough for this simple test.
I know about AutomaticTransactionPerRow, so is this committing of mine superfluous?
I am following the ORM batch example (... = omitted non-interesting code). All strings are RawUTF8.
db := TSQLRestServerDB.CreateWithOwnModel([TSQLstest],'c:\test.db3');
...
db.DB.LockingMode := lmExclusive;
db.DB.Synchronous := smOff;
db.CreateMissingTables;
...
batch := TSQLRestBatch.Create(db,TSQLstest,1000);
stest := TSQLstest.Create; // class(TSQLRecord)
...
try
  repeat
    readln(tfile,s1);
    inc(cline);
    stest.st := s1;
    batch.Add(stest,true);
    if (cline mod COMMIT_COUNT = 0) then
    begin
      db.BatchSend(batch);
      Writeln(cline);
      Writeln('Memory used intracommits = ', IntToThousandString(MemoryUsed));
    end;
  until eof(tfile);
finally
  stest.Free;
end;
db.BatchSend(batch); // send the rest of the pending batch adds
db.Free;
Very simple stuff, but memory is not freed and keeps growing until it breaks.
Am I doing something wrong? Do I need to free the batch after sending it and re-create it? Or use a Reset after the send?
Edit: I tested resetting the batch after a send but memory keeps growing, so Reset didn't solve this.
First I thought I was leaking memory, so I used madExcept to test for memory leaks and other errors. But this test is not leaking memory; everything is green with madExcept
except one case in mORMot initialization (a minor one). No leaks on my part.
If needed I can post the whole test here (a little console program).
Thanks !!!
Edit: has anyone run the Synopse test cases with something like EurekaLog / madExcept / FastMM4 with all error detection on?
Thanks! I've been reading the examples and see exactly what you say, ORM+Batch. It's more complex at first sight to me. Sometimes it's hard to learn new paradigms...
I am very happy with this library; it seems very well rounded in every corner. Thanks for all the good work, ab!
Edit:
Well, it seems bind, execute and reset don't work for me ("TSQLRequest.Reset called with no previous Request" error)... but bind, step and reset work perfectly.
Anyway, I'll forge ahead accessing the library directly, and I will also try ORM+Batch.
Thanks!
Edit 2: I know you all have already seen many benchmarks of mORMot, so this is just mine, testing pretty simple inserts.
Compiled with Delphi Berlin 10.1, run on an i5 with 8 GB RAM and a Samsung EVO SSD.
Transactional basic SQLite3, 1,000,000 rows inserted using TSQLRequest with a prepared INSERT:
- 12.3 s - 80,932 rows inserted per second
TSQLRestServerDB + TSQLRestBatch, same 1,000,000 rows inserted:
- 8.50 s preparing + 2.14 s inserting - total 10.54 s - 94,789 rows inserted per second
Black magic!
I want to test basic SQLite3 usage here; I want to show the lib to some coworkers.
I'll do some inserts, show the table name, then a general select, and then show the first 10 rows of the select. Pretty basic stuff to start with this lib.
This is inside a basic console app:
const
  CREATE_SQL     = 'create table testtable (id integer primary key, randomst text, randomint integer);';
  CREATE_SQL_IDX = 'create unique index idx_st on testtable (randomst asc);';
  INSERT_SQL     = 'insert into testtable values (null, ?, ?);';
  SELECT_SQL     = 'select * from testtable;';
  DB_FILE_PATH   = 'd:\00\';
  DB_FILE_NAME   = 'sqltest.db';
  DB_FILE        = DB_FILE_PATH + DB_FILE_NAME;
var
  s1,s2: RawUTF8;
  c1,c2: longint;
  sqldb: TSQLDatabase;
  req: TSQLRequest;
  tableNames: TRawUTF8DynArray;
  nrows: PtrInt;
...
  // create table / file
  sqldb := TSQLDatabase.Create(DB_FILE);
  sqldb.Synchronous := smOff;
  sqldb.LockingMode := lmExclusive;
  sqldb.MemoryMappedMB := 128;
  sqldb.UseCache := true;
  Writeln('Running on SQLite3 version ',sqlite3.ClassName,' ',sqlite3.Version);
  sqldb.Execute(CREATE_SQL);
  sqldb.Execute(CREATE_SQL_IDX);
  sqldb.GetTableNames(tableNames);
  writeln('Tables found in db: ',length(tableNames));
  for c1 := 0 to length(tableNames)-1 do
    writeln(tableNames[c1]);
  req.Prepare(sqldb.DB,INSERT_SQL); // prepare the INSERT
  // insert 1000 random rows
  sqldb.TransactionBegin;
  for c1 := 1 to 1000 do
  begin
    req.Bind(1,genrandomstring(50));
    req.Bind(2,random(MaxInt));
    req.Step;
    req.Reset;
  end;
  sqldb.Commit;
  req.Close;
  // select all rows
  s1 := sqldb.ExecuteJSON(SELECT_SQL,false,@nrows);
  writeln(nrows,' rows returned');
  FreeAndNil(sqldb);
  writeln('End testing. Press enter to exit');
What I am wondering is whether this is the right way to do inserts inside a transaction, with a prepared INSERT:
req.Prepare(sqldb.DB,INSERT_SQL); // prepare the INSERT
sqldb.TransactionBegin;
for c1 := 1 to 1000 do
begin
  req.Bind(1,genrandomstring(50));
  req.Bind(2,random(MaxInt));
  req.Step;
  req.Reset;
end;
sqldb.Commit;
req.Close;
The req.Reset seems necessary; without it I got a SQLITE_MISUSE (21) error.
First I tried this:
req.Prepare(sqldb.DB,INSERT_SQL); // prepare the INSERT
sqldb.TransactionBegin;
for c1 := 1 to 1000 do
begin
  req.Bind(1,genrandomstring(50));
  req.Bind(2,random(MaxInt));
  req.Execute;
end;
sqldb.Commit;
req.Close;
But I got an SQL error as well. So I was wondering which is the right way to do basic inserts with the SQLite3 wrapper:
Prepare -> Bind -> Step -> Reset -> ... repeat the Bind -> ... etc.?
Thanks
Is there a list of open source software created with mORMot that I could use to study how mORMot works in practice?
That would be great.
Digging for examples in the mORMot test cases is very complicated for me; not that I'm saying it is badly coded, on the contrary! But it is not written as examples and is not easy to understand.
Hi, today I loaded some old code and remembered I used SynScaleMM years ago.
Does it still offer any advantage? I am using Delphi Berlin 10.1 right now. I guess it is not used nowadays?
Thanks for the tips, admin!
I'll keep on posting in this thread for other people interested in getting started with simple SQLite3 use in Delphi.
Hi friends,
Lately I am playing a bit with SQLite3 as the base for a chess project; I am trying to clean and de-dupe my whole PGN library. PGN files are text files describing chess games, so I think creating a SQLite3 DB will be a good solution for the first step. I've got some GBs of PGN with roughly 10-15 million games, and I suspect that after deduping I'll get far fewer, maybe 8-9 million? Let's see...
Well, I first tried DiSQLite but got some weird errors (I am using Delphi 7 on my Windows 8 machine) and started looking for information, so through some Stack Overflow threads I finally ended up here, discovering this project...
As the text files are huge I need to be *fast*, so I don't want overhead, just pure speed, and that is something mORMot excels at!
I've read that I can leave out all the ORM overhead and just use the fast SQLite3 access; that's what I am looking for. I've read the PDF file (the 1.18 version) and the examples, but nearly all the information is about using full mORMot: servers, JSON, complex classes, etc.
So I am a bit lost, because what I need is much, much simpler; a chess game is just some text strings (players, Elo, dates, etc.) and the game itself (also a string). I only need to:
- test if the local DB file exists, and if not create it, using SQLite3 parameters to be as fast as possible (no locking, etc.; I've already read all of this)
- process a PGN file (this can be a 1 MB file or a 4 GB file); I've already implemented this
- BATCH/bulk insert chess games into the file, because to be VERY FAST some kind of batch/bulk insert is needed; some useful information I've read is in these URLs:
http://blog.quibb.org/2010/08/fast-bulk … to-sqlite/ and http://stackoverflow.com/questions/1209 … rts-sqlite
- of course, if a chess game is already in the file it is not inserted; I'll use a unique constraint on the game data (this must be tested because of the length of the string...)
- also I have to study whether it is worth storing the chess game as compressed data; I've already tested this with the DiUCL library, but found you ALSO provide a SynLZ unit (FANTASTIC WORK!!!) - see the little sketch right after this list
- on reaching the end of the PGN file, close all files and output the data
- repeat with all the information, and finally I'll get a base SQLite3 file with millions of chess games
- maybe build an interface over this file to do fast searches, per position, per player, etc.
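About the compression idea above, this is the kind of round trip I have in mind with SynLZ (just a sketch; I am assuming the SynLZCompress/SynLZDecompress RawByteString helpers in SynCommons are the right ones to call):

uses
  SynCommons;

procedure CompressOneGame;
var
  gamedata, lz, plain: RawByteString;
begin
  gamedata := '1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 ...'; // the moves of one game (example only)
  lz := SynLZCompress(gamedata);                   // this is what would go into the DB field
  plain := SynLZDecompress(lz);                    // and this is what I would get back when reading
  Writeln('original: ', length(gamedata), ' bytes, compressed: ', length(lz), ' bytes');
end;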
so...
- open/create file db
- bulk inserts
- close file
in the future
- open db
- provide some DB maintenance through SQL or the direct API (delete records, search, etc.)
Thanks for reading, friends!