In fact, the propagation/replication algorithm we detailed in this forum thread may be seen as a CRDT.
By ensuring the order of modifications is preserved, we may achieve consistent replication.
Now I will take some time to think about it...
But yes, we should refactor this part for proper data replication.
function TSQLRestStorageExternal.EngineLockedNextID: Integer; {virtual;}
Please notice that fEngineLockedLastID is assigned a value in 3 more places. In the following method it is assumed that the generated ID values will be adjacent, which may not be the case and strongly depends on the EngineLockedNextID implementation:
procedure TSQLRestStorageExternal.InternalBatchStop;
...
  finally
    if (fBatchMethod=mPost) and (fBatchCount>1) then
      // -1 since fBatchFirstAddedID := EngineLockedNextID did already a +1
      inc(fEngineLockedLastID,fBatchCount-1);
...
Two questions:
1. Wouldn't it be appropriate to introduce the EngineLockedNextID method earlier, in TSQLRest, in order to create a seam for a customizable ID generator?
2. Could we have a dedicated interface (abstract class) for such a customizable generator?
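For illustration, here is a minimal sketch of what such a seam could look like; the class and method names below are hypothetical, not the existing framework API:

type
  // Hypothetical customizable ID generator seam (TID = Int64 in mORMot).
  // An implementation could use a sequence, a stored procedure, a
  // site-id-encoding scheme, etc.
  TSQLIDGenerator = class
  public
    // return the next available ID for the given table
    function NextID(aTableIndex: integer): TID; virtual; abstract;
    // reserve aCount consecutive IDs in one call and return the first one,
    // so batch inserts rely on an explicit contract instead of the
    // adjacency assumption currently made in InternalBatchStop
    function ReserveIDs(aTableIndex, aCount: integer): TID; virtual; abstract;
  end;

A TSQLRestStorageExternal descendant could then delegate EngineLockedNextID to such a generator instance instead of maintaining fEngineLockedLastID directly.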
Regards,
What do you mean exactly by "Was the mORMot uses RDBMS DRI peculiarities to support the persistence?"? (there should be a missing word)
Quite possible! Sorry, English is not my native language.
What I meant was that I'd be surprised to see mORMot trying to use anything other than simple DML expressions, for several reasons:
Multiple DB back-ends, each with its own language (and semantic) peculiarities;
Entanglement with patterns considered unsuitable;
Introduction of a hidden behavior which cannot be followed at a higher level;
Usage of NoSQL engines lacking most of the properties of the classic RDBMS.
My confusion is about implementing the same features (e.g. FK constraints) at a higher level, which IMHO follows the same patterns.
What do you mean exactly by "Was the mORMot uses RDBMS DRI peculiarities to support the persistence?"? (there should be a missing word)
Do you mean that mORMot is supposed to generate DRI SQL statements for each RDBMS it runs on?
Yes, it is.
Theoretically.
BUT it would have some drawbacks:
1. If we rely on the RDBMS triggers for DRI, some modifications (SET DEFAULT or CASCADE) won't be monitored by the ORM.
Some features (like change tracking audit trail, or our upcoming data replication) would not work any more.
2. And in mORMot, we may have tables in one database with DRI constraints referencing tables in other databases!
This is not possible to handle with regular DRI statements, which are tied to one DB instance (unless you install some complex transactional engines, which are a nightmare to manage, from my experience).
There are so many potential combinations and use cases that mixing ORM with SQL-level DRI was difficult to achieve.
So we decided, in the first implementation, to handle it purely at the ORM level.
We tried to keep it simple, and furthermore testable.
But, of course, we are open to any improvement!
We could eventually let the DB handle DRI constraints, to increase performance, if both tables are on the same DB.
Cross-table ORM for external DBs is somewhat limited for now: we rely on SQLite3 virtual tables, e.g. for JOINed queries, which work, but are not perfect either.
If you have any idea, feel free to propose your change.
You can optionally specify the associated table, using a custom TID type for the published property definition. In this case, you would sub-class TID, using tableNameID as naming convention.
For instance, if you define:
type
  TSQLRecordClientID = type TID;
  TSQLOrder = class(TSQLRecord)
  ...
  published
    property Client: TID read fClient write fClient;
    property OrderedBy: TSQLRecordClientID read fOrderedBy write fOrderedBy;
  ...
Those two published properties will each store an Int64 foreign key, and an index will be created for both columns in the database. But their type (TID or TSQLRecordClientID) will make a difference for the deletion process.
By using the generic TID type, the first Client property won't have any reference to any table, so an index will be created for this column, but no deletion tracking would take place.
On the other hand, following the type naming convention, the OrderedBy property will be associated with the TSQLRecordClient table of the data model. In fact, the ORM will retrieve the 'TSQLRecordClientID' type name, and search for the TSQLRecord class whose name, with 'ID' appended, matches it: TSQLRecordClient in this case.
As a result, the ORM will create an index for the column, and track any TSQLRecordClient deletion: it will ensure that this OrderedBy property is reset to 0 for any row pointing to the deleted record. As with TSQLRecord or TRecordReference properties, the ORM won't define an ON DELETE SET DEFAULT foreign key via SQL, but will implement it at the ORM level.
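For illustration, a minimal sketch of the naming-convention lookup described above; the helper name is hypothetical, not the framework's actual routine:

// Given the published property type name (e.g. 'TSQLRecordClientID'),
// return the candidate table class name (e.g. 'TSQLRecordClient'),
// or '' if the type does not follow the tableNameID naming convention
function TableClassNameFromIDType(const aTypeName: string): string;
begin
  if (Length(aTypeName) > 2) and
     (Copy(aTypeName, Length(aTypeName) - 1, 2) = 'ID') then
    Result := Copy(aTypeName, 1, Length(aTypeName) - 2)
  else
    Result := '';
end;

The ORM would then check that the returned name matches a TSQLRecord class registered in the data model (TSQLRecordClient here) before enabling the deletion tracking.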
How do you make the purge of the journal?
For the externally imported changes, it suffices to keep just the rows with max(donorkey) for each <tablename, donorsid> pair. For the local changes, purging must be done according to the most lagging external site, but again one row must be left for each <tablename, donorsid=localsid>. For now, the assumption is that no site will lag more than 30 days, since no information is recorded about the lag.
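For illustration, a minimal sketch of that purge for the imported part, written in the same style as the sync code below; the exact SQL is an assumption of mine, not the statement actually used (:lsid/:lsid2 stand for the local site id):

// keep, per <tablename, donorsid> pair, only the newest imported
// journal row (the one with max(donorkey)); delete the older ones
SQL.Add( 'delete j' );
SQL.Add( 'from journal j join (' );
SQL.Add( '  select tablename, donorsid, max(donorkey) as maxdonorkey' );
SQL.Add( '  from journal where donorsid <> :lsid' );
SQL.Add( '  group by tablename, donorsid ) k' );
SQL.Add( '  on j.tablename = k.tablename and j.donorsid = k.donorsid' );
SQL.Add( 'where j.donorsid <> :lsid2 and j.donorkey < k.maxdonorkey' );
Params[0].Value := localSiteID;
Params[1].Value := localSiteID;
ExecSQL;

Local rows (donorsid = localsid) are not touched by this statement; they would need the separate, lag-based purge described above.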
If I understand correctly, any issue due to potential clock de-synchronization is fixed by the order on which you apply the journal events?
It should be.
Since the site id is part of the record id, why do you maintain separate donorid/siteid columns? Only for performance reasons?
That was in case I decided to change the key-generation scheme (set @out_key = @key * 10000 + @sid); e.g. key 5242439 generated on site 17 yields 52424390017, as in the journal excerpt below. Besides, I was not quite sure how the arithmetic would impact the index processing.
I wanted everything to work in the first version, and as we all know, "Premature optimization is the root of all evil". So I stuck to the bare minimum and have never optimized it since.
BTW when I got rid of the MS subscriptions/publications, I got so much free room that I decided not even to normalize the journal (tablename). Actually there is a lot to be improved, e.g. a local row deletion makes obsolete all changes made to the same row before it (but that may impact DRI), etc. But do I really need it?
How do you make the purge of the journal?
If I understand correctly, any issue due to potential clock de-synchronization is fixed by the order on which you apply the journal events?
Since the site id is part of the record id, why do you maintain separate donorid/siteid columns? Only for performance reasons?
A small excerpt from the journal:
key siteid tablename recordkey op stamp donorkey donorsid
------------- ------- ---------- ------------- -- ----------------------- ------------ --------
52424390017 17 Subscript 52424380017 I 2014-01-27 10:32:33.187 52424390017 17
52424470017 17 Subscript 52424380017 U 2014-01-27 10:32:37.497 52424470017 17
52424490017 17 Subscript 52424480017 I 2014-01-27 10:32:43.420 52424490017 17
52424530017 17 Subscript 52424480017 U 2014-01-27 10:32:49.280 52424530017 17
Here key is the PK of the table, siteid is the identifier of the site where the CRUD operation occurred, tablename and recordkey identify the row where it occurred, stamp is the timestamp, donorkey is the key in the foreign journal table from which the change was imported, and donorsid is the identifier of the site from which it was imported.
(Each site has a unique small number as its site id; the 'donor' is the foreign site.)
On local updates donorkey equals key, donorsid equals siteid.
And the sync goes like this:
// check in our journal for last update from that site/table
SQL.Add( 'select coalesce(max(jlocal.donorkey),0)' );
SQL.Add( 'from journal jlocal ' );
SQL.Add( 'where donorsid = :dsid and tablename = ''' + tableName + '''' );
Params[0].Value := donorSiteID;
Open;
lastEntry := Fields[0].Value;
That is for retrieving the last update applied from the specified 'donor' for a given table.
And then:
SQL.Add( Format( 'select top %d', [recordsLimit]));
SQL.Add( ' j.key as "$$key", j.siteid as "$$siteid",');
SQL.Add( ' j.tablename as "$$table", j.recordkey as "$$rkey",');
SQL.Add( ' j.operation as "$$oper", j.stamp as "$$stamp",');
SQL.Add( ' j.donorkey as "$$donorkey", j.donorsid as "$$donorsid",');
// Here are the fields from the data row with the original field names
SQL.Add( ' l.*');
SQL.Add( 'from journal j left join "' + tableName + '" l on');
SQL.Add( ' j.recordkey = l.key');
SQL.Add( 'where');
SQL.Add( ' j.key > :lastentry and' ); // Newer journal entries
SQL.Add( ' j.donorsid <> :lsid and' ); // Not originating from here!
SQL.Add( ' j.tablename=''' + tableName + ''' and' ); // For requested table
SQL.Add( ' (j.operation = ''D'' or l.key is not null)');
// The keys are increasing so the order will be chronological
SQL.Add( 'order by j.key');
Params[0].Value := lastEntry;
Params[1].Value := localSiteID;
Open;
Now we have a dataset with the rows inserted, updated or deleted in the foreign table since the last sync. The 'D' operations are included so that deletions are processed in chronological order together with the inserts and updates (a deleted row no longer has a matching data row, which is why the "l.key is not null" test is relaxed for them).
Next:
while not EOF and (recordsLimit > 0) do
begin
  case Operation of
    'I': InsertRec;
    'U': UpdateRec;
    'D': DeleteRec;
    'R': CheckInRec; // Insert or update?
  else
    raise EDatabaseError.Create('Unknown operation!');
  end;
  ...
  // Copy the foreign journal entry just processed into the local journal
  InsertJournalEntry;
...
The InsertJournalEntry goes like this:
procedure InsertJournalEntry;
begin
  with qryCloning do
  begin
    SQL.Text := 'exec dbclone_insert_in_journal :key, :siteid, :table, :rkey, :oper, :stamp, :donorsid';
    Params[0].Value := qryDonor.FieldByName('$$key').Value; // This value will be written in the donorkey column!
    Params[1].Value := qryDonor.FieldByName('$$siteid').Value;
    Params[2].Value := qryDonor.FieldByName('$$table').Value;
    Params[3].Value := qryDonor.FieldByName('$$rkey').Value;
    Params[4].Value := qryDonor.FieldByName('$$oper').Value;
    Params[5].Value := qryDonor.FieldByName('$$stamp').Value;
    Params[6].Value := donorSiteID;
    ExecSQL;
  end;
end;
The only trick is the usage of the donorkey and donorsid values. Used that way, there is no need for a separate table keeping information about each remote site and the corresponding sync progress. It also keeps track of how the changes propagate among the databases.
Hope no one got bored!
Regards,
I believe that mORMot has its own internal handling of schema changes, so maybe the task of schema syncing is a lot simpler compared to a classic RDBMS. Mssr. Bouchez could express an opinion on that.
Best regards.
We already have a lot of components in the framework to implement it.
We implemented history tracking in a few code lines, some weeks ago - see http://synopse.info/files/html/Synopse% … ml#TITL_85
The REST transport layer is just perfect for the purpose; previously we were restricted to TDS, with no options;
The tracking can be implemented in Pascal, with no need for T-SQL/P-SQL/x-SQL triggers and hacks for detecting replication sessions or filling default values in OID columns;
Staging the change-sets in memory/cache can greatly reduce the row/table locks and thus improve performance;
It will be very easy to find the table dependencies via RTTI and compute the right master-detail table ordering instantly. Currently the order is hard-coded, computed by a tool from the DB metadata during the planning stage;
The OIDs can be computed without a DB round-trip for each row.
Regarding the 4).
2.1. before Delphi 2010: we specify it as a dedicated type, so that the table can be retrieved from the type name (TSQLRecordClientID -> TSQLRecordClient):
type
  TSQLRecordClientID = type TID;
...
published
  property OrderedBy: TSQLRecordClientID read fOrderedBy write fOrderedBy;
I think that should be the natural way.
Regarding the 5).
Consider my T-SQL OID generator:
CREATE procedure dbo.dbclone_genkey
@out_key bigint output,
@in_increment bigint = 1
as
begin
declare @sid bigint, @key bigint
set transaction isolation level repeatable read
set xact_abort on
begin transaction
select top 1 @sid = dbclone_id, @key = dbclone_genval
from dbclone_siteid order by dbclone_ord
update dbclone_siteid
set dbclone_genval = dbclone_genval + @in_increment
where dbclone_id = @sid
commit transaction
set @out_key = @key * 10000 + @sid;
end
Here we have @sid for the DB site identifier and @key for the next key to be returned, which must be updated and written back. This is our bottleneck: all CRUs must go through that procedure. The solution is to pre-allocate a range of IDs with the @in_increment parameter and have just one round-trip per block. That caching is not possible in pure T-SQL, but absolutely feasible at the mORMot OPF level.
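For illustration, here is a minimal sketch of such a block pre-allocation on the client side; the class is hypothetical, and the only assumption is that dbclone_genkey reserves @in_increment keys and returns the first one:

type
  // Hypothetical wrapper around a call to the dbclone_genkey stored
  // procedure: reserves aCount keys (@in_increment = aCount) and
  // returns the first generated key (@out_key)
  TGenKeyCall = function(aCount: Integer): Int64 of object;

  // Hypothetical client-side generator: one DB round-trip per block,
  // then IDs are handed out locally
  TBlockIDGenerator = class
  private
    fGenKey: TGenKeyCall;
    fBlockSize: Integer;
    fCurrentID: Int64;
    fRemaining: Integer;
  public
    constructor Create(const aGenKey: TGenKeyCall; aBlockSize: Integer);
    function NextID: Int64;
  end;

constructor TBlockIDGenerator.Create(const aGenKey: TGenKeyCall; aBlockSize: Integer);
begin
  fGenKey := aGenKey;
  fBlockSize := aBlockSize;
  fRemaining := 0; // force a block allocation on first use
end;

function TBlockIDGenerator.NextID: Int64;
begin
  if fRemaining = 0 then
  begin
    // single round-trip: reserve fBlockSize keys at once
    fCurrentID := fGenKey(fBlockSize);
    fRemaining := fBlockSize;
  end
  else
    // next key of the reserved block: key part + 1, same site id
    // in the low four digits (@out_key = @key * 10000 + @sid)
    inc(fCurrentID, 10000);
  Result := fCurrentID;
  dec(fRemaining);
end;

Such a generator could sit behind the customizable ID seam discussed earlier in the thread, so that a batch of inserts costs a single dbclone_genkey call.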
I'll continue to feed you with the details, at least until you say it is enough.
And WOW! You're lightning fast with the changes!