#1 2013-04-08 14:33:58

h.hasenack
Member
From: Nijmegen, Netherlands
Registered: 2012-08-01
Posts: 173
Website

Data sharding doesn't work for us, what else can we imagine...?

Hi Arnaud (and others)

We are creating a live simulation server, which has to load a myriad of objects (5M+) into a live model before the simulation can start.

Currently we have implemented this using a data sharding strategy, where the properties of our simulation objects are stored together in a blob field of the simulation object record.
The thing is, loading requires time-consuming decoding of the blob into the attributes, and saving takes a lot of time as well.

The properties of these simulation objects are fully user definable, so we are not able to 'hardcode' them into our application; instead we have an "attribute definition" table that defines each attribute.

What ideas can you come up with that would allow swift loading of objects?
* A "live model" may consist of over 5M simulation objects.
* Simulation objects may have anywhere between 1 and around 200 properties.
* Properties have a name, a type and a value, of course.
* Types are often simple types (like boolean, integer, float, string) but may also be complex types (objects, sets, etcetera).

We have some solutions in mind, but I'm not posting them yet in order to keep the question really "open".

Your ideas are very welcome.

Hans


#2 2013-04-08 17:32:00

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Data sharding doesn't work for us, what else can we imagine...?

Does the property list of each object change from one object to the other?
That is, if a TSQLRecord were able to have runtime-defined fields, with a field layout fixed for all records, would that do the trick for you?

I ask because this is a model I've already implemented with success for our military intelligence software - http://roc-armees.info (sorry, it is in French, but you can also take a look at http://bouchez.info/roc.html).
And there is a related item in the official mORMot roadmap - see this element:

Add customer-defined fields in mORMot ORM, via a new field type, stored as RawByteString (BLOB at the DB level), and using TSynTable / TSynTableData / TSynTableFieldProperties existing classes of SynCommons.pas; the UI units shall be able to handle edition of such fields just like regular TSQLRecord published properties, and a new unit shall provide customization of the record layout; this will allow end-users to customize the database layout according to their needs, whereas some default properties will be handled by the shared business logic (at least the ID field) - it would re-use the new TSQLPropInfo* classes for a runtime definition of the data, directly by the end customer.

Some SQLite3 virtual tables could be defined to directly supply any number of tables, from a parent TSQLRecord class.
That is, you may be able to publish some of the fields as true SQL columns.

As stated in the roadmap, the existing TSynTable / TSynTableData / TSynTableFieldProperties classes of SynCommons.pas already have most of the internal binary "plumbing", and are pretty fast.
Which other kinds of fields do you need?
Some blob fields are already included.
I also want to add geographical fields (using the SQLite3 RTree virtual table for indexed storage, if necessary).

I suspect performance is a need here.
Some other users of mORMot have asked for high-performance support of objects.
We may be able to bypass/optimize the JSON marshaling for remote access of such objects.
But I suppose the main point is about server-side processing (even more so if you consider some kind of DDD architecture).


#3 2013-04-09 07:20:37

h.hasenack
Member
From: Nijmegen, Netherlands
Registered: 2012-08-01
Posts: 173
Website

Re: Data sharding doesn't work for us, what else can we imagine...?

Does the property list of each object change from one object to the other?

While many objects share the same set of attributes, yes, the attribute list may vary from object to object. Some objects are power lines, some are e.g. cars, or something completely different like a loan or a maintenance scenario. Moreover, we want to have "data inheritance", which allows one object to pre-define the properties of another object. This is kind of like a template: changing the template would (virtually) change all (data-)derived objects in the live model.

That is, if a TSQLRecord were able to have runtime-defined fields, with a field layout fixed for all records, would that do the trick for you?

I guess not, unfortunately.

That is, you may be able to publish some of the fields as true SQL columns.

That is kind of what I was thinking about. I was thinking of splitting my objects table (which has the values blob) into an objects base table plus separate values tables for each 'value kind' or 'named attribute', with fixed fields for the value.

For this model I currently see:
Advantages:
- easier selection (indexes)
- faster loading (no blob decoding)
- less storage (property name not stored with each value)
Disadvantages:
- rather complex data model
- requires an extra 'attributes' definition table
- no easy support for "one time" named values on a single value instance of a single object


#4 2013-04-09 11:21:08

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Data sharding doesn't work for us, what else can we imagine...?

With the current implementation, you can override the following class method:

class procedure TSQLRecord.InternalRegisterCustomProperties(Props: TSQLRecordProperties);

It will allow you to add any column to the class at run time.

In fact, this is a virtual method to be overridden to register some custom properties:
- It does nothing by default, but allows inherited classes to define some properties, by adding some TSQLPropInfo instances to the Props.Fields list.
- It can also be used to specify a custom text collation, by calling the Props.SetCustomCollationForAllRawUTF8() or SetCustomCollation() methods.

It will update the database schema to add any new column, which won't be virtual, but "real", in the SQLite3 context.

But this method will work only if the field layout does not change from one data row to the other.
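
For instance, a minimal sketch (the class name and the 'NOCASE' collation are only an example, and the exact way of creating extra TSQLPropInfo instances may depend on the framework revision, so only a placeholder comment is shown for that part):

uses
  SynCommons, mORMot;

type
  TSQLMySimObject = class(TSQLRecord)
  protected
    fName: RawUTF8;
    class procedure InternalRegisterCustomProperties(
      Props: TSQLRecordProperties); override;
  published
    property Name: RawUTF8 read fName write fName;
  end;

class procedure TSQLMySimObject.InternalRegisterCustomProperties(
  Props: TSQLRecordProperties);
begin
  // e.g. switch the text collation of all RawUTF8 columns of this table
  Props.SetCustomCollationForAllRawUTF8('NOCASE');
  // extra TSQLPropInfo instances could be appended to Props.Fields here,
  // to define additional "real" columns at run time
end;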


#5 2013-04-09 13:03:49

h.hasenack
Member
From: Nijmegen, Netherlands
Registered: 2012-08-01
Posts: 173
Website

Re: Data sharding doesn't work for us, what else can we imagine...?

But this method will work only if the field layout does not change from one data row to the other.

Unfortunately, a changing field layout is exactly what we need. So I'll have to store the values separately...


#6 2013-04-09 17:19:35

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Data sharding doesn't work for us, what else can we imagine...?

You can use the binary encoding used by TSynTable for your purpose. Our SBF format can be very efficient.

Or why not use JSON here also, since it can define both property names and values?
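
For instance, only a sketch: the JSONEncode() name/value helper of SynCommons.pas could marshal a variable property list into one TEXT column of the simulation object record (property names and values below are invented):

uses
  SynCommons;

procedure EncodeProperties;
var
  json: RawUTF8;
begin
  // marshal a variable property list into a single JSON object
  json := JSONEncode(['voltage', 10.5, 'phases', 3, 'label', 'powerline #1']);
  // json now contains something like
  // {"voltage":10.5,"phases":3,"label":"powerline #1"}
end;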

Take a look at this feature request: http://synopse.info/fossil/tktview?name=cacaaf8f33

We may be able to make something generic at TSQLRecord level, if needed.
Perhaps as a dedicated (sponsored) development?

What do you think?


#7 2013-04-10 06:42:03

h.hasenack
Member
From: Nijmegen, Netherlands
Registered: 2012-08-01
Posts: 173
Website

Re: Data sharding doesn't work for us, what else can we imagine...?

It's nice, but it still won't allow indexes (DB side) to be created for the properties, as they are stored encoded in a blob field.

We currently load/import data from our old flagship app, which has separate tables for the user-definable property values. That loading is about 20 times faster than loading from our 'native' data model with the values stored in a blob (much like the feature request above).
That's why I think we need a separate table for the values; whether it contains more than one value per row (in different columns), like in our old app, or one value per row is still to be decided.


#8 2013-04-10 07:20:16

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Data sharding doesn't work for us, what else can we imagine...?

We can imagine an "evolving" table layout, creating true DB fields on request.
But I still do not understand how you could have "separate tables for the user definable property values" in an RDBMS layout with an index, and still be able to change the defined properties at each row level.

If you want a many-to-many relationship with a pivot Key/Value table, you can already implement this, and define some custom methods and pseudo-properties at the TSQLRecord level.
With this design, you are able to benefit from the RDBMS indexes on the pivot table.
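
For instance, a rough sketch (class and field names are only an illustration, not taken from your model): one pivot row per object/name/value triple, with the parent object published as a TSQLRecord-typed property, so that it is stored as its integer ID and can be indexed:

type
  TSQLSimObject = class(TSQLRecord)
  protected
    fName: RawUTF8;
  published
    property Name: RawUTF8 read fName write fName;
  end;

  TSQLSimValue = class(TSQLRecord)
  protected
    fSimObject: TSQLSimObject;
    fPropName: RawUTF8;
    fValue: RawUTF8;
  published
    // stored as the parent TSQLSimObject ID (integer) in the DB
    property SimObject: TSQLSimObject read fSimObject write fSimObject;
    property PropName: RawUTF8 read fPropName write fPropName;
    property Value: RawUTF8 read fValue write fValue;
  end;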


#9 2013-04-10 14:32:52

h.hasenack
Member
From: Nijmegen, Netherlands
Registered: 2012-08-01
Posts: 173
Website

Re: Data sharding doesn't work for us, what else can we imagine...?

If you want a many-to-many relationship with a pivot Key/Value table, you can already implement this, and define some custom methods and pseudo-properties at the TSQLRecord level.
With this design, you are able to benefit from the RDBMS indexes on the pivot table.

Yes, I believe this is the direction in which we will have to move. One-to-many in this case means one simulation object (= 1 DB "simobject" record) has several properties (= n DB "simvalue" records). These simvalue records could then be distributed across several tables, depending on the data type. I think this will speed up loading tremendously, and it also allows finding the simobject ID rather quickly, since for most simvalues an index can be created on the value field.
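
Something like the following is what I have in mind - just a sketch with invented class names, re-using a TSQLSimObject parent class as in your pivot example, and assuming I read the TSQLRestServer.CreateSQLIndex() / CreateSQLMultiIndex() methods correctly:

type
  // one value table per simple data type, e.g. for integer values
  TSQLSimValueInteger = class(TSQLRecord)
  protected
    fSimObject: TSQLSimObject;
    fAttributeID: Integer;  // points into our attribute definition table
    fValue: Int64;
  published
    property SimObject: TSQLSimObject read fSimObject write fSimObject;
    property AttributeID: Integer read fAttributeID write fAttributeID;
    property Value: Int64 read fValue write fValue;
  end;

procedure CreateValueIndexes(Server: TSQLRestServer);
begin
  // non-unique index on the value column, to find simobject IDs by value
  Server.CreateSQLIndex(TSQLSimValueInteger, 'Value', false);
  // composite index to load all values of one simulation object quickly
  Server.CreateSQLMultiIndex(TSQLSimValueInteger, ['SimObject', 'AttributeID'], false);
end;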

