#1 2011-02-23 20:42:01

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,552

Fast JSON (un)serialization

The core of our SQLite3 framework uses JSON for all its Client/Server communication, and also for its internal cache of previous SQL requests.

There is indeed a damn fast JSON generator and parser in the framework.

I've written some functions able to serialize and unserialize not only TSQLRecord, but any TPersistent class instance.

It will store any string property as UTF-8 (whatever the string type was in the class definition), and will store TDateTime as ISO 8601 standard text.
It will also handle TCollection and TStrings properties, writing them as JSON arrays of JSON objects and of JSON strings, respectively.

For instance, the following code creates some objects in memory and serializes them to JSON:

Coll := TCollTst.Create;
Coll.One.Name := 'test"\2';
Coll.One.Color := 1;
Coll.Coll.Add.Color := 10;
Coll.Coll.Add.Name := 'name';
JSONContent := ObjectToJSON(Coll);

The resulting JSON content is UTF-8 encoded, and can be formatted in a human-readable layout:

{
    "One": 
    {
        "Color": 1,
        "Length": 0,
        "Name": "test\"\\2"
    },
    "Coll": 
    [
        {
            "Color": 10,
            "Length": 0,
            "Name": ""
        },
        {
            "Color": 0,
            "Length": 0,
            "Name": "name"
        }
    ]
}

The default layout is more compact, and stores exactly the same data:

{"One":{"Color":1,"Length":0,"Name":"test\"\\2"},"Coll":[{"Color":10,"Length":0,"Name":""},{"Color":0,"Length":0,"Name":"name"}]}
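As a language-neutral cross-check of the two layouts, here is a minimal sketch in Python (not the framework's Delphi code). The dictionary is a hand-written stand-in for the Coll instance, which is an assumption: the real serializer reads published properties through RTTI. It only shows that the compact and indented forms carry the same data, and that the quote and backslash in the name are escaped as in the output above.

```python
import json

# Hand-written stand-in for the Coll instance built above (assumption:
# the real data comes from published properties read through RTTI).
coll = {
    "One": {"Color": 1, "Length": 0, "Name": 'test"\\2'},
    "Coll": [
        {"Color": 10, "Length": 0, "Name": ""},
        {"Color": 0, "Length": 0, "Name": "name"},
    ],
}

# Compact layout: no whitespace between tokens.
compact = json.dumps(coll, separators=(",", ":"))

# Indented layout: same data, only the whitespace differs.
readable = json.dumps(coll, indent=4)

print(compact)
```

Python's indented layout places braces slightly differently from the one shown above, but both parse back to the same structure.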

Of course, you can reverse the conversion, and populate your Delphi class instance from JSON encoded data with just one line of code:

SyntaxErrorPointer := JSONToObject(TObject(Coll),@JSONContent[1],ErrorFlag);

In case of parsing error, ErrorFlag will be true, and SyntaxErrorPointer will point to the faulty part of the JSON content.
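The same "error flag plus pointer to the faulty part" contract can be illustrated outside Delphi. This is a minimal sketch using Python's standard json module; json_to_object is a hypothetical helper named for this example, not the framework's JSONToObject, and it returns the unparsed tail of the text instead of a raw pointer.

```python
import json


def json_to_object(text):
    """Parse text and return (value, failed, faulty_part).

    faulty_part is the remainder of the input, starting at the position
    where the parser gave up; it is empty when parsing succeeded.
    """
    try:
        return json.loads(text), False, ""
    except json.JSONDecodeError as err:
        # err.pos is the character offset of the error in the input.
        return None, True, text[err.pos:]


# The value of "Name" is missing, so parsing fails at the closing brace.
value, failed, faulty = json_to_object('{"Color": 1, "Name": }')
print(failed, repr(faulty))
```

Returning the position of the error, rather than only a boolean, lets the caller report exactly where the content went wrong.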

The JSON processing is very fast. The code is deeply optimized, and has no limitation other than available memory.
In fact, the JSON parser unserializes the JSON in-place. Avoiding the allocation of small memory blocks for each item makes it much faster than other mechanisms.
Even property access via RTTI uses some optimized asm functions of the framework, much faster than the default Delphi RTTI.

See http://synopse.info/fossil/info/caf561aa07 and http://synopse.info/fossil/info/29c9fbbfeb

#2 2011-02-24 10:08:29

edwinsn
Member
Registered: 2010-07-02
Posts: 1,217

Re: Fast JSON (un)serialization

This is just great! So easy to use!


Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdelux on Windows with cross-compile for Linux 64bit.

#3 2011-02-25 10:39:08

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,552

Re: Fast JSON (un)serialization

I've added two small (but indeed mandatory) enhancements:
  *  manual handling of TSQLRecord.ID property serialization
  *  serialization of parent properties in TJSONSerializer.WriteObject() and ObjectToJSON()

See http://synopse.info/fossil/info/56bccf9905

#4 2011-03-06 21:33:40

Julian
Member
Registered: 2011-03-06
Posts: 3

Re: Fast JSON (un)serialization

This looks great!

As JSON is becoming more and more popular (and used) for so many things nowadays, would it be possible to move the JSON parsing/serialization code to a dedicated unit?

For example, I'm using JSON to store a large number of objects into a SynBigTable. Some kind of key/value database, storing JSON documents. All works great, but I had to use an external JSON library.

And while we're on the topic, why not evolve SynBigTable into some kind of document DB, similar to RavenDB (http://ravendb.net/)?

Thank you for the great bits of code,
Julian

#5 2011-03-07 07:03:00

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,552

Re: Fast JSON (un)serialization

Hi Julian,

AFAIK, documents are independent entities, unrelated to one another.
In big table, we first introduced key/value pairs, then metadata/record with fields and indexing.
A document DB could be implemented easily with the key/value storage.

Making a document DB will need a "generic" JSON parser and formatter.
Our JSONToObject and ObjectToJSON could do the job.

1. Making this serialization stand alone

JSONToObject and ObjectToJSON are using our low-level RTTI classes.
This RTTI is linked to our SQLite3 framework... so it could be difficult to make it stand-alone.
BUT if you don't use the ORM part of the unit, your exe won't include it, thanks to Delphi smart-linking.
So IMHO there is no mandatory requirement of making it stand alone.

2. How to implement a document DB with our good old big table?

A document DB has to implement indexes, in order to be efficient.
All the index handling is already available in TSynTable.
In short: use ObjectToJSON/JSONToObject to serialize the classes, then create some virtual "fields" using TSynTable to create indexes.
My only concern is that TSynTable needs to have direct and quick access to the indexed data, and a JSON document, whose fields may be stored anywhere in the JSON content, is not a good candidate for direct/quick data access. I suspect we would need to store the indexed data in a separate array.

So perhaps a document DB (TSynBigTableDocument) could be implemented as such:
- extend the TSynBigTableMetaData to store the documents in plain JSON: in this case, you can add or retrieve plain Delphi objects via the ObjectToJSON and JSONToObject functions;
- you can add indexes to the document DB by creating metadata containing the indexed data (e.g. if you want to index a "name" field, create a "name" metadata field, then populate it with the content of the "name" JSON field of every document).
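To make the idea concrete, here is a minimal in-memory sketch in Python, not Delphi and not the TSynBigTable API. DocumentStore and its methods are hypothetical names invented for this illustration. Documents are kept as JSON text in a key/value map, and each indexed field gets its own separate structure, filled at insert time, so a search never has to parse the JSON.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Key/value store of JSON documents with per-field indexes."""

    def __init__(self, indexed_fields):
        self._docs = {}      # id -> JSON text (the key/value storage)
        self._next_id = 1
        # One index per field: field -> value -> set of ids. This plays
        # the role of the separate metadata holding the indexed data.
        self._indexes = {f: defaultdict(set) for f in indexed_fields}

    def add(self, obj):
        """Store obj as JSON and return its new id."""
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = json.dumps(obj, separators=(",", ":"))
        # Copy the indexed values out of the document once, at insert time.
        for field, index in self._indexes.items():
            if field in obj:
                index[obj[field]].add(doc_id)
        return doc_id

    def get(self, doc_id):
        """Return the document as a parsed object."""
        return json.loads(self._docs[doc_id])

    def find(self, field, value):
        """Return the sorted ids of documents whose field equals value."""
        return sorted(self._indexes[field].get(value, ()))


store = DocumentStore(["name"])
first = store.add({"name": "alice", "color": 1})
second = store.add({"name": "bob", "color": 10})
third = store.add({"name": "alice", "color": 0})
print(store.find("name", "alice"))
```

The sketch assumes indexed values are hashable (strings, numbers); a real engine would also have to update the index on document change or deletion.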

3. Document storage

JSON is perhaps not the most efficient format for storage.
BSON could be used. Or perhaps our faster and more compact SBF format could be enhanced to handle field content. In order to be efficient, it could follow the Google Protocol Buffers way of storage: i.e. it should contain all the field names in the metadata, and refer from the document to the field names with a simple index.
But at first, JSON should be enough. Our TSynBigTable engine won't suffer from the amount of data stored.
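The "field names in the metadata, simple index in the document" layout can be sketched as follows. This is an illustration in Python only: it is neither the SBF format nor the Protocol Buffers wire format, FieldNameTable, encode and decode are hypothetical names, and only flat documents are handled.

```python
import json


class FieldNameTable:
    """Shared metadata: every field name is stored once."""

    def __init__(self):
        self.names = []   # index -> field name
        self._ids = {}    # field name -> index

    def index_of(self, name):
        """Return the index of name, registering it on first use."""
        if name not in self._ids:
            self._ids[name] = len(self.names)
            self.names.append(name)
        return self._ids[name]


def encode(doc, table):
    """Replace each field name by its index: [[index, value], ...]."""
    pairs = [[table.index_of(key), value] for key, value in doc.items()]
    return json.dumps(pairs, separators=(",", ":"))


def decode(text, table):
    """Rebuild the document by looking the names up in the table."""
    return {table.names[index]: value for index, value in json.loads(text)}


table = FieldNameTable()
stored = encode({"Color": 10, "Length": 0, "Name": ""}, table)
print(stored)
```

The saving grows with the number of documents, since each name is written once in the table instead of once per document.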

4. Additional features

The TSynBigTableDocument should include a simple mechanism of map/reduce.

This map/reduce mechanism is already handled in our TSynBigTable engine.
The TSynBigTableIterateEvent prototype is just that... we use it for fast content regeneration during a Pack, or when the TSynBigTableRecord field layout changes.
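An iteration callback is indeed enough to express a simple map/reduce. The following Python sketch shows the pattern only; the callback shape (an id, the JSON text, and a boolean result to continue or stop) is an assumption made for this example and is not the actual TSynBigTableIterateEvent signature.

```python
import json


def iterate(docs, callback):
    """Call callback(doc_id, json_text) for each document, in id order.

    Iteration stops early if the callback returns False.
    """
    for doc_id in sorted(docs):
        if callback(doc_id, docs[doc_id]) is False:
            break


def count_by(docs, field):
    """Map each document to the value of field, reduce by counting."""
    counts = {}

    def on_item(doc_id, text):
        key = json.loads(text).get(field)       # map step
        counts[key] = counts.get(key, 0) + 1    # reduce step
        return True

    iterate(docs, on_item)
    return counts


docs = {1: '{"Color":10}', 2: '{"Color":0}', 3: '{"Color":10}'}
print(count_by(docs, "Color"))
```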

What do you think about it?

#6 2011-03-11 17:40:18

Julian
Member
Registered: 2011-03-06
Posts: 3

Re: Fast JSON (un)serialization

Yeah, moving the JSON serializing in a stand alone unit is not a big deal - thanks to the smart-linking.

I'm planning to play with these ideas over the weekend, but first I'm planning to play a bit with the ObjectToJSON and JSONToObject functions.
Right now, ObjectToJSON can serialize TStrings, but it ignores the attached objects (if any). I was thinking of using the AddObject() of TStrings to mimic a dictionary or something like this.

#7 2011-03-11 17:43:53

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,552

Re: Fast JSON (un)serialization

I'm about to enhance our ORM with support of records and dynamic arrays in TSQLRecord published properties.

A dynamic array of a simple type (string, integer...) would be converted into a valid JSON array.

A dynamic array of records could be converted into Base64-encoded binary data in the JSON field.

All those dynamic arrays would be stored as small binary BLOBs in SQLite3.

Thanks to the new TDynArray wrapper, added to SynCommons.
See http://synopse.info/forum/viewtopic.php?id=254
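The two JSON representations mentioned above can be illustrated with a short Python sketch. It is not the TDynArray binary format: the record layout (a little-endian 32-bit integer followed by a 64-bit float) and the helper names records_to_json and json_to_records are arbitrary choices made for this example.

```python
import base64
import json
import struct

# Array of a simple type: maps directly onto a JSON array.
ints = [1, 2, 3]
simple_json = json.dumps(ints, separators=(",", ":"))

# Array of records: pack each record into fixed-size binary, then Base64
# the whole buffer so that it fits into a JSON string. The layout "<id"
# (little-endian int32 + float64, no padding) is an assumption.
RECORD = struct.Struct("<id")


def records_to_json(records):
    """Encode a list of (int, float) records as a Base64 JSON string."""
    raw = b"".join(RECORD.pack(color, weight) for color, weight in records)
    return json.dumps(base64.b64encode(raw).decode("ascii"))


def json_to_records(text):
    """Decode the Base64 JSON string back into a list of records."""
    raw = base64.b64decode(json.loads(text))
    return [RECORD.unpack_from(raw, offset)
            for offset in range(0, len(raw), RECORD.size)]


records = [(10, 1.5), (0, 2.0)]
encoded = records_to_json(records)
print(simple_json, encoded)
```

The Base64 form is opaque to any JSON client, which is the price paid for keeping the records in their compact binary layout.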

What I like in dynamic arrays is that they don't need a Create/Destroy to handle memory.

So stay tuned!


The only problem is that a dynamic array of TSQLRecord would only be a dynamic array of pointers to TSQLRecord instances.
To handle an array of TSQLRecord, and have it converted into a JSON array, we would need to create some collections. But in this case, we would have to override the Create constructor and the Destroy destructor to manage the TCollection instance...
Not so easy.
In all cases, an array of TSQLRecord should be mapped as a One To Many relation, i.e. the detail table should have a TSQLRecord field pointing to the master table row.
This is the right way of doing it...

#8 2011-04-26 18:34:24

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,552

Re: Fast JSON (un)serialization

That's it: dynamic arrays, TPersistent, TCollection, TStrings and TRawUTF8List are now handled natively by the framework as TSQLRecord published properties:
- dynamic arrays (stored as BLOB using our TDynArray efficient format);
- TPersistent published properties stored as TEXT, in JSON object format;
- TCollection published properties stored as TEXT, in JSON array of objects format;
- TRawUTF8List and TStrings published properties stored as TEXT, in JSON array of UTF-8 string format.

See http://synopse.info/forum/viewtopic.php?id=308

It internally uses the JSON serialization discussed above.
