#1 2022-11-27 06:33:32

ertank
Member
Registered: 2016-03-16
Posts: 163

SAX parsing json

Hello,

I have a well-structured, relatively big (195 MiB) JSON file to parse. With the Delphi System.JSON unit, parsing it will likely crash a Win32 executable, because a single structure needs around 5 KB of memory on average. The file has more than 750,000 records and is slowly growing.

I found about 5 threads in the forum about "SAX", but nothing in the documentation. I believe (correct me if I am wrong) that mORMot, and very likely mORMot2, supports SAX-style JSON parsing.

I would appreciate an example of how to use SAX parsing, please.

BTW, I prefer mORMot2.

Thanks & Regards,
Ertan

Offline

#2 2022-11-27 10:39:23

igors233
Member
Registered: 2012-09-10
Posts: 234

Re: SAX parsing json

Just try to parse that file and check if you get expected results, you can use TDocVariantData.InitJsonFromFile for example.

Offline

#3 2022-11-27 16:40:07

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,231
Website

Re: SAX parsing json

Try with TDocVariantData with the interning options, to reduce memory consumption.

Offline

#4 2022-11-27 18:35:14

ertank
Member
Registered: 2016-03-16
Posts: 163

Re: SAX parsing json

I've never used TDocVariantData and I cannot find my way through it. Below is my current test code. Even though the file is a JSON array, the code thinks otherwise and exits at the Doc.IsArray check.

uses
  mormot.core.variants, mormot.core.data;

procedure TForm1.Button1Click(Sender: TObject);
var
  Doc: TDocVariantData;
  I, TempInteger: Integer;
  TempString: string;
begin
  Doc.InitJsonFromFile('users.json', [dvoInternNames, dvoInternValues]);
  if not Doc.IsArray then Exit();
  repeat
    TempString := Doc.GetValueOrEmpty('Identifier');
    TempInteger := string('0' + Doc.GetValueOrEmpty('AppType')).ToInteger();
  until (not Doc.Items.MoveNext);
end;

Can someone point me in the right direction, please?

Thanks & Regards,
Ertan

Offline

#5 2022-11-27 21:54:00

tbo
Member
Registered: 2015-04-20
Posts: 335

Re: SAX parsing json

ertank wrote:

Can someone point me in the right direction, please?

Are you sure that the JSON file is found (specify the full path)? Check the return value of the InitJsonFromFile function. If it is True, then look at the value of the Kind property. For the DocVariantOptions you can use JSON_[mFastExtendedIntern]/JSON_FAST_EXTENDEDINTERN. If none of this helps, show a little bit of the JSON.
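
The checks suggested here could be sketched roughly as follows. This is a minimal console example, not a definitive implementation: it assumes mORMot 2 is on the search path, that 'users.json' sits in the current directory, and it reuses the interning options from the test code earlier in the thread. The program name and the FullPath variable are only illustrative.

```pascal
program CheckJsonKind;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  mormot.core.variants;

var
  Doc: TDocVariantData;
  FullPath: TFileName; // illustrative name, not from the thread
begin
  // Resolve the relative name so it is obvious which file is really opened.
  FullPath := ExpandFileName('users.json');
  if not FileExists(FullPath) then
  begin
    WriteLn('File not found: ', FullPath);
    Exit;
  end;
  // InitJsonFromFile returns False when the content could not be parsed.
  if not Doc.InitJsonFromFile(FullPath, [dvoInternNames, dvoInternValues]) then
  begin
    WriteLn('JSON could not be parsed: ', FullPath);
    Exit;
  end;
  // Kind tells whether the top-level JSON value is an array or an object.
  case Doc.Kind of
    dvArray:
      WriteLn('Top-level array with ', Doc.Count, ' items');
    dvObject:
      WriteLn('Top-level object with ', Doc.Count, ' fields');
  else
    WriteLn('Undefined document kind');
  end;
end.
```

Separating "file not found" from "could not be parsed" narrows down where the problem is before any JSON content has to be posted.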

With best regards
Thomas

Offline

#6 2022-11-28 05:35:28

ertank
Member
Registered: 2016-03-16
Posts: 163

Re: SAX parsing json

Hello,

The problem was that the users.json file has a BOM and so cannot be read by TDocVariantData.InitJsonFromFile(). If I read it using TFile.ReadAllText('users.json', TEncoding.UTF8), it works fine. However, reading the file content into a string variable and then using it to initialize TDocVariantData feels like additional memory usage, especially for a considerably large file.

This file is received in ZIP format from a web service, and I cannot change the way it is saved. Is there any option for TDocVariantData.InitJsonFromFile() that could make it work with files that include a BOM?
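
To confirm that a BOM really is the cause, the first three bytes of the file can be inspected without loading the whole content. The sketch below uses only the standard RTL (no mORMot). HasUtf8Bom is a hypothetical helper name, and on recent Delphi versions the units may need to be written as System.SysUtils and System.Classes, depending on the project's unit scope settings.

```pascal
program BomCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Returns True when the file starts with the UTF-8 byte order mark EF BB BF.
// Raises an exception if the file cannot be opened.
function HasUtf8Bom(const FileName: string): Boolean;
var
  Stream: TFileStream;
  Head: array[0..2] of Byte;
begin
  Result := False;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Only the first three bytes are read, whatever the file size.
    if Stream.Read(Head, SizeOf(Head)) = SizeOf(Head) then
      Result := (Head[0] = $EF) and (Head[1] = $BB) and (Head[2] = $BF);
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount < 1 then
    WriteLn('Usage: BomCheck <file>')
  else if HasUtf8Bom(ParamStr(1)) then
    WriteLn('UTF-8 BOM found')
  else
    WriteLn('No UTF-8 BOM');
end.
```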

Thanks & Regards,
Ertan

Offline

#7 2022-11-28 06:27:05

igors233
Member
Registered: 2012-09-10
Posts: 234

Re: SAX parsing json

Try TDocVariantData.InitJSONInPlace(pointer(AnyTextFileToRawUtf8(FilePath)));

Offline

#8 2022-11-28 07:28:16

ertank
Member
Registered: 2016-03-16
Posts: 163

Re: SAX parsing json

igors233 wrote:

Try TDocVariantData.InitJSONInPlace(AnyTextFileToRawUtf8(FilePath));

I could not make the compiler happy. InitJSONInPlace requires a PUTF8Char and AnyTextFileToRawUtf8 returns a RawUTF8, so the compiler says:

[dcc32 Error] Unit1.pas(44): E2010 Incompatible types: 'PUtf8Char' and 'UTF8String'

I tried several other conversions and explicit pointer casts, but all of them led to a compiler error.

Offline

#9 2022-11-28 07:32:51

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,231
Website

Re: SAX parsing json

TDocVariantData.InitJSONInPlace(pointer(AnyTextFileToRawUtf8(FilePath)));

Offline

#10 2022-11-28 07:37:53

ertank
Member
Registered: 2016-03-16
Posts: 163

Re: SAX parsing json

My mistake again: I assumed InitJSONInPlace returns a Boolean, but it does not.

  if Doc.InitJSONInPlace(Pointer(AnyTextFileToRawUtf8('users.json')), [dvoInternNames, dvoInternValues]) = nil then
  begin
    ShowMessage('Json file could not be loaded');
    Exit();
  end;

Now, I will go back to parsing and measure memory usage on my test data.

Thanks.

Offline

#11 2022-11-28 10:48:31

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,231
Website

Re: SAX parsing json

Three remarks:

1. I have changed the BOM support in mORMot 2, and now you can use Doc.InitJsonFromFile() with a BOM file.
Even if BOM is not a good idea for UTF-8... :)
InitJsonInPlace() is indeed low-level.

2. I guess you can use dvoInternNames safely, but only dvoInternValues if most JSON values are likely to reappear in the input JSON.

3. If TDocVariant has troubles with the JSON, and if the structure is really simple, I would rather try to use a dynamic array of records for the storage.
It would be more efficient in terms of memory, and also performance. And using a dynamic array and records would be much easier to work with.

Offline

#12 2022-11-28 12:09:23

ertank
Member
Registered: 2016-03-16
Posts: 163

Re: SAX parsing json

ab wrote:

1. I have changed the BOM support in mORMot 2, and now you can use Doc.InitJsonFromFile() with a BOM file.
Even if BOM is not a good idea for UTF-8... :)
InitJsonInPlace() is indeed low-level.

I really appreciate that. Thank you. I also think a BOM is not a good idea, but now and then it is simply out of your control.

ab wrote:

2. I guess you can use dvoInternNames safely, but only dvoInternValues if most JSON values are likely to reappear in the input JSON.

Numeric values do repeat a lot: several different fields only ever hold 1 or 2.

ab wrote:

3. If TDocVariant has troubles with the JSON, and if the structure is really simple, I would rather try to use a dynamic array of records for the storage.
It would be more efficient in terms of memory, and also performance. And using a dynamic array and records would be much easier to work with.

Each item goes straight into a database as soon as it is read. I will not keep the data in memory for any other purpose. So far TDocVariant seems to work just fine.

Offline

#13 2022-11-28 13:18:55

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,231
Website

Re: SAX parsing json

TDocVariant loads everything into memory, before you can iterate into its values.
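
A load-then-iterate pass could look roughly like the sketch below. It is a minimal example under stated assumptions: mORMot 2 is available, the top-level JSON value is an array of objects, and the field names Identifier and AppType are taken from the test code earlier in the thread. Whether AppType is stored as a number or as text is not known here, so it is converted through VarToStr and StrToIntDef instead of being read as an integer directly.

```pascal
program IterateUsers;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Variants,
  mormot.core.variants;

var
  Doc: TDocVariantData;
  Item: PDocVariantData;
  Index, AppType: Integer;
  Identifier: string;
begin
  // The whole file is parsed into memory before the loop can start.
  if not Doc.InitJsonFromFile('users.json', [dvoInternNames, dvoInternValues]) then
  begin
    WriteLn('JSON file could not be loaded');
    Exit;
  end;
  if not Doc.IsArray then
  begin
    WriteLn('Top-level JSON value is not an array');
    Exit;
  end;
  for Index := 0 to Doc.Count - 1 do
  begin
    // Each array item is itself a TDocVariant object; _Safe() gives
    // direct access to it without copying the variant.
    Item := _Safe(Doc.Values[Index]);
    Identifier := VarToStr(Item^.GetValueOrEmpty('Identifier'));
    // Missing or non-numeric values fall back to 0.
    AppType := StrToIntDef(VarToStr(Item^.GetValueOrEmpty('AppType')), 0);
    WriteLn(Identifier, ' ', AppType);
  end;
end.
```

The WriteLn call stands in for the database insert. Peak memory is still set by the fully parsed document, not by what the loop does with each item.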

Offline

#14 2022-11-28 13:28:45

ertank
Member
Registered: 2016-03-16
Posts: 163

Re: SAX parsing json

ab wrote:

TDocVariant loads everything into memory, before you can iterate into its values.

Yes, I am going to make measurements. Getting ready for it.
I am counting on the dvoInternNames and dvoInternValues options. I will see whether they are enough to keep memory usage low, since the names repeat identically and many values repeat a lot.

Last edited by ertank (2022-11-28 13:30:21)

Offline

#15 2022-11-28 17:35:26

ertank
Member
Registered: 2016-03-16
Posts: 163

Re: SAX parsing json

After calculations, I get an average memory usage of 1.92 KB per item. It was around 5 KB before.
That is enough to continue with a Win32 application.
Since the file grows slowly, I will figure out an alternative solution in the future.
Thanks.
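
As a rough check of the figures quoted in this thread, the totals can be worked out from the per-item averages. The sketch below assumes exactly 750,000 records (the thread says "more than"), so the results are lower bounds. They cover only the parsed items, not the file buffer or the rest of the application.

```pascal
program MemEstimate;

{$APPTYPE CONSOLE}

const
  RecordCount = 750000;  // lower bound taken from the first post
  BytesPerKB = 1024;
  GiB = 1024.0 * 1024.0 * 1024.0;

// Total memory in GiB for RecordCount items of the given average size.
function EstimateGiB(KBPerItem: Double): Double;
begin
  Result := RecordCount * KBPerItem * BytesPerKB / GiB;
end;

begin
  WriteLn('At about 5 KB per item:    ', EstimateGiB(5.0):0:2, ' GiB');  // 3.58 GiB
  WriteLn('At about 1.92 KB per item: ', EstimateGiB(1.92):0:2, ' GiB'); // 1.37 GiB
end.
```

The first figure is well above the 2 GiB of user address space a default Win32 process gets, while the second fits below it. That matches the conclusion above, with limited headroom as the file grows.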

Offline
