Hello,
I have a well-defined, structured, relatively big (195 MiB) JSON file to parse. With the Delphi System.JSON unit it will likely crash in a Win32 executable, because a single structure needs around 5 KB of memory on average. The file has more than 750,000 records, and the count is slowly increasing.
I found about five threads in the forum about "SAX", but nothing in the documentation. I believe (correct me if I am wrong) that mORMot, and very likely mORMot2, has SAX-style JSON parsing.
I would appreciate an example of how to use SAX parsing, please.
BTW, I prefer mORMot2.
Thanks & Regards,
Ertan
Offline
Just try to parse that file and check whether you get the expected results. You can use TDocVariantData.InitJsonFromFile, for example.
Offline
I've never used TDocVariantData and I cannot find my way through it. Below is my current test code. Even though the file is a JSON array, the code thinks otherwise and exits at the Doc.IsArray check.
uses
  mormot.core.variants, mormot.core.data;

procedure TForm1.Button1Click(Sender: TObject);
var
  Doc: TDocVariantData;
  I, TempInteger: Integer;
  TempString: string;
begin
  Doc.InitJsonFromFile('users.json', [dvoInternNames, dvoInternValues]);
  if not Doc.IsArray then Exit();
  repeat
    TempString := Doc.GetValueOrEmpty('Identifier');
    TempInteger := string('0' + Doc.GetValueOrEmpty('AppType')).ToInteger();
  until (not Doc.Items.MoveNext);
end;
Can someone point me in the right direction, please?
Thanks & Regards,
Ertan
Offline
Can someone point me in the right direction, please?
Are you sure that the JSON file is found (specify full path)? Check the return value of the function InitJsonFromFile. If True, then look at the value of the property Kind. For the DocVariantOptions you can take JSON_[mFastExtendedIntern]/JSON_FAST_EXTENDEDINTERN. If all this doesn't help, show a little bit of the JSON.
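The checks suggested above could be sketched roughly as follows. This is only an illustration, not tested against the real file: DescribeJsonFile is a hypothetical helper name, and JSON_FAST is used merely as a neutral set of options.

```pascal
program CheckJsonLoad;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  mormot.core.base,
  mormot.core.variants;

// Hypothetical helper: reports why loading failed instead of exiting silently.
function DescribeJsonFile(const FileName: TFileName): string;
var
  Doc: TDocVariantData;
begin
  if not FileExists(FileName) then
    Exit('file not found: ' + FileName);
  // InitJsonFromFile returns False when the content could not be parsed
  if not Doc.InitJsonFromFile(FileName, JSON_FAST) then
    Exit('file found, but JSON parsing failed');
  // Kind tells whether the root element is an array or an object
  case Doc.Kind of
    dvArray:
      Result := Format('array with %d items', [Doc.Count]);
    dvObject:
      Result := Format('object with %d fields', [Doc.Count]);
  else
    Result := 'undefined document kind';
  end;
end;

begin
  // ExpandFileName turns the relative name into a full path
  Writeln(DescribeJsonFile(ExpandFileName('users.json')));
end.
```

If the result is "JSON parsing failed" even though the file looks valid in an editor, the first bytes of the file are worth inspecting.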
With best regards
Thomas
Offline
Hello,
The problem was that the users.json file has a BOM, so it cannot be read by TDocVariantData.InitJsonFromFile(). If I read it using TFile.ReadAllText('users.json', TEncoding.UTF8), it is just fine. However, reading the file content into a text variable and then using that to initialize TDocVariantData feels like additional memory usage, especially on a considerably large file.
This file is received in ZIP format from a web service and I cannot change the way it is saved. Is there any option for TDocVariantData.InitJsonFromFile() that could possibly make it work with files that include a BOM?
Thanks & Regards,
Ertan
Offline
Try TDocVariantData.InitJSONInPlace(pointer(AnyTextFileToRawUtf8(FilePath)));
Offline
Try TDocVariantData.InitJSONInPlace(AnyTextFileToRawUtf8(FilePath));
I could not make the compiler happy. InitJSONInPlace requires a PUTF8Char and AnyTextFileToRawUtf8 returns a RawUTF8. The compiler says:
[dcc32 Error] Unit1.pas(44): E2010 Incompatible types: 'PUtf8Char' and 'UTF8String'
I tried several other conversions, including explicit pointer casts, but all of them led to a compiler error.
Offline
My mistake again, I assumed InitJSONInPlace returns Boolean, but it does not.
if Doc.InitJSONInPlace(Pointer(AnyTextFileToRawUtf8('users.json')), [dvoInternNames, dvoInternValues]) = nil then
begin
  ShowMessage('Json file could not be loaded');
  Exit();
end;
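The same call can also be written with the file content held in a named local variable, which makes it explicit that InitJSONInPlace parses (and modifies) that buffer. This is a minimal sketch under assumptions: CountItemsInFile is a hypothetical helper, and passing True as the second argument of AnyTextFileToRawUtf8 assumes the file is UTF-8 when it carries no BOM.

```pascal
program LoadWithBom;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  mormot.core.base,
  mormot.core.unicode,
  mormot.core.variants;

// Hypothetical helper: returns the number of root items, or -1 on failure.
function CountItemsInFile(const FileName: TFileName): integer;
var
  Buffer: RawUtf8;
  Doc: TDocVariantData;
begin
  Result := -1;
  // Reads the whole file and converts it to UTF-8, skipping any BOM
  Buffer := AnyTextFileToRawUtf8(FileName, True);
  if Buffer = '' then
    Exit; // missing or empty file
  // The buffer is parsed in place; nil is returned on invalid JSON
  if Doc.InitJsonInPlace(pointer(Buffer), [dvoInternNames]) = nil then
    Exit;
  Result := Doc.Count;
end;

begin
  Writeln(CountItemsInFile('users.json'));
end.
```

Note that this approach still holds the complete file text in memory while parsing, which is the overhead mentioned earlier in the thread.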
Now, I will go back to parsing and measure memory usage on my test data.
Thanks.
Offline
Three remarks:
1. I have changed the BOM support in mORMot 2, and now you can use Doc.InitJsonFromFile() with a BOM file.
Even if BOM is not a good idea for UTF-8...
InitJsonInPlace() is indeed low-level.
2. I guess you can use dvoInternNames safely, but only dvoInternValues if most JSON values are likely to reappear in the input JSON.
3. If TDocVariant has troubles with the JSON, and if the structure is really simple, I would rather try to use a dynamic array of records for the storage.
It would be more efficient in terms of memory, and also performance. And using a dynamic array and records would be much easier to work with.
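A rough sketch of the record-based approach from remark 3 might look like this. Everything about the record is an assumption: only Identifier and AppType are named in this thread, their types are guessed, and the real objects in the file certainly have more fields, which would have to be declared as well (or skipped through parser options not shown here). CountUsers is a hypothetical helper.

```pascal
program RecordsFromJson;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  mormot.core.base,
  mormot.core.rtti,
  mormot.core.json;

type
  // Assumed layout: only the two fields mentioned in this thread
  TUserRec = packed record
    Identifier: RawUtf8;
    AppType: integer;
  end;
  TUserRecs = array of TUserRec;

// Hypothetical helper: parses a JSON array into records, -1 on failure.
function CountUsers(const Json: RawUtf8): integer;
var
  Users: TUserRecs;
begin
  if DynArrayLoadJson(Users, Json, TypeInfo(TUserRecs)) then
    Result := Length(Users)
  else
    Result := -1;
end;

begin
  // Describe the record fields as text, for compilers without record RTTI
  Rtti.RegisterFromText(TypeInfo(TUserRec),
    'Identifier: RawUtf8; AppType: integer');
  Writeln(CountUsers(
    '[{"Identifier":"a","AppType":1},{"Identifier":"b","AppType":2}]'));
end.
```

Each array item then occupies one compact record instead of a variant-based document, which is where the memory saving would come from.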
Online
1. I have changed the BOM support in mORMot 2, and now you can use Doc.InitJsonFromFile() with a BOM file.
Even if BOM is not a good idea for UTF-8...
InitJsonInPlace() is indeed low-level.
I do really appreciate that. Thank you. I also think BOM is not a good idea, but you eventually find it out of your control here and there.
2. I guess you can use dvoInternNames safely, but only dvoInternValues if most JSON values are likely to reappear in the input JSON.
Numeric values do repeat a lot. Several different fields only ever contain 1 or 2.
3. If TDocVariant has troubles with the JSON, and if the structure is really simple, I would rather try to use a dynamic array of records for the storage.
It would be more efficient in terms of memory, and also performance. And using a dynamic array and records would be much easier to work with.
Each item read from that file goes straight into the database. I will not keep its data in memory for any other purpose. So far TDocVariant seems to work just fine.
Offline
TDocVariant loads everything into memory, before you can iterate into its values.
Yes, I am going to make measurements. Getting ready for it.
I am counting on the dvoInternNames and dvoInternValues options. I will see whether they are enough to keep memory usage low, since the names repeat identically and many of the values repeat a lot.
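For reference, iterating over an array that is already fully loaded could look roughly like the sketch below. SumAppTypes is a hypothetical helper, the field names are the ones from the test code earlier in the thread, and the running total only stands in for the real per-item work (the database insert).

```pascal
program IterateItems;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  mormot.core.base,
  mormot.core.variants;

// Hypothetical helper: walks every object of a JSON array.
function SumAppTypes(const Json: RawUtf8): Int64;
var
  Doc: TDocVariantData;
  Item: PDocVariantData;
  n: integer;
begin
  Result := 0;
  // The whole document is parsed into memory before the loop starts
  if not Doc.InitJson(Json, [dvoInternNames]) then
    Exit;
  if not Doc.IsArray then
    Exit;
  for n := 0 to Doc.Count - 1 do
  begin
    // _Safe gives direct access to the nested object stored in the variant
    Item := _Safe(Doc.Values[n]);
    // Item^.U['Identifier'] would return the text field as RawUtf8
    Inc(Result, Item^.I['AppType']);
  end;
end;

begin
  Writeln(SumAppTypes(
    '[{"Identifier":"a","AppType":1},{"Identifier":"b","AppType":2}]'));
end.
```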
Last edited by ertank (2022-11-28 13:30:21)
Offline
According to my calculations, average memory usage is now 1.92 KB per item. It was around 5 KB before.
That is enough to continue with a Win32 application.
Since the increase is slow, I will figure out an alternative solution in the future.
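The arithmetic behind these figures can be checked with a small back-of-the-envelope program. It assumes 1 KB = 1024 bytes, the 750,000 records from the first post, and the default 2 GiB user address space of a Win32 process. It ignores every other allocation, so it is an estimate only.

```pascal
program MemoryEstimate;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  RecordCount = 750000;        // "more than 750,000 records" in the first post
  Win32DefaultLimitMiB = 2048; // assumption: default 2 GiB user address space

// Estimated footprint in MiB for Count items of KBPerItem each
function EstimateMiB(Count: Integer; KBPerItem: Double): Double;
begin
  Result := Count * KBPerItem / 1024;
end;

begin
  Writeln(Format('5 KB per item:    %.0f MiB',
    [EstimateMiB(RecordCount, 5.0)]));
  Writeln(Format('1.92 KB per item: %.0f MiB',
    [EstimateMiB(RecordCount, 1.92)]));
  Writeln(Format('Win32 limit:      %d MiB', [Win32DefaultLimitMiB]));
end.
```

Roughly 3,662 MiB at 5 KB per item does not fit, while roughly 1,406 MiB at 1.92 KB per item does, with limited headroom as the record count grows.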
Thanks.
Offline