#1 2024-03-06 22:52:53

HelloMog
Member
Registered: 2024-03-06
Posts: 3

Load a really big JSON

Hi,

I'm trying to write a simple loader for big JSON exports coming from a database (MongoDB). The files are 8GB each, which rules out the JSON tools from FPC, as they all seem to be string-based, and strings are limited to 2GB.

I used a few units from mORMot a long time ago for text parsing and remember it as much faster than anything else, so I'm quite happy to see a V2 with some JSON support.
Is there an easy way to open a big file, then parse it so I can transfer the data into an array of record? (It's easier for me to load it like that; I'm from the Turbo Pascal days and not really good with generics and other new features.)
After a quick look at DynArrayLoadJson and JsonRetrieveStringField, I'm not really sure how to use them, nor whether they're fit for "general" JSON rather than only output coming from TJsonWriter.

Any help or documentation would be greatly appreciated.

Best regards,


#2 2024-03-06 23:53:24

HelloMog
Member
Registered: 2024-03-06
Posts: 3

Re: Load a really big JSON

(I tried TDocVariantData.InitJsonFromFile, but with no success: AnyTextFileToRawUtf8 calls StringFromFile, which is hard-limited to 2GB despite using AnsiString.)


#3 2024-03-07 03:28:36

Chaa
Member
Registered: 2011-03-26
Posts: 245

Re: Load a really big JSON

You can try TMemoryMap, and then JsonDecode/JsonArrayDecode.
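A minimal sketch of that combination (untested; TMemoryMap is declared in mormot.core.os, and the exact JsonDecode/JsonArrayDecode signatures should be checked in mormot.core.json before use):

```pascal
uses
  mormot.core.base,
  mormot.core.os,    // TMemoryMap
  mormot.core.json;  // JsonDecode / JsonArrayDecode

var
  map: TMemoryMap;
  p: PUtf8Char;
begin
  // map the whole file into the address space (a 64-bit process is
  // needed for an 8GB file); no copy is made, the OS pages it in on demand
  if map.Map('export.json') then
  try
    p := pointer(map.Buffer);
    // parse in place here, e.g. with JsonArrayDecode(p, ...)
    // caveat: in-place decoding writes #0 terminators and unescapes
    // strings, so the mapping must be writable (or load into a buffer first)
  finally
    map.UnMap;
  end;
end;
```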


#4 2024-03-07 08:43:50

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,240

Re: Load a really big JSON

AnsiString is limited to a 32-bit length on Delphi (but not on FPC), so we follow this limit.

JsonDecode() parses the memory buffer in place, adding #0 terminators and unescaping JSON strings, so you can use it with TMemoryMap.
On 64-bit systems, you could also manually load the JSON file into one huge memory buffer.

Or switch to a SAX approach: the low-level code in mormot.core.json can parse such JSON, but for 8GB it may be a bit tricky.
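For reference, a SAX-like scan over such a buffer might look roughly like this (a sketch only: TGetJsonField is the low-level tokenizer in mORMot 2's mormot.core.json, but double-check the field names against your version):

```pascal
uses
  mormot.core.base,
  mormot.core.json;

// walk every value of a huge JSON document one token at a time,
// without ever building an intermediate DOM in memory
procedure ScanJson(p: PUtf8Char);
var
  info: TGetJsonField;
  count: PtrInt;
begin
  count := 0;
  info.Json := p;
  while info.Json <> nil do
  begin
    info.GetJsonField;   // next value: sets Value / ValueLen / WasString
    if info.Value <> nil then
      inc(count);        // handle the value here instead of counting
  end;
  writeln(count, ' values found');
end;
```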


#5 2024-03-07 20:43:26

HelloMog
Member
Registered: 2024-03-06
Posts: 3

Re: Load a really big JSON

Thanks for your answers, Ab and Chaa !

After a whole day of trying and getting errors, I realized that there were 11 mistakes in the JSON from MongoDB; even their own Python lib crashed on their export files...
So I switched the exports to CSV, only to realize that even TStringList.LoadFromFile doesn't like files > 2GB.
The solution I used in the end was TFileReader.ReadLine to read the file line by line and load it into a TStringList.
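For anyone landing here later, the loop looked roughly like this (from memory, so treat the exact TFileReader API as an assumption and check the actual unit before use):

```pascal
uses
  Classes,
  mormot.core.base;

var
  reader: TFileReader;   // assumed buffered-reader class, as named above
  line: RawUtf8;
  list: TStringList;
begin
  list := TStringList.Create;
  reader := TFileReader.Create('export.csv');
  try
    // read one line at a time, so the 2GB string limit never applies
    // to the whole file, only to each individual line
    while reader.ReadLine(line) do
      list.Add(Utf8ToString(line));
  finally
    reader.Free;
    list.Free;
  end;
end;
```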

I'm still very intrigued by mORMot 2 :-)


#6 2024-03-08 16:53:42

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,240

Re: Load a really big JSON

If you use CSV, there is the DynArrayLoadCsv() function from mormot.core.search.pas, which can load a CSV very efficiently into a dynamic array.
And consider using RawUtf8 instead of plain "string" in your dynamic array to reduce memory usage.

You could use TMemoryMap to map the CSV file into memory, then call DynArrayLoadCsv() over the buffer (on Win64 of course).

If there are a lot of repeated values within the CSV, you may even consider supplying a TRawUtf8Interning instance to DynArrayLoadCsv(), to reduce memory usage even further.
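Putting those three hints together, a sketch could look like this (the call shape of DynArrayLoadCsv() is a guess; verify its actual parameters in mormot.core.search.pas, and the unit declaring TRawUtf8Interning, before use):

```pascal
uses
  mormot.core.base,
  mormot.core.os,      // TMemoryMap
  mormot.core.data,    // TRawUtf8Interning (assumed unit)
  mormot.core.search;  // DynArrayLoadCsv

type
  TRow = packed record
    Id: Int64;
    Name: RawUtf8;  // RawUtf8 instead of string, as suggested above
  end;
  TRows = array of TRow;

var
  map: TMemoryMap;
  rows: TRows;
  interning: TRawUtf8Interning;
begin
  interning := TRawUtf8Interning.Create;
  try
    // map the CSV file, then fill the dynamic array from the buffer
    if map.Map('export.csv') then
    try
      // assumed parameter order - check mormot.core.search.pas
      DynArrayLoadCsv(rows, map.Buffer, TypeInfo(TRows), interning);
    finally
      map.UnMap;
    end;
  finally
    interning.Free;
  end;
end;
```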



Powered by FluxBB