#1 2021-06-29 19:42:53

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,574

Fast (very fast) IsValidUTF8 implementation

While profiling one of my apps, I found that the IsValidUTF8 function is slow for my use case (big input data, up to 100 MB). Looking around, I found a much faster alternative based on vector (SIMD) instructions: an algorithm description, plus a production-ready implementation as part of the simdjson library.

Hard to believe, but in the run below it is about 40 times faster than SynCommons.IsValidUTF8 (11.25 s vs 271 ms), which is well optimized IMHO. It is even faster than a dummy loop (I think because of the CPU branch on each loop iteration's condition).
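For comparison, here is a minimal C sketch of a plain byte-at-a-time validator, the kind of baseline the "PAS" line below measures. It is not SynCommons.IsValidUTF8 and not simdjson's algorithm. The name utf8_valid_scalar is hypothetical. The sketch applies the strict RFC 3629 rules: no overlong forms, no UTF-16 surrogates, and nothing above U+10FFFF. The thread does not say whether a given library's IsValidUTF8 is equally strict.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-at-a-time validator following RFC 3629:
   rejects stray continuation bytes, truncated sequences, overlong
   encodings, UTF-16 surrogates and code points above U+10FFFF. */
bool utf8_valid_scalar(const uint8_t *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        uint8_t c = s[i];
        if (c < 0x80) {            /* ASCII: one byte */
            i++;
            continue;
        }
        size_t n;                  /* number of continuation bytes expected */
        uint32_t cp, min;          /* decoded code point, smallest legal value */
        if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;          /* 0x80..0xBF as lead byte, or 0xF8..0xFF */
        }
        if (len - i <= n)
            return false;          /* sequence runs past the end of the buffer */
        for (size_t k = 1; k <= n; k++) {
            uint8_t cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;      /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;          /* overlong, out of range, or surrogate */
        i += n + 1;
    }
    return true;
}
```

Every non-ASCII byte here costs several data-dependent branches. SIMD validators avoid exactly this kind of per-byte branching by classifying whole 16- or 32-byte blocks at once.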

 
Iterating 3588 times over 2991876 bytes string (~10Gb)...
Validate PAS  to TRUE 10 GB in  11.25s i.e. 909.4 MB/s
Validate SIMD to TRUE 10 GB in 271.06ms i.e. 36.8 GB/s
Dummy while loop      10 GB in  18.53s i.e. 552.2 MB/s
4 iter while loop     10 GB in  14.37s i.e. 712.3 MB/s
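The figures above are consistent if "MB" and "GB" are read as binary units (2^20 and 2^30 bytes). The small differences from the printed values are within the rounding of the displayed times:

```latex
\begin{aligned}
N &= 3588 \times 2\,991\,876\ \text{bytes} = 10\,734\,851\,088\ \text{bytes} \approx 10.0 \times 2^{30}\ \text{bytes} \\
\text{PAS:}\quad & \frac{N/2^{20}}{11.25\ \text{s}} \approx \frac{10\,237.6\ \text{MB}}{11.25\ \text{s}} \approx 910\ \text{MB/s} \\
\text{SIMD:}\quad & \frac{N/2^{30}}{0.27106\ \text{s}} \approx \frac{9.998\ \text{GB}}{0.27106\ \text{s}} \approx 36.9\ \text{GB/s} \\
\text{speed-up:}\quad & \frac{11.25}{0.27106} \approx 41.5
\end{aligned}
```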

Disadvantage: it requires linking to libstdc++.

The test project and the C wrapper for the C++ simdjson code are available on GitHub: see simdjson_pas.


#2 2021-07-03 03:09:44

edwinsn
Member
Registered: 2010-07-02
Posts: 1,218

Re: Fast (very fast) IsValidUTF8 implementation

I remember simdjson; it was hot on Hacker News: https://hn.algolia.com/?q=simdjson

And this project by mpv is a nice, simple example of wrapping a C++ project into a C DLL that is then used from Pascal. Thanks for sharing!


Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdeluxe on Windows with cross-compile for Linux 64bit.


#3 2021-07-24 11:40:23

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718

Re: Fast (very fast) IsValidUTF8 implementation

I guess the "while loops" in your test program are slow because they are written in the main begin..end. block, so their variables are global variables and are never kept in registers.
I suppose that if you move them into a sub-function, the "while loop" would be faster.

Yes, simdjson is really impressive.
But it supports only strict JSON, whereas mORMot is able to understand MongoDB-style extensions like unquoted field names (which is the default JSON layout between mORMot clients and server).

According to TTestCoreProcess.JSONBenchmark, mORMot 2 JSON processing speed is pretty decent for a pure Pascal JSON parser, probably the fastest on Delphi/FPC:

  - JSON benchmark: 500,904 assertions passed  2.38s
     IsValidUtf8() in 77.25ms, 1.2 GB/s
     IsValidJson(RawUtf8) in 118.63ms, 826.3 MB/s
     IsValidJson(PUtf8Char) in 117.26ms, 835.9 MB/s
     JsonArrayCount(P) in 111.02ms, 882.9 MB/s
     JsonArrayCount(P,PMax) in 107.74ms, 909.8 MB/s
     JsonObjectPropCount() in 45.96ms, 1.2 GB/s
     TDocVariant in 661.96ms, 148 MB/s
     TDocVariant dvoInternNames in 805.23ms, 121.7 MB/s
     TOrmTableJson GetJsonValues in 22.94ms, 375.9 MB/s
     TOrmTableJson expanded in 38.82ms, 505 MB/s
     TOrmTableJson not expanded in 21.54ms, 400.3 MB/s

The TOrmTableJson parser is the one used by our ORM. It reaches 500 MB/s, which is pretty good in practice, especially in the "not expanded" mode, which is much less verbose. Its MB/s figure is lower, but there are fewer bytes to parse, so here the same data is read in 21 ms instead of 38 ms.
Its JSON serialization is also good: more than 370 MB/s when writing via GetJsonValues() in non-expanded mode.

For reference, here are the same tests run with mORMot 1.18:

  - JSON benchmark: 100,203 assertions passed  635.21ms
     IsValidUtf8() in 20.94ms, 0.9 GB/s
     IsValidJson(RawUtf8) in 25.40ms, 771.7 MB/s
     IsValidJson(PUtf8Char) in 27.40ms, 715.3 MB/s
     JsonArrayCount(P) in 21.67ms, 904.7 MB/s
     JsonArrayCount(P,PMax) in 21.54ms, 910 MB/s
     JsonObjectPropCount() in 10.77ms, 1 GB/s
     TDocVariant in 171.43ms, 114.3 MB/s
     TDocVariant dvoInternNames in 229.24ms, 85.5 MB/s
     TSqlTableJson GetJsonValues in 26.60ms, 324.1 MB/s
     TSqlTableJson expanded in 46.30ms, 423.3 MB/s
     TSqlTableJson not expanded in 25.16ms, 342.6 MB/s

So I guess the work on mORMot 2 has paid off.


#4 2021-07-25 08:09:55

okoba
Member
Registered: 2019-09-29
Posts: 121

Re: Fast (very fast) IsValidUTF8 implementation

The validate_utf8 function is lightweight: it uses SIMD and needs no extra memory. More info: https://lemire.me/blog/2018/05/16/valid … -per-byte/
But as far as JSON parsing is concerned, mORMot's SAX-like approach uses no extra memory, whereas simdjson builds a DOM-like structure that takes much more memory (about 5x, I think).
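A SAX-style pass that "uses no memory" can be illustrated with a minimal C sketch. It counts the elements of a top-level JSON array in a single scan, without building any tree. The name json_array_count is hypothetical and only loosely echoes the JsonArrayCount() benchmark above. It is not mORMot's code, it assumes well-formed JSON, and it validates nothing.

```c
#include <stddef.h>

/* Hypothetical SAX-style helper: returns the number of elements in the
   top-level JSON array at p[0..len), or -1 if the input does not start with
   '[' or is unterminated. Assumes well-formed JSON; allocates nothing. */
long json_array_count(const char *p, size_t len) {
    size_t i = 0;
    while (i < len && (p[i] == ' ' || p[i] == '\t' || p[i] == '\n' || p[i] == '\r'))
        i++;                               /* skip leading whitespace */
    if (i >= len || p[i] != '[')
        return -1;
    i++;
    int depth = 1;                         /* nesting level; 1 = inside the top array */
    int in_string = 0, seen_value = 0;
    long commas = 0;                       /* separators found at depth 1 */
    for (; i < len; i++) {
        char c = p[i];
        if (in_string) {
            if (c == '\\')
                i++;                       /* skip the escaped character */
            else if (c == '"')
                in_string = 0;
            continue;
        }
        switch (c) {
        case ' ': case '\t': case '\n': case '\r':
            break;
        case '"':
            in_string = 1; seen_value = 1;
            break;
        case '[': case '{':
            depth++; seen_value = 1;
            break;
        case ']': case '}':
            if (--depth == 0)
                return seen_value ? commas + 1 : 0;
            break;
        case ',':
            if (depth == 1)
                commas++;                  /* commas inside nested values are ignored */
            break;
        default:
            seen_value = 1;                /* number, true, false, null... */
            break;
        }
    }
    return -1;                             /* closing ']' never found */
}
```

The only state is a handful of integers, regardless of input size. A DOM-building parser, by contrast, has to store something for every value it meets.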


#5 2021-07-25 19:08:35

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,574

Re: Fast (very fast) IsValidUTF8 implementation

To be clear: I'm fine with the mORMot JSON parser; it is perfect and fits all my needs in terms of memory and performance. The only function I use from simdjson is validate_utf8, as a replacement for IsValidUTF8.

Last edited by mpv (2021-07-25 19:11:40)


#6 2021-07-25 20:56:53

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718

Re: Fast (very fast) IsValidUTF8 implementation

I could try to isolate validate_utf8 and allow linking it statically on x86_64 Linux.


#7 2021-07-26 07:43:58

okoba
Member
Registered: 2019-09-29
Posts: 121

Re: Fast (very fast) IsValidUTF8 implementation

I think you may get better results by rewriting it and avoiding the dependency.


#8 2021-07-26 13:49:07

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718

Re: Fast (very fast) IsValidUTF8 implementation

Please check https://github.com/synopse/mORMot2/comm … 60317d3097
It is enabled only on Haswell-level CPUs with AVX2 + BMI + SSE 4.2.
And only on FPC, since Delphi has no proper AVX2 asm support, even in its latest version.
Numbers are very good.
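Runtime selection of a vectorized path like this can be sketched in C. The sketch makes several assumptions and is not mORMot's code. It uses the GCC/Clang __builtin_cpu_supports() builtin and checks the three feature sets named above. utf8_valid_avx2 is a hypothetical placeholder that simply forwards to the scalar routine, so only the dispatch logic is shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*utf8_validator)(const uint8_t *s, size_t len);

/* Strict scalar fallback (RFC 3629), usable on any CPU. */
static bool utf8_valid_scalar(const uint8_t *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        uint8_t c = s[i];
        if (c < 0x80) { i++; continue; }          /* ASCII */
        size_t n; uint32_t cp, min;
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return false;                         /* invalid lead byte */
        if (len - i <= n) return false;            /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                          /* overlong/range/surrogate */
        i += n + 1;
    }
    return true;
}

/* Hypothetical stand-in for a vectorized AVX2 routine: here it only forwards
   to the scalar code, so this sketch illustrates the dispatch, not the SIMD. */
static bool utf8_valid_avx2(const uint8_t *s, size_t len) {
    return utf8_valid_scalar(s, len);
}

/* True when the CPU reports AVX2, BMI1 and SSE4.2 (GCC/Clang builtins, x86 only). */
static bool cpu_has_avx2_bmi_sse42(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("bmi") &&
           __builtin_cpu_supports("sse4.2");
#else
    return false;  /* other compilers/architectures: always use the fallback */
#endif
}

/* Public entry point: choose the implementation from the CPU features. */
bool utf8_valid(const uint8_t *s, size_t len) {
    utf8_validator impl = cpu_has_avx2_bmi_sse42() ? utf8_valid_avx2 : utf8_valid_scalar;
    return impl(s, len);
}
```

A library would normally resolve the function pointer once at start-up rather than on every call. It is done inline here only to keep the sketch short.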

I have also enhanced the Pascal version, which is now faster than before too.
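One common way to speed up a scalar validator is an ASCII fast path: test 8 bytes at once with a single mask, and skip them when none has its high bit set. The thread does not say whether the enhanced Pascal version does this. The C sketch below (hypothetical name utf8_valid_fast) only shows the general technique, which is word-at-a-time (SWAR) processing, not simdjson's SIMD lookup algorithm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Strict UTF-8 check (RFC 3629) with an 8-bytes-at-a-time ASCII fast path. */
bool utf8_valid_fast(const uint8_t *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        /* Fast path: if none of the next 8 bytes has its high bit set,
           they are all ASCII and can be skipped in one step. */
        if (len - i >= 8) {
            uint64_t w;
            memcpy(&w, s + i, sizeof w);           /* safe unaligned load */
            if ((w & UINT64_C(0x8080808080808080)) == 0) {
                i += 8;
                continue;
            }
        }
        /* Slow path: decode one code point; i is always on a boundary here. */
        uint8_t c = s[i];
        if (c < 0x80) { i++; continue; }           /* single ASCII byte */
        size_t n; uint32_t cp, min;
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return false;                          /* invalid lead byte */
        if (len - i <= n) return false;             /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                           /* overlong/range/surrogate */
        i += n + 1;
    }
    return true;
}
```

The mask is the same in every byte lane, so the test works regardless of CPU endianness. For mostly-ASCII input such as JSON, most of the buffer goes through the single-branch fast path.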

Also the regression tests now validate that invalid UTF-8 is detected at any position in the input text.
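A position-sweep test of this kind can be sketched in C. It starts from a valid buffer, overwrites one byte at a time with 0xFF (a byte that never appears in UTF-8), and checks that the validator rejects every variant. The harness name first_undetected_position and its return codes are hypothetical. The thread does not show mORMot's actual regression tests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Strict scalar validator (RFC 3629) used as the function under test. */
static bool utf8_valid_scalar(const uint8_t *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        uint8_t c = s[i];
        if (c < 0x80) { i++; continue; }
        size_t n; uint32_t cp, min;
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return false;
        if (len - i <= n) return false;
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += n + 1;
    }
    return true;
}

/* Hypothetical harness: plants 0xFF at every offset of a valid text.
   Returns the first offset the validator failed to reject, -1 if every
   corruption was detected, or -2 if the precondition does not hold
   (text too long for the scratch buffer, or not valid to begin with). */
long first_undetected_position(bool (*valid)(const uint8_t *, size_t),
                               const uint8_t *text, size_t len) {
    uint8_t buf[256];
    if (len > sizeof buf || !valid(text, len))
        return -2;
    for (size_t pos = 0; pos < len; pos++) {
        memcpy(buf, text, len);
        buf[pos] = 0xFF;                 /* corrupt exactly one byte */
        if (valid(buf, len))
            return (long)pos;
    }
    return -1;
}
```

Sweeping every offset matters for block-based SIMD code. A bug at a 16- or 32-byte block boundary, or in the tail handling, only shows up at specific positions.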


#9 2021-07-26 14:45:48

okoba
Member
Registered: 2019-09-29
Posts: 121

Re: Fast (very fast) IsValidUTF8 implementation

Very nice.
But I cannot get it to work with the latest FPC trunk on Win10 x64: IsValidUtf8Avx2 crashes.
The mORMot tests fail on this function too.
With the AVX version disabled, the Pascal version runs at 3 GB/s for me on an i9.
Unrelated note: in TDynArrayHasher.FindOrNew, checking Assigned(Compare) fails to compile on FPC trunk. Previous versions work without problem.

Last edited by okoba (2021-07-26 15:04:22)


#10 2021-07-26 15:50:32

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718

Re: Fast (very fast) IsValidUTF8 implementation

Don't use FPC trunk. We don't support it.
It is too unstable.

Edit: are you using Win64?
I guess there is a problem with the Win64 ABI for now. I validated it only on Linux x86_64.
I will try to fix it.

Edit 2: I confirm the code is not Win64 compatible.
I have enabled this AVX2 code for x86_64 POSIX only, which matches the main use case of a production server.


#11 2021-07-26 19:36:22

okoba
Member
Registered: 2019-09-29
Posts: 121

Re: Fast (very fast) IsValidUTF8 implementation

Thanks for the update.
About trunk: it lets me try mORMot code with the latest FPC updates, as I use mORMot a lot for daily tasks like arrays, dictionaries, Unicode, files and JSON. Until the latest update to TDynArrayHasher, it worked just fine and passed all the tests, so it may be a good idea to keep it compiling there and to test the latest changes.
That is the case for V2. I agree with you about mORMot V1, which was always problematic to use with trunk. But your updates to V2 made it very compatible and comfortable to use.


#12 2021-07-27 08:37:55

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718

Re: Fast (very fast) IsValidUTF8 implementation

Thanks for the feedback.

I have just added Win64 support for IsValidUtf8Avx2().
Only for FPC, since Delphi doesn't support AVX inline assembly.


#13 2021-07-27 08:51:19

okoba
Member
Registered: 2019-09-29
Posts: 121

Re: Fast (very fast) IsValidUTF8 implementation

Thanks for the update. I checked it and it runs at nearly 20 GB/s! Great!

About TDynArrayHasher: it still does not compile on FPC trunk even with the new parentheses; it seems an @ is needed.
I tried to reproduce the problem in a minimal new project, including the mORMot defines too. It compiles correctly there, and the problem only happens in the unit. Sorry for the trouble, but I think it is worth being able to compile with the latest version of FPC.


#14 2021-07-27 17:04:31

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718

Re: Fast (very fast) IsValidUTF8 implementation

I have added some new benchmarks, and also optimized the mORMot 2 JSON parser even further.

mORMot 2 JSON parsing performance seems really high: several orders of magnitude faster than the fastest Delphi/FPC libraries, which are dwsJSON and JsonDataObjects.
Perhaps I will write a blog article about those numbers.

Edit: the initial numbers were incorrect.
I have fixed TTestCoreProcess.JSONBenchmark and published some new numbers:
https://github.com/synopse/mORMot2/comm … 3ed789fb46
Don't worry, mORMot 2 is still way ahead. ;)


#15 2021-07-27 20:21:45

okoba
Member
Registered: 2019-09-29
Posts: 121

Re: Fast (very fast) IsValidUTF8 implementation

Great!
Yes, mORMot is very fast, and a blog post updating the previous ones would be nice. May I suggest an independent demo that clearly shows the benefits and speed? It would help answer questions like the ones in this topic.
It may also be a good idea to add JsonTools (from the Lazarus forum topic below), as it is very clean and seems fast.
Topic: https://forum.lazarus.freepascal.org/in … opic=46533

PS: can you please fix the TDynArrayHasher compile issue with trunk? It helps someone like me keep up with your fast updates while maintaining daily code.

Last edited by okoba (2021-07-27 20:25:52)


#16 2021-07-28 11:19:21

okoba
Member
Registered: 2019-09-29
Posts: 121

Re: Fast (very fast) IsValidUTF8 implementation

Thank you for the updates.


#17 2021-07-30 17:31:20

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,574

Re: Fast (very fast) IsValidUTF8 implementation

Great work!! Many thanks!

I adopted the new IsValidUTF8 function for mORMot 1 (I'm still on mORMot 1): https://github.com/synopse/mORMot/pull/400


#18 2021-07-30 20:46:51

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718
Website

Re: Fast (very fast) IsValidUTF8 implementation

Thanks.

I made some comments on the pull request.
Not yet ready for production.


#19 2021-07-31 12:02:43

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,574
Website

Re: Fast (very fast) IsValidUTF8 implementation

@ab - there are no comments in the PR yet...


#20 2021-07-31 12:49:24

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718

Re: Fast (very fast) IsValidUTF8 implementation

They are at the source level:
https://github.com/synopse/mORMot/pull/400/files

1) I would rather put this into SynTable: SynCommons is already huge... too huge...

2) Reuse the Pascal function in the ASMX64AVX branch, which is currently broken, so I can't merge it as such.

I could do it on my side, if you prefer, once you have validated that it works as expected and can be substituted for the external .so library.


#21 2021-07-31 14:13:35

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,574
Website

Re: Fast (very fast) IsValidUTF8 implementation

OK, then I will merge #400 into the SyNodeCleanup branch, which I use to build UnityBase and deploy to my testing environment, to confirm everything works as expected. My autotests currently pass, but on Monday the testing team starts working with real use cases, and we will make sure everything is OK. If so, I will ask you to make the necessary changes (SynCommons -> SynTable).


#22 2021-07-31 14:41:57

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718
Website

Re: Fast (very fast) IsValidUTF8 implementation

I have committed https://synopse.info/fossil/info/ab50456505 as an official port in the SynTable unit.

Hope it helps.


#23 2021-09-14 15:04:04

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,574
Website

Re: Fast (very fast) IsValidUTF8 implementation

I found strange behavior in the new asm code. It reproduces only under Windows x64, and only when compiled with -O2 or higher (FPC 3.2.0).
After a call to IsValidUTF8 in scenarios like

procedure test(const aStr: RawByteString);
begin
  if not IsValidUTF8(pointer(aStr), length(aStr)) then
  begin
    // ... (handling of invalid input elided)
  end;
end;

aStr becomes nil.
It does not reproduce under Linux x64 at any optimization level (we already use it in production under Linux).
@ab - maybe you have some idea why this happens, because I can't figure it out yet...

P.S.
In real life, the string becomes nil after this line: https://github.com/synopse/mORMot/blob/ … te.pas#L42

Last edited by mpv (2021-09-14 15:10:51)


#24 2021-09-14 15:30:48

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718
Website

Re: Fast (very fast) IsValidUTF8 implementation

IsValidUtf8Avx2() was indeed not Win64 compatible.

Please check https://synopse.info/fossil/info/7db5063372
- also fixed in mORMot 2


#25 2021-09-14 18:06:51

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,574
Website

Re: Fast (very fast) IsValidUTF8 implementation

Unfortunately, same behavior: the string becomes empty after the call to IsValidUTF8 (with -O1 the string is OK).


#26 2021-09-15 06:31:42

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,718
Website

Re: Fast (very fast) IsValidUTF8 implementation

You are right: WIN64ABI is a mORMot 2-specific conditional, so it is never defined in mORMot 1 and the Win64 code path was not taken.

We just need to replace it with MSWINDOWS or WIN64.
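The pitfall here is that a conditional the compiler does not define silently selects the other branch. A minimal C sketch of the safer pattern keys the x86-64 ABI choice on macros the compiler itself predefines: _WIN64 with _M_X64 or __x86_64__ for Win64, and __x86_64__ alone for System V. The function name x64_abi_name is hypothetical. The Win64 ABI treats RSI, RDI and XMM6-XMM15 as callee-saved. Assembly written for System V that clobbers them without saving them is therefore a plausible cause of the "string becomes nil at -O2" symptom. The thread itself only says the code was not Win64 compatible.

```c
/* Hypothetical helper: report which x86-64 calling convention this build uses,
   based only on macros the compiler predefines. Avoid testing project-specific
   symbols (e.g. "#ifdef WIN64ABI"): if the build never defines them, the wrong
   branch is taken without any warning. */
const char *x64_abi_name(void) {
#if defined(_WIN64) && (defined(_M_X64) || defined(__x86_64__))
    /* Microsoft x64: args in RCX, RDX, R8, R9; RBX, RBP, RDI, RSI, R12-R15
       and XMM6-XMM15 must be preserved by the callee. */
    return "win64";
#elif defined(__x86_64__)
    /* System V AMD64 (Linux, BSD, macOS): args in RDI, RSI, RDX, RCX, R8, R9;
       only RBX, RBP, R12-R15 are callee-saved, all XMM registers are scratch. */
    return "sysv";
#else
    return "other";    /* not x86-64: an AVX2 asm path would not apply here */
#endif
}
```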

It should be fixed now: https://synopse.info/fossil/info/962c8e03cc


#27 2021-09-15 08:45:46

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,574
Website

Re: Fast (very fast) IsValidUTF8 implementation

Now the problem is solved, thank you very much!
I found that my CI server uses a Xeon E5-2640 v2 CPU, which does not support AVX2 (E5 v2 has only AVX; E5 v3 adds AVX2). So the AVX2 code path was never exercised there, which is why I caught this error so late.
