plz help with LLVM5 and ZlibSSE

RObyDP · 2017-09-11 18:16:50

Hello,

I'm evaluating the speed of several zlib implementations (zlib-ng, Cloudflare, Intel) because I want to try to parallelize some parts.
Building DLLs with VC 2015-2017 works fine.

I would like to have OBJ files to link statically into Delphi.
That works fine with GNU CC.

But I'm stuck with LLVM and Cloudflare.

PLEASE, can somebody help me?

I have done a lot of testing, editing the sources and the compiler options, but I always get wrong results.

So:
1) download Cloudflare zlib from https://github.com/cloudflare/zlib
2) download LLVM 5.0 from www.llvm.org
3) if you have Visual Studio Community 2015, LLVM takes the headers from the default include folders (otherwise download the Windows PSDK)

Try to produce the objects, for example:
clang -c -O3 -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -DZLIB_WINAPI -DASMV -DASMINF -DWIN64 -mssse3 -msse4.1 -msse4.2 *.c
(there is an option for a crc32 function taken from the Linux kernel, but leaving this define out avoids problems; it is only used for gzip, which I don't need)

OK, now try to map the objects with SynSSLZip and look at the results: with DeflateInit2 (mandatory for browser-compatible deflate) the results are not correct.

By the way, here is a revised record for z_stream:

TZ_stream = record
  next_in: PByte;
  avail_in: UInt32;
  total_in: UInt64;
  next_out: PByte;
  avail_out: UInt32;
  total_out: UInt64;
  msg: PAnsiChar;
  state: Pinternal_state;
  zalloc: alloc_func;
  zfree: free_func;
  opaque: Pointer;
  data_type: Integer;
  adler: UInt64;
  reserved: UInt64;
end;

sizeof=112
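
A quick way to cross-check that record against the C side is to mirror it with fixed-width types and let the compiler confirm size and offsets. The sketch below is only an illustration: tz_stream_mirror and the two helper functions are hypothetical names, and the 64-bit counters are copied from the Delphi record, not from any zlib header. Keep in mind that stock zlib declares total_in, total_out, adler and reserved as uLong (unsigned long), which is 32 bits wide on Win64 compilers; unless the fork in use changes those types, a C build for Win64 would have a smaller z_stream than 112 bytes and the two layouts would not match.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical mirror of the Delphi TZ_stream record. Fixed-width types
   are used so the layout does not depend on the width of unsigned long. */
typedef struct tz_stream_mirror {
    const uint8_t *next_in;
    uint32_t       avail_in;
    uint64_t       total_in;
    uint8_t       *next_out;
    uint32_t       avail_out;
    uint64_t       total_out;
    const char    *msg;
    void          *state;
    void        *(*zalloc)(void *opaque, unsigned items, unsigned size);
    void         (*zfree)(void *opaque, void *address);
    void          *opaque;
    int32_t        data_type;
    uint64_t       adler;
    uint64_t       reserved;
} tz_stream_mirror;

/* Compile-time checks: the build fails if the layout is not as expected. */
static_assert(sizeof(void *) == 8, "this check assumes a 64-bit target");
static_assert(sizeof(tz_stream_mirror) == 112, "expected 112 bytes");

/* Small accessors so the numbers can also be printed or compared at run time. */
size_t tz_stream_mirror_size(void)
{
    return sizeof(tz_stream_mirror);
}

size_t tz_stream_total_in_offset(void)
{
    return offsetof(tz_stream_mirror, total_in);
}
```

Running the same sizeof and offsetof checks against the real z_stream from the zlib.h used for the OBJ files would show whether the Delphi record and the compiled objects agree.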

Please help; I can also pay something if we solve this. Where is the trick with LLVM?

For the curious: in my tests so far, Intel zlib and Cloudflare zlib are similar. Cloudflare is better at level 2 and up, Intel is better at level -2 (without the CRC/Adler checksum).
Sorry for my English, I'm in a hurry.

(if needed: rdp@dellapasqua.com)
See you soon

ab · 2017-09-11 19:01:17

I was able to get the CloudFlare fork working with FPC via https://synopse.info/fossil/finfo?name=SynZLibSSE.pas

RObyDP · 2017-09-11 19:37:17

Well, I built a lot of DLL versions with Intel, Intel+IPP, Cloudflare, checksum on and off...
I would like to have objects built with Clang.
Can we try?

ab · 2017-09-12 17:11:34

Did you try with FPC and the files available at https://github.com/synopse/mORMot/tree/master/fpc-win64 ?

RObyDP · 2017-09-12 17:24:10

Can I ask: did you use MinGW64 to make the *.obj files?

RObyDP · 2017-09-12 17:29:48

Did you use the PCLMUL define of Cloudflare? Because 'crc32_pclmul_le_16' exists only in the Linux kernel.
Do you perhaps know the Visual C++ equivalent of gcc's -msse4.2 -mpclmul (to enable the xmmintrin and emmintrin SSE intrinsics)?
Please bear in mind that I have little experience with C compilers.
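
On the Visual C++ question: as far as I know, MSVC has no direct counterpart to -msse4.2 or -mpclmul. On x64 the intrinsics from <nmmintrin.h> and <wmmintrin.h> compile without an extra switch, so the usual safeguard is a run-time CPUID check before calling them. A minimal sketch of such a check follows; cpuid_leaf1, cpu_has_sse42 and cpu_has_pclmul are hypothetical helper names, not part of zlib or of any fork.

```c
#include <assert.h>   /* static_assert (C11) */

static_assert(sizeof(unsigned) == 4, "CPUID registers are 32 bits wide");

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#  include <intrin.h>
/* MSVC-style compilers: __cpuid fills EAX, EBX, ECX, EDX in that order. */
static void cpuid_leaf1(unsigned regs[4])
{
    int r[4];
    __cpuid(r, 1);
    for (int i = 0; i < 4; i++)
        regs[i] = (unsigned)r[i];
}
#elif defined(__x86_64__) || defined(__i386__)
#  include <cpuid.h>
/* GCC and Clang: __get_cpuid returns 0 if the leaf is not supported. */
static void cpuid_leaf1(unsigned regs[4])
{
    if (!__get_cpuid(1, &regs[0], &regs[1], &regs[2], &regs[3]))
        regs[0] = regs[1] = regs[2] = regs[3] = 0;
}
#else
/* Not an x86 target: report no features. */
static void cpuid_leaf1(unsigned regs[4])
{
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
}
#endif

/* CPUID leaf 1, ECX bit 20: SSE4.2 (includes the CRC32 instruction). */
int cpu_has_sse42(void)
{
    unsigned regs[4];
    cpuid_leaf1(regs);
    return (int)((regs[2] >> 20) & 1u);
}

/* CPUID leaf 1, ECX bit 1: PCLMULQDQ (carry-less multiply). */
int cpu_has_pclmul(void)
{
    unsigned regs[4];
    cpuid_leaf1(regs);
    return (int)((regs[2] >> 1) & 1u);
}
```

With a check like this, the SSE4.2 and PCLMUL code paths can be selected once at startup, and a plain C fallback used when either function returns 0.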

RObyDP · 2017-09-12 20:06:55

Clang with -O2 works, but a test takes 13 seconds; with -O3 it corrupts the results, but takes 3.4 seconds.
Under Linux with GCC the test takes 3.2 seconds.
:-\

ab · 2017-09-12 20:21:39

I used GCC, IIRC.

RObyDP · 2017-09-12 20:37:17

I have isolated the problem: it's in LLVM's _mm_crc32_u32. Now I am checking the function headers (it looks like an integer overflow).
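
One way to narrow down a suspicion like this is to compare the intrinsic against a plain C model of the same operation. The SSE4.2 CRC32 instruction behind _mm_crc32_u32 accumulates a CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78) over little-endian data, without the initial and final inversion. The sketch below is a software reference only; the function names are hypothetical and it is not taken from the Cloudflare sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the CRC32 instruction for a single byte. */
uint32_t crc32c_u8_sw(uint32_t crc, uint8_t v)
{
    crc ^= v;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Software model for a 32-bit word, i.e. what _mm_crc32_u32(crc, v)
   is expected to return. Equivalent to feeding the four bytes of v,
   lowest byte first, through crc32c_u8_sw. */
uint32_t crc32c_u32_sw(uint32_t crc, uint32_t v)
{
    crc ^= v;
    for (int i = 0; i < 32; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Full CRC-32C of a buffer, with the usual 0xFFFFFFFF init and final xor,
   built from the two step functions above. */
uint32_t crc32c_buf_sw(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    while (len >= 4) {
        /* Assemble the word in little-endian order, independent of host. */
        uint32_t w = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        crc = crc32c_u32_sw(crc, w);
        p += 4;
        len -= 4;
    }
    while (len--)
        crc = crc32c_u8_sw(crc, *p++);
    return crc ^ 0xFFFFFFFFu;
}
```

Feeding the same inputs to crc32c_u32_sw and to _mm_crc32_u32 in objects built with -O2 and with -O3 would show whether the intrinsic itself returns different values, or whether the corruption comes from the surrounding code.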

RObyDP · 2017-09-12 20:48:28

GCC under Windows? MinGW or Cygwin? But it seems that we cannot use a function from GPL code such as the Linux kernel for commercial purposes, true?

ab · 2017-09-12 21:04:14

MinGW, of course. And no problem about the license: this is the zlib license. No Linux involved.

RObyDP · 2017-09-12 21:44:16

I compiled with gcc -c -O3 -msse4.2 -mpclmul, but the results are slow. How can you be so fast? But I suppose this is a magic secret :-)

RObyDP · 2017-09-12 21:45:28

I'm talking with the engineer who wrote the Cloudflare patch; indeed, with Clang the issue is with the SSE* calls.

RObyDP · 2017-09-13 10:15:32

Look at the zlib patches from Intel (the files are in Intel IPP for Linux): they work fine as a static DLL, and introduce a -2 level for DeflateInit and DeflateInit2 as the fastest mode.
Under Win64 this performs very similarly to Cloudflare.

RObyDP · 2017-09-13 13:03:59

OK, I ran a test on real HTML files produced by WordPress and a PHP forum, at similar compression ratios (level 1 for gzip and deflate, -2 for Intel):
gzip 12 seconds
your MinGW build 5.6 sec (Cloudflare)
Intel 2.5 sec

Parallelized, Intel delivers 20 Gbit/s of HTML in -> 4.4 Gbit/s of compressed output on an i7 with 4/8 cores at 3.4 GHz.

I am waiting for the fix from the Cloudflare engineer for the LLVM patches; then I'll let you know.
See you soon

ab · 2017-09-13 19:01:59

-2 is not a true deflate, since there is no adler32, right?

RObyDP · 2017-09-13 22:16:05

They don't fill the dictionary table of common tokens during the deflate, so for example "ciao ciao ciao" is not compressed at all. The crc32 checksum is still used.

ab · 2017-09-14 06:41:25

Still much slower than LZ4 or SynLZ for compression per core.

RObyDP · 2017-09-14 06:58:37

Indeed, it's very slow compared to LZ4, SynLZ or Snappy for example; it is useful only for default web browser compression.

ab · 2017-09-14 07:29:20

For web compression I usually put an nginx frontend in place, which handles compression, HTTPS and HTTP/2...

RObyDP · 2017-09-18 18:42:31

off-offtopic

thread 1) read @Int64
00000000006CCAC3 488B4538        mov rax,[rbp+$38]         ; pointer to address
00000000006CCAC7 48894530        mov [rbp+$30],rax         ; read from a 64-bit quadword with a single mov (value into register)

thread 2) write @Int64
00000000006CC8FA 488B4538        mov rax,[rbp+$38]         ; pointer to the same address
00000000006CC8FE 488905739F0500  mov [rel $00059f73],rax   ; write the above 64-bit quadword with a single mov

Can you confirm that here we don't need atomic sync functions, because each access is a single aligned mov (64 bits or more), so a collision can never happen?
Thanks

ab · 2017-09-19 07:51:57

It may not be atomic.
The problem with atomicity is reading and writing at the same time.
Here it may be unsynchronized.
And the cache may not be consistent.
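
For reference, in C the idea of "a single aligned 64-bit load or store, where ordering does not matter" is normally written with relaxed atomics instead of plain variables. On x86-64 these usually compile to the same single mov, but they also tell the compiler not to split, cache or drop the access. This is only a sketch under that assumption; shared_value, writer_publish and reader_peek are hypothetical names, and it does not settle how the Delphi compiler treats the code above.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stdatomic.h>

/* 2 means "always lock-free": the accesses below need no hidden lock. */
static_assert(ATOMIC_LLONG_LOCK_FREE == 2,
              "64-bit atomics are expected to be lock-free on this target");

/* Hypothetical 64-bit value shared between a writer and a reader thread. */
static atomic_llong shared_value;

/* Writer side: one indivisible 64-bit store, no ordering guarantees
   relative to other memory operations. */
void writer_publish(long long v)
{
    atomic_store_explicit(&shared_value, v, memory_order_relaxed);
}

/* Reader side: one indivisible 64-bit load. The result is either the old
   or the new value, never a mix of the two halves. */
long long reader_peek(void)
{
    return atomic_load_explicit(&shared_value, memory_order_relaxed);
}
```

Relaxed ordering only covers the value itself. If the reader must also see other data written before the store, release/acquire ordering (or the Interlocked* calls on the Delphi side) would be needed.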

RObyDP · 2017-09-19 09:37:39

OK. So, in the case where the sequence does not need to be synchronized (the order of execution of the read and the write is not important there), can I proceed without Interlocked* calls and without worries?
By the way, thanks for your time.

mORMot Open Source
