#1 2017-09-11 18:16:50

RObyDP
Member
Registered: 2015-02-19
Posts: 62

plz help with LLVM5 and ZlibSSE

Hello,

I'm evaluating the speed of several zlib implementations (zlib-ng, Cloudflare, Intel) because I want to try to parallelize some parts.
Building DLLs with VC 2015-2017 works fine.

I would like to have OBJ files so I can link them statically into Delphi.
That works fine with GNU cc.

But I'm stuck with LLVM and Cloudflare.

PLEASE can somebody help me?

I have done a lot of testing, editing the sources and the compiler options, but I always get wrong results.

So:
1) download Cloudflare from https://github.com/cloudflare/zlib
2) download LLVM 5.0 from www.llvm.org
3) if you have Visual Studio Community 2015, LLVM takes the headers from the default include folders (otherwise download the Windows PSDK)

Try to produce the objects, for example:
clang -c -O3 -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -DZLIB_WINAPI -DASMV -DASMINF -DWIN64 -mssse3 -msse4.1 -msse4.2 *.c
(there is an option for a crc32 function taken from the Linux kernel, but leaving out that define avoids problems; it is only used for gzip, which I don't need)

OK, now try to map the objects with SynSSLZip and look at the results: with DeflateInit2 (mandatory for deflate browser compatibility) the results are not correct.

By the way, here is a revised struct for z_stream:

TZ_stream = record
  next_in: PByte;
  avail_in: UInt32;
  total_in: UInt64;
  next_out: PByte;
  avail_out: UInt32;
  total_out: UInt64;
  msg: PAnsiChar;
  state: Pinternal_state;
  zalloc: alloc_func;
  zfree: free_func;
  opaque: Pointer;
  data_type: Integer;
  adler: UInt64;
  reserved: UInt64;
end;

sizeof=112
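
To double-check the 112 figure, here is a minimal C sketch that mirrors the record above with fixed-width types and lets the compiler verify the layout. The names tz_stream_mirror, alloc_func_t and free_func_t are hypothetical and not part of zlib; the sketch assumes a 64-bit target and natural alignment.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical callback types standing in for alloc_func / free_func. */
typedef void *(*alloc_func_t)(void *opaque, unsigned items, unsigned size);
typedef void  (*free_func_t)(void *opaque, void *address);

/* Hypothetical mirror of the Delphi TZ_stream record, field for field. */
typedef struct tz_stream_mirror {
    const uint8_t *next_in;    /* offset   0 */
    uint32_t       avail_in;   /* offset   8, then 4 bytes of padding */
    uint64_t       total_in;   /* offset  16 */
    uint8_t       *next_out;   /* offset  24 */
    uint32_t       avail_out;  /* offset  32, then 4 bytes of padding */
    uint64_t       total_out;  /* offset  40 */
    const char    *msg;        /* offset  48 */
    void          *state;      /* offset  56 */
    alloc_func_t   zalloc;     /* offset  64 */
    free_func_t    zfree;      /* offset  72 */
    void          *opaque;     /* offset  80 */
    int32_t        data_type;  /* offset  88, then 4 bytes of padding */
    uint64_t       adler;      /* offset  96 */
    uint64_t       reserved;   /* offset 104 */
} tz_stream_mirror;

/* Compile-time checks: the build fails if the assumed layout is wrong. */
static_assert(sizeof(void *) == 8, "sketch assumes a 64-bit target");
static_assert(offsetof(tz_stream_mirror, total_in) == 16, "total_in offset");
static_assert(offsetof(tz_stream_mirror, adler) == 96, "adler offset");
static_assert(sizeof(tz_stream_mirror) == 112, "record size");

/* Run-time accessor, handy for printing the size from a test program. */
size_t tz_stream_mirror_size(void)
{
    return sizeof(tz_stream_mirror);
}
```

One thing worth checking against the headers that were actually compiled: stock zlib declares total_in, total_out, adler and reserved as uLong (unsigned long), which is 4 bytes under the Win64 ABI and 8 bytes under 64-bit Linux, so the size seen by the C objects depends on how that build defines uLong.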

Please help; I can also pay something if we solve this. Where is the trick with LLVM?

For your curiosity: from my tests so far, Intel zlib and Cloudflare zlib are similar. Cloudflare is better at level 2 and up, Intel is better at level -2 (avoiding the crc/adler checksum).
Sorry for my English, I'm in a hurry.

(if needed: rdp@dellapasqua.com)
See you soon

#2 2017-09-11 19:01:17

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,656

Re: plz help with LLVM5 and ZlibSSE

I was able to get the CloudFlare fork working with FPC via https://synopse.info/fossil/finfo?name=SynZLibSSE.pas

#3 2017-09-11 19:37:17

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

Well, I built a lot of DLL versions with Intel, Intel+IPP, Cloudflare, checksum on and off...
I would like to have objects built with Clang.
Can we try?

#4 2017-09-12 17:11:34

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,656

Re: plz help with LLVM5 and ZlibSSE

Did you try with FPC and the files available at https://github.com/synopse/mORMot/tree/master/fpc-win64 ?

#5 2017-09-12 17:24:10

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

Can I ask: do you use MinGW64 to make the *.obj files?

#6 2017-09-12 17:29:48

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

Did you use the PCLMUL define of Cloudflare? I ask because 'crc32_pclmul_le_16' exists only in the Linux kernel.
Also, do you perhaps know the Visual C++ equivalent of gcc's -msse4.2 -mpclmul (to enable the xmmintrin and emmintrin SSE headers)?
Please bear in mind that I have little experience with C compilers.
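
On the Visual C++ question: as far as I know there is no per-instruction-set switch matching -msse4.2 or -mpclmul, and the SSE4.2 and PCLMULQDQ intrinsics become usable simply by including <nmmintrin.h> and <wmmintrin.h>. That leaves the CPU check to the program at run time. A minimal sketch of such a check follows; the function names are hypothetical, and on non-x86 targets it just reports "not supported".

```c
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>            /* __cpuid (MSVC, clang-cl) */
/* Fill regs[0..3] with EAX, EBX, ECX, EDX of CPUID leaf 1. */
void cpuid_leaf1(unsigned regs[4])
{
    int r[4];
    __cpuid(r, 1);
    for (int i = 0; i < 4; i++)
        regs[i] = (unsigned)r[i];
}
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>             /* __get_cpuid (GCC, Clang) */
void cpuid_leaf1(unsigned regs[4])
{
    /* __get_cpuid returns 0 when the requested leaf is unavailable. */
    if (!__get_cpuid(1, &regs[0], &regs[1], &regs[2], &regs[3]))
        regs[0] = regs[1] = regs[2] = regs[3] = 0;
}
#else
/* Not an x86 target: report no features. */
void cpuid_leaf1(unsigned regs[4])
{
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
}
#endif

/* CPUID.1:ECX bit 20 signals SSE4.2 (needed for _mm_crc32_u32). */
int cpu_has_sse42(void)
{
    unsigned regs[4];
    cpuid_leaf1(regs);
    return (int)((regs[2] >> 20) & 1u);
}

/* CPUID.1:ECX bit 1 signals PCLMULQDQ (carry-less multiply). */
int cpu_has_pclmulqdq(void)
{
    unsigned regs[4];
    cpuid_leaf1(regs);
    return (int)((regs[2] >> 1) & 1u);
}
```

A build could call these once at startup and fall back to the plain C code paths when either function returns 0.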

#7 2017-09-12 20:06:55

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

Clang with -O2 works, but a test takes 13 seconds; with -O3 the results are corrupted, but the test takes 3.4 seconds.
Under Linux with GCC the test takes 3.2 seconds.
:-\

#8 2017-09-12 20:21:39

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,656

Re: plz help with LLVM5 and ZlibSSE

I used GCC, IIRC.

#9 2017-09-12 20:37:17

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

I have isolated the problem: it's in LLVM's _mm_crc32_u32. Now I'm checking the function headers (it looks like an integer overflow).
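
One way to narrow this down is to compare the intrinsic against a plain C reference. The SSE4.2 CRC32 instruction behind _mm_crc32_u32 computes CRC-32C (Castagnoli polynomial, bit-reflected form 0x82F63B78) with no initial or final inversion. The sketch below is only such a reference, not the Cloudflare code; all function names are hypothetical, and the intrinsic is compared only when the file is compiled with SSE4.2 enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE4_2__
#include <nmmintrin.h>   /* _mm_crc32_u32 */
#endif

/* Bitwise CRC-32C update with one byte (reflected polynomial 0x82F63B78). */
uint32_t crc32c_sw_u8(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Same update with a 32-bit word, equivalent to four byte updates
   taken in little-endian order. */
uint32_t crc32c_sw_u32(uint32_t crc, uint32_t v)
{
    crc ^= v;
    for (int i = 0; i < 32; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Full CRC-32C of a buffer, with the usual initial and final inversion. */
uint32_t crc32c_sw(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = crc32c_sw_u8(crc, *p++);
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 when the intrinsic agrees with the reference for one step.
   Without SSE4.2 at compile time nothing is compared and 1 is returned. */
int crc32c_intrinsic_matches(uint32_t crc, uint32_t v)
{
#ifdef __SSE4_2__
    return _mm_crc32_u32(crc, v) == crc32c_sw_u32(crc, v);
#else
    (void)crc;
    (void)v;
    return 1;
#endif
}

/* Sanity check of the reference itself against the standard
   CRC-32C check value for the ASCII string "123456789". */
void crc32c_selfcheck(void)
{
    assert(crc32c_sw("123456789", 9) == 0xE3069283u);
}
```

Calling crc32c_intrinsic_matches with a few seed and data values in both the -O2 and the -O3 build would show whether the intrinsic itself returns different values, or whether the corruption comes from the code around it.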

#10 2017-09-12 20:48:28

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

GCC under Windows? MinGW or Cygwin? But it seems that we cannot use a function from GPL code such as the Linux kernel for commercial purposes, true?

#11 2017-09-12 21:04:14

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,656

Re: plz help with LLVM5 and ZlibSSE

MinGW, of course. And no problem with the license: this is the zlib license. No Linux involved.

#12 2017-09-12 21:44:16

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

I compiled with gcc -c -O3 -msse4.2 -mpclmul, but the results are slow. How can you be so fast? But I suppose this is a magic secret :-)

#13 2017-09-12 21:45:28

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

I'm talking with the engineer who wrote the Cloudflare patch; indeed, with Clang the issue is with the SSE* calls.

#14 2017-09-13 10:15:32

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

Look at the zlib patches from Intel (the files are in Intel IPP for Linux): they work fine as a static DLL, and they introduce DeflateInit -2 and DeflateInit2 -2 options for the fastest mode.
Under Win64 this performs very similarly to Cloudflare.

#15 2017-09-13 13:03:59

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

OK, I ran a test on real HTML files produced by WordPress and a PHP forum, using settings with a similar compression ratio (level 1 for gzip and deflate, -2 for Intel):
gzip: 12 seconds
your MinGW build (Cloudflare): 5.6 seconds
Intel: 2.5 seconds

Parallelized, Intel provides 20 Gbit/s of HTML in -> 4.4 Gbit/s of compressed output on an i7 with 4/8 cores at 3.4 GHz.

I'm waiting for the fix from the Cloudflare engineer for the LLVM patches, then I'll let you know.
See you soon

#16 2017-09-13 19:01:59

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,656

Re: plz help with LLVM5 and ZlibSSE

-2 is not a true deflate, since there is no adler32, right?

#17 2017-09-13 22:16:05

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

They don't fill the dictionary table of common tokens during deflate, so for example "ciao ciao ciao" is not compressed at all; the crc32 checksum is still used.

#18 2017-09-14 06:41:25

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,656

Re: plz help with LLVM5 and ZlibSSE

Still much slower than LZ4 or SynLZ for compression per core.

#19 2017-09-14 06:58:37

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

Indeed, it's very slow compared with LZ4, SynLZ or Snappy, for example; it's useful only for default web browser compression.

#20 2017-09-14 07:29:20

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,656

Re: plz help with LLVM5 and ZlibSSE

For web compression I usually put an nginx frontend in place, which handles compression, HTTPS and HTTP/2...

#21 2017-09-18 18:42:31

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

off-offtopic

thread 1) read @Int64
00000000006CCAC3 488B4538         mov rax,[rbp+$38] pointer to address
00000000006CCAC7 48894530         mov [rbp+$30],rax read from a 64bit quadword single op mov (mov value into register)

thread 2) write @Int64
00000000006CC8FA 488B4538         mov rax,[rbp+$38] pointer to the same address
00000000006CC8FE 488905739F0500   mov [rel $00059f73],rax write the above 64bit quadword with a single op mov

Can you confirm that we don't need atomic sync functions here, because the access is a single aligned one (64 bits or more, so a collision can never happen)?
Thanks

#22 2017-09-19 07:51:57

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,656

Re: plz help with LLVM5 and ZlibSSE

It may not be atomic.
The problem with atomicity arises when a read and a write happen at the same time.
Here the accesses may be unsynchronized.
And the cache may not be consistent.
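
To make the distinction concrete: on x86-64 a naturally aligned 64-bit load or store is performed as one access, so a reader should see either the old or the new value and never a mix of both. What a plain mov does not promise is anything about ordering relative to other memory accesses, and the compiler remains free to cache or reorder an ordinary variable. A minimal C11 sketch that states the intent explicitly is below; shared_value, publish_value and read_value are hypothetical names, and with relaxed ordering GCC and Clang typically still emit a single mov for each operation on x86-64.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared 64-bit value written by one thread, read by another. */
static _Atomic uint64_t shared_value;

/* Writer side: one indivisible 64-bit store, no ordering guarantees
   with respect to other variables (memory_order_relaxed). */
void publish_value(uint64_t v)
{
    atomic_store_explicit(&shared_value, v, memory_order_relaxed);
}

/* Reader side: one indivisible 64-bit load; the result is some value
   that was stored earlier, never a torn half-old, half-new one. */
uint64_t read_value(void)
{
    return atomic_load_explicit(&shared_value, memory_order_relaxed);
}
```

If the reader must also see other data written before the value was published, the store would need memory_order_release and the load memory_order_acquire instead of relaxed.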

#23 2017-09-19 09:37:39

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: plz help with LLVM5 and ZlibSSE

OK, well, in case the order of the sequence doesn't need to be synchronized (the order of execution of the read and the write is not important there), can I proceed without Interlocked* calls without worries?
By the way, thanks for your time.
