#1 2017-07-20 12:28:03

RObyDP
Member
Registered: 2015-02-19
Posts: 62

RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

hello Arnaud,

sorry, me again, I'm your nightmare :-P

Please, if you have time, could you test Synopse with my patches from www.dellapasqua.com? One is for the memory manager and the RTL FillChar, CopyMem, etc.; another is for zlib.

Alternatively, where can I find the Synopse benchmark, so that I can test speed and reliability myself?

Btw, would you like the IPP patches extended with other functions such as Math, Vectors, etc.?

See you soon.

Roberto


#2 2017-07-20 16:29:54

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

You can run the SQLite3\TestSQL3.dpr regression tests, and check the numbers reported.
Note that the mORMot units try to avoid memory allocation as much as possible in the most used code paths (e.g. JSON production or parsing), so depending on the process, the heap manager may not have a big effect.

Why not statically include the code as asm source, or at least .o files?


#3 2017-07-20 17:02:08

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

Because the Intel license permits redistribution in DLL form :-\ (if I read it correctly)


#4 2017-07-20 17:34:51

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

Indeed, the results are similar (the Intel MM mainly pays off when many threads run concurrently):

Using IntelTBB
1.1. Low level common:
  Total failed: 0 / 10,952,515  - Low level common PASSED  2.45s

1.2. Low level types:
  Total failed: 0 / 733,788  - Low level types PASSED  173.84ms

1.3. Big table:
  Total failed: 0 / 886,592  - Big table PASSED  1.30s

2.11. DDD shared units:
  Total failed: 0 / 80,388  - DDD shared units PASSED  462.14ms



Without IntelTBB
1.1. Low level common:
  Total failed: 0 / 10,951,964  - Low level common PASSED  2.41s

1.2. Low level types:
  Total failed: 0 / 734,580  - Low level types PASSED  239.36ms

1.3. Big table:
  Total failed: 0 / 886,427  - Big table PASSED  1.39s

2.11. DDD shared units:
  Total failed: 0 / 80,388  - DDD shared units PASSED  991.26ms
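
Read as ratios, the two runs above compare as follows. This is only a small Python sketch that redoes the arithmetic on the timings quoted in this post; the dictionary and function names are made up for illustration, and since each configuration was run once (with slightly different test counts), small differences are probably noise.

```python
# Timings copied from the two TestSQL3 runs quoted above, in milliseconds.
with_tbb = {
    "Low level common": 2450.0,
    "Low level types": 173.84,
    "Big table": 1300.0,
    "DDD shared units": 462.14,
}
without_tbb = {
    "Low level common": 2410.0,
    "Low level types": 239.36,
    "Big table": 1390.0,
    "DDD shared units": 991.26,
}


def speedup(baseline_ms, patched_ms):
    """Return how many times faster the patched run is (below 1 means slower)."""
    return baseline_ms / patched_ms


# Ratio of the run without IntelTBB to the run with IntelTBB, per test group.
ratios = {name: speedup(without_tbb[name], with_tbb[name]) for name in with_tbb}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers only "DDD shared units" (about 2.1x) and "Low level types" (about 1.4x) move noticeably; "Low level common" is marginally slower with IntelTBB.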

(Btw, I used SynSQLite3Static, but the multithreaded test doesn't run. Any hints?)


#5 2017-07-20 18:52:25

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

Correction:
using FastMM4 NoThreadContention 22098 msec
using ScaleMM2 22393 msec
using Windows 10 / Windows 2016 Heap 5102 msec
using Intel TBB + Intel IPP 3975 msec
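
For comparison, the same timings can be expressed relative to the fastest entry. The Python sketch below only redoes that arithmetic on the four figures listed above; the variable names are illustrative, and nothing is assumed about what the benchmark itself does.

```python
# Timings as listed above, in milliseconds.
timings_ms = {
    "FastMM4 NoThreadContention": 22098,
    "ScaleMM2": 22393,
    "Windows 10 / Windows 2016 Heap": 5102,
    "Intel TBB + Intel IPP": 3975,
}

# Express every timing as a multiple of the fastest one.
fastest = min(timings_ms.values())
relative = {name: ms / fastest for name, ms in timings_ms.items()}

# Print from fastest to slowest.
for name, factor in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {factor:.2f}x the fastest time")
```

So FastMM4 and ScaleMM2 take roughly 5.6 times as long as Intel TBB + IPP in this test, while the Windows 10 heap takes about 1.3 times as long.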


#6 2017-07-20 18:54:22

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

Do you think it's possible to convert the DLLs into a single OBJ, maybe using objconv AgnSoft? It would be a very hard task, no?


#7 2017-07-21 12:11:45

emk
Member
Registered: 2013-10-24
Posts: 96

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

it's open source:

"Threading Building Blocks(Intel® TBB) 2017 now under Apache 2.0 license" https://www.threadingbuildingblocks.org/Licensing

Can you compare the Intel MM (TBB) against this FastMM-AVX patch? https://synopse.info/forum/viewtopic.php?id=57&p=4  Go to post #178 (it says it is much better than NeverSleepOnThreadContention).


#8 2017-07-21 12:52:46

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

FastMM4 22sec
FastMM4-AVX 18sec
Intel IPP+TBB 4sec

(Note that in a single thread, FastMM performs slightly better than Intel.)


#9 2017-07-21 13:38:27

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

I'm amazed by the Windows 10 / Windows 2016 heap performance.
It was not the case with previous versions!


#10 2017-07-21 15:41:02

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

10.0.15063 (Windows 10 Creators Update)


#11 2017-07-21 17:08:31

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

Well, parallelizing the MM is not enough to get better overall performance: the algorithms themselves would have to be written with parallel constructs such as TParallel.For, and not every task can be parallelized. Anyway, the near future of CPUs is more and more cores rather than a GHz war, so this seems to be the direction.
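
The remark that not every task can be parallelized is usually quantified with Amdahl's law: if a fraction p of the run time can be spread over n cores, the best possible speedup is 1 / ((1 - p) + p / n). A minimal Python sketch, where the 90% parallel fraction is a hypothetical figure and not something measured in this thread:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workload: 90% of the time is parallelizable, 10% stays serial.
for cores in (2, 4, 8, 16, 64):
    print(f"{cores} cores: at most {amdahl_speedup(0.9, cores):.2f}x")
```

With a 10% serial part the speedup can never reach 10x, however many cores are added, which is why a scalable memory manager alone does not guarantee better overall performance.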


#12 2017-07-21 17:10:21

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

What do you think about OpenMP support in the Delphi compiler?


#13 2017-07-22 10:16:44

miab3
Member
From: Poland
Registered: 2014-10-01
Posts: 188

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

@RObyDP,

Could you test in the same conditions BrainMM?
https://github.com/d-mozulyov/BrainMM/tree/development

Michal


#14 2017-07-22 10:32:38

emk
Member
Registered: 2013-10-24
Posts: 96

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

He already tested it, but there was a bug in BrainMM and it crashed on Win64.


#15 2017-07-22 10:54:05

miab3
Member
From: Poland
Registered: 2014-10-01
Posts: 188

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !


#16 2017-07-22 13:03:20

emk
Member
Registered: 2013-10-24
Posts: 96

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

Roberto Della Pasqua:
"BrainMM actually crashes under some circumstances on Win64; the author will correct it later this year. I did another test today: Intel TBB + Intel IPP in a single thread is slightly faster than FastMM4, and many times faster with multiple threads. If the license were more permissive, perhaps we could build a custom LLVM4 object to embed statically."


#17 2017-07-22 21:23:46

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

Hi!

Unfortunately the library has a few critical bugs, which will be fixed, I think, this autumn. Look for example here: https://github.com/d-mozulyov/BrainMM/issues/5

Thanks for attention to my project


Sent from iPhone

On 13 July 2017, at 15:20, Roberto Della Pasqua <rdp@dellapasqua.com> wrote:
Hello Dimitry,

I would like to ask whether your BrainMM is reliable enough to be used for 64-bit 24/7 server applications.
Does it suffer from memory fragmentation, leaks, or any other trouble?

Do you have customer reports covering months of operation?

Thank you.

Regards.


#18 2017-07-22 21:32:25

miab3
Member
From: Poland
Registered: 2014-10-01
Posts: 188

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

@RObyDP,

What was the test result of BrainMM for Win32?

Michal


#19 2017-07-23 15:57:18

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

I haven't tested it, but under Win32 it seems to be the fastest.


#20 2017-07-24 09:26:32

emk
Member
Registered: 2013-10-24
Posts: 96

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

It's really fast (i3-6100T):

32bit, Multicore:

FastMM-AVX (only ERMS enabled)  - 18 sec;
BrainMM - 8 sec;

BrainMM's reliability in 24/7 processes should be investigated, because its performance is stellar!


#21 2017-07-24 12:25:52

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

I'll do a build of IPP+TBB for IA32 and see how it performs.


#22 2017-07-24 13:30:03

RObyDP
Member
Registered: 2015-02-19
Posts: 62

Re: RTL 64bit patched with Intel SIMD IPP TBB, MM from 70secs to 4 secs !

But I don't have time now, so I'll trust you about 32-bit.

