#1 2016-02-10 18:14:58

miab3
Member
From: Poland
Registered: 2014-10-01
Posts: 188

Scalemm2 test

Hi Arnaud,

I ran a comparison test of the native memory manager and ScaleMM2, using Steve Maughan's "Speed Test".

http://www.stevemaughan.com/delphi/delp … -managers/
https://cozmixdownloads.s3.amazonaws.co … Source.zip

Delphi 10 Seattle on Windows 7 64-bit, i7-2600 (4C-HT), 3.5 GHz

Native
SC-32 21.5
MC-32 9.3
SC-64 39.6
MC-64 18.1

ScaleMM2
SC-32 9.4 !?!?
MC-32 2.6 !?!?
SC-64 40.1
MC-64 10.7 !

The ScaleMM2 32-bit results are especially curious.

Michal

Last edited by miab3 (2016-02-10 18:30:17)

#2 2016-02-11 00:12:01

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,666

Re: Scalemm2 test

What do the numbers mean?

#3 2016-02-11 00:16:21

miab3

Re: Scalemm2 test

@ab

The numbers are times in seconds.

Michal

#4 2016-02-11 07:58:08

ab

Re: Scalemm2 test

So I guess that MC stands for "Multi-Core" and SC for "Single-Core".
Note that SC still involves multi-threading.
In fact, MC is not really "Multi-Core", but "with PPL" (the Delphi Parallel Programming Library).
And during the PPL run, the bottleneck is not allocation but the deletion of list entries, i.e. memory copying.

So SC measures the threading abilities of the memory manager.
It looks as if ScaleMM2 is scaling almost linearly here.

MC measures the threading ability combined with CPU-bound work.
Here too, ScaleMM2 does a very good job.
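
As a quick check on the figures from post #1, the ratios can be worked out in a few lines. This is only a sketch in Python: the numbers are copied as posted (times in seconds), and the helper names `speedup_vs_native` and `sc_to_mc_gain` are made up for this illustration.

```python
# Timings in seconds, copied from post #1 (lower is better).
TIMINGS = {
    "Native":   {"SC-32": 21.5, "MC-32": 9.3, "SC-64": 39.6, "MC-64": 18.1},
    "ScaleMM2": {"SC-32": 9.4,  "MC-32": 2.6, "SC-64": 40.1, "MC-64": 10.7},
}


def speedup_vs_native(test):
    """How many times faster ScaleMM2 is than the native manager on one test."""
    return TIMINGS["Native"][test] / TIMINGS["ScaleMM2"][test]


def sc_to_mc_gain(manager, bits):
    """How many times faster the MC run is than the SC run for one manager."""
    row = TIMINGS[manager]
    return row[f"SC-{bits}"] / row[f"MC-{bits}"]


if __name__ == "__main__":
    for test in TIMINGS["Native"]:
        print(f"{test}: ScaleMM2 runs at {speedup_vs_native(test):.2f}x native speed")
    for manager in TIMINGS:
        for bits in (32, 64):
            gain = sc_to_mc_gain(manager, bits)
            print(f"{manager} {bits}-bit: MC is {gain:.2f}x faster than SC")
```

On these figures ScaleMM2 is about 2.3x faster than native on SC-32 and about 3.6x on MC-32, roughly level on SC-64, and about 1.7x faster on MC-64. Going from SC to MC gains about 3.6x to 3.7x with ScaleMM2, against about 2.2x to 2.3x with the native manager.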

In fact, ScaleMM2 seems to outperform NexusDB by a huge factor, if we compare those results with http://www.stevemaughan.com/delphi/delp … -managers/
The NexusDB unified heap, even though it uses lock-free lists, still relies on CAS asm operations, so there are CPU-level locks, whereas ScaleMM2 is truly lock-free, since it maintains one heap per thread.
Of course, the downside of having one heap per thread is that the process uses much more virtual memory than with FastMM4 (or NexusDB).
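
To picture that trade-off, here is a toy model in Python. It is not Delphi, and it is not how ScaleMM2, NexusDB or FastMM4 are actually implemented: `SharedPool`, `PerThreadPool` and `run` are hypothetical names, and a plain lock stands in for whatever synchronization a shared heap needs. Because of Python's GIL it says nothing about speed; it only shows where the synchronization disappears and why more blocks stay cached.

```python
import threading


class SharedPool:
    """One free list shared by all threads: every access is synchronized."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._free = []
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            return self._free.pop() if self._free else bytearray(self.block_size)

    def release(self, block):
        with self._lock:
            self._free.append(block)

    def cached_blocks(self):
        with self._lock:
            return len(self._free)


class PerThreadPool:
    """One free list per thread: acquire/release never touch shared state."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._local = threading.local()
        self._all_lists = []                  # kept only for reporting
        self._registry_lock = threading.Lock()

    def _free_list(self):
        free = getattr(self._local, "free", None)
        if free is None:
            # First use in this thread: create and register its private list.
            free = self._local.free = []
            with self._registry_lock:
                self._all_lists.append(free)
        return free

    def acquire(self):
        free = self._free_list()
        return free.pop() if free else bytearray(self.block_size)

    def release(self, block):
        self._free_list().append(block)

    def cached_blocks(self):
        with self._registry_lock:
            return sum(len(free) for free in self._all_lists)


def run(pool, threads=4, rounds=100):
    """Each thread repeatedly takes one block and gives it back."""
    def worker():
        for _ in range(rounds):
            pool.release(pool.acquire())

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return pool.cached_blocks()


if __name__ == "__main__":
    print("shared pool, blocks kept:", run(SharedPool(64)))
    print("per-thread pool, blocks kept:", run(PerThreadPool(64)))
```

With four threads the per-thread pool always ends up holding one block per thread, while the shared pool holds at most that many and often fewer, which is the memory cost mentioned above in miniature.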

So, in practice, for a multi-threaded server process, we usually use FastMM4 and reduce memory allocation as much as possible.
We designed the whole mORMot framework to avoid allocating memory. For instance, we use stack-allocated buffers (e.g. for integer to text conversion, or via TSynTempBuffer), and never do string concatenation, but rely on an optimized TTextWriter class. And of course, we maintain a thread pool for the HTTP process.
With this design, FastMM4 is sufficient: using ScaleMM2 gives no real benefit, but would consume much more memory.
As a result, the servers perform very well and use very little memory.
