#1 2024-11-20 16:32:21

okoba
Member
Registered: 2019-09-29
Posts: 118

Multi thread free memory using fpcx64mm

My program allocates a large amount of memory in big blocks (32 GB in total), and I want to free it from multiple threads to bring the release time down from nearly 4 seconds.
But in my tests, adding threads gives no speedup; the freeing seems to run completely sequentially.
The default FPC memory manager does not free it until the program closes or the memory is reused.
Can it be done?
I am using FPC on Windows.

#2 2024-11-20 17:39:16

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,655

Re: Multi thread free memory using fpcx64mm

I suspect you are using large blocks? That is, blocks > 256KB in size.

In fpcx64mm, the giant lock for large blocks is held only while the block pointers are switched; the fpmunmap/VirtualFree syscall is then made with no lock, in the current thread.
So with fpcx64mm there is no practical lock when releasing large blocks of memory. If you see no benefit from multithreading, it is most likely due to the Windows OS itself.

#3 Yesterday 12:43:11

zen010101
Member
Registered: 2024-06-15
Posts: 66

Re: Multi thread free memory using fpcx64mm

okoba wrote:

The default FPC memory manager does not free it until the program closes or the memory is reused.

Yes, I ran into the same issue. My program calls a web API 10,000 times, and memory usage keeps growing until I close the program, even though no memory leaks are reported. With fpcx64mm, memory usage stays at a fixed value.

P.S. The OS is Windows. On aarch64, the memory manager of FPC is the same as fpcx64mm.

#4 Yesterday 12:50:30

okoba
Member
Registered: 2019-09-29
Posts: 118

Re: Multi thread free memory using fpcx64mm

@ab Yes, each block is a couple of megabytes. I was hoping you might know a way to speed this up.

#5 Yesterday 12:56:03

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,655

Re: Multi thread free memory using fpcx64mm

@okoba
Did you try with fpcx64mm?
It is not clear from your posts.

#6 Yesterday 13:05:14

okoba
Member
Registered: 2019-09-29
Posts: 118

Re: Multi thread free memory using fpcx64mm

Yes! My test was done with fpcx64mm. Allocation with the default memory manager is much slower. Allocation with fpcx64mm is very fast, but freeing this much memory still takes time, and I want to speed it up.
I just tested on Linux, where the free takes about 100 ms, but on Windows it takes nearly 4 seconds.

#7 Yesterday 14:16:02

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,655

Re: Multi thread free memory using fpcx64mm

So this is clearly a WinAPI issue.

Perhaps you could try to allocate bigger buffers (e.g. one GetMem per 1 GB), do the sub-allocation in your own program, and release each whole 1 GB buffer at once in a single syscall?
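
The sub-allocation idea can be sketched as a small bump allocator. This is only an illustration in Python rather than FPC, assuming an anonymous `mmap` stands in for the single large OS allocation. The `Arena` class and its methods are hypothetical names, not part of fpcx64mm or mORMot.

```python
import mmap


class Arena:
    """Hypothetical bump allocator over one anonymous mapping."""

    def __init__(self, size):
        # One OS-level allocation backs every sub-block.
        self._map = mmap.mmap(-1, size)
        self._base = memoryview(self._map)
        self._size = size
        self._offset = 0
        self._blocks = []

    @property
    def used(self):
        return self._offset

    def alloc(self, nbytes, align=16):
        # Round the bump pointer up to the alignment, then carve a slice.
        start = (self._offset + align - 1) // align * align
        if start + nbytes > self._size:
            raise MemoryError("arena exhausted")
        self._offset = start + nbytes
        block = self._base[start:start + nbytes]
        self._blocks.append(block)
        return block

    def release(self):
        # Invalidate every handed-out view, then unmap the region once.
        for block in self._blocks:
            block.release()
        self._blocks.clear()
        self._base.release()
        self._map.close()


if __name__ == "__main__":
    arena = Arena(1 << 20)   # 1 MiB for the demo; the suggestion above is 1 GB
    a = arena.alloc(100)
    b = arena.alloc(200)
    a[:5] = b"hello"
    print(arena.used, bytes(a[:5]))
    arena.release()          # one unmap instead of one free per block
```

The trade-off is the one raised in this thread: individual blocks can no longer be freed on their own, so the calling code has to be organized around the lifetime of the whole arena.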

#8 Yesterday 14:50:54

okoba
Member
Registered: 2019-09-29
Posts: 118

Re: Multi thread free memory using fpcx64mm

That can be done but it complicates the code quite a bit, and makes it harder to read. Preferably it should be done by the MM.
I made a test to verify it: https://gitlab.com/-/snippets/4771697
On my Windows 10 machine with FPC, freeing 128 MB blocks from multiple threads is much faster, but still much slower than FreeSingle on Linux (about 100 ms).
Here are the times on Windows:

32 GB, 1 MB: 32768
Allocate: 32 GB in 4.62s i.e. 6.9 GB/s
FreeSingle: 32 GB in 2.57s i.e. 12.4 GB/s
Allocate: 32 GB in 4.66s i.e. 6.8 GB/s
FreeMulti: 32 GB in 2.87s i.e. 11.1 GB/s

32 GB, 4 MB: 8192
Allocate: 32 GB in 4.32s i.e. 7.4 GB/s
FreeSingle: 32 GB in 2.49s i.e. 12.8 GB/s
Allocate: 32 GB in 4.17s i.e. 7.6 GB/s
FreeMulti: 32 GB in 1.41s i.e. 22.6 GB/s

32 GB, 64 MB: 512
Allocate: 32 GB in 4.15s i.e. 7.7 GB/s
FreeSingle: 32 GB in 2.35s i.e. 13.5 GB/s
Allocate: 32 GB in 3.89s i.e. 8.2 GB/s
FreeMulti: 32 GB in 561.76ms i.e. 56.9 GB/s

32 GB, 128 MB: 256
Allocate: 32 GB in 3.96s i.e. 8 GB/s
FreeSingle: 32 GB in 2.40s i.e. 13.3 GB/s
Allocate: 32 GB in 3.85s i.e. 8.2 GB/s
FreeMulti: 32 GB in 514.55ms i.e. 62.1 GB/s

32 GB, 256 MB: 128
Allocate: 32 GB in 4.07s i.e. 7.8 GB/s
FreeSingle: 32 GB in 2.42s i.e. 13.1 GB/s
Allocate: 32 GB in 3.93s i.e. 8.1 GB/s
FreeMulti: 32 GB in 599.45ms i.e. 53.3 GB/s

32 GB, 1 GB: 32
Allocate: 32 GB in 4.04s i.e. 7.9 GB/s
FreeSingle: 32 GB in 2.41s i.e. 13.2 GB/s
Allocate: 32 GB in 3.96s i.e. 8 GB/s
FreeMulti: 32 GB in 1.36s i.e. 23.4 GB/s
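
To make the comparison easier to read, the free timings above can be reduced to a multi-thread speedup per block size. A short Python sketch follows; the timings are copied from the Windows results above, while the `speedup` and `best_block_size` helpers are only illustrative names.

```python
# (block size, FreeSingle seconds, FreeMulti seconds), from the Windows run
RESULTS = [
    ("1 MB", 2.57, 2.87),
    ("4 MB", 2.49, 1.41),
    ("64 MB", 2.35, 0.56176),
    ("128 MB", 2.40, 0.51455),
    ("256 MB", 2.42, 0.59945),
    ("1 GB", 2.41, 1.36),
]
TOTAL_GB = 32


def speedup(single_s, multi_s):
    """How many times faster the multi-threaded free is than the single one."""
    return single_s / multi_s


def best_block_size(results):
    """Block size with the shortest multi-threaded free time."""
    return min(results, key=lambda row: row[2])[0]


if __name__ == "__main__":
    for size, single_s, multi_s in RESULTS:
        print(f"{size:>7}: {speedup(single_s, multi_s):.2f}x, "
              f"{TOTAL_GB / multi_s:.1f} GB/s multi-threaded")
    print("fastest FreeMulti:", best_block_size(RESULTS))
```

On these numbers, the multi-threaded free is slightly slower than the single-threaded one at 1 MB blocks and fastest at 128 MB blocks.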

What do you think?

One other question about fpcx64mm: why does it allocate so many more Private Bytes than the default MM?

Here are the numbers I get:

Default: Private Bytes: 32GB, Peak Working Set: 32GB
fpcx64mm: Private Bytes: 48GB, Peak Working Set: 32GB   

Last edited by okoba (Yesterday 15:08:44)

#9 Yesterday 15:45:14

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,655

Re: Multi thread free memory using fpcx64mm

Instead of GetMem + FillZero you could just use AllocMem, which fills with zeros. ;)
But I guess you want to write something to the memory so that it is actually accessed.

I don't know what to say, other than that system behavior can differ a lot between Windows 10 and Windows 11.
Here the bottleneck is the "VirtualFree" memory call.

"Private Bytes" are in fact reserved memory.
fpcx64mm pre-reserves the memory to avoid any hidden syscall and to make the first access to the RAM faster.
So don't be afraid of this number, which is not the memory actually used in your RAM sticks.
