#1 Yesterday 14:55:11

danielkuettner
Member
From: Germany
Registered: 2014-08-06
Posts: 343

malloc(): unaligned tcache chunk detected

Hi Arnaud,

I'm running an HTTP async (epoll) server under Linux, built with FPC 3.2.0 -O3 and mORMot2 1c3447f4 (2024-09-17), and I'm using cmem.
I got the error above, and our service was killed by the OS (Ubuntu).

Is that error known and fixed in a later version?

Last edited by danielkuettner (Yesterday 16:08:02)

#2 Yesterday 16:14:45

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,579

Very interesting.
This error was also reported during the last TFB testing.
But we were not able to reproduce it here.
Our guess is that there is a memory-related bug in the mORMot async server, which is usually not triggered by the mormot.core.fpcx64mm heap but more often by cmem.

So it is good news that you were able to reproduce it!

Do you have more info?
Where/when is it killed?
What is the load?
Can you generate the logs in verbose mode (with full low-level async process info) and send them to me?

Perhaps with more context, we may be able to locate the faulty part of the code...

#3 Yesterday 17:28:13

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,568

So the problem is not in HTTP pipelining.
I hope Daniel can give us a way to reproduce this issue.

#4 Yesterday 17:41:54

danielkuettner
Member
From: Germany
Registered: 2014-08-06
Posts: 343

Yes, I hope so too.

This error is not really reproducible. This is the first time I have had it since July, but we have had several other errors over the last months (e.g. the service not responding, undefined exceptions, or the Variant error with _Safe(v)^.InitArrayFromCsvFile and memory corruption after that). We have fixed a lot of possible bugs on our side, but without really knowing the cause.

I can switch logging to verbose mode and send you the file (it is very large, so I will email you a download URL), but verbose logging also wasn't enough in the past. Perhaps adding logs in mormot.net.async or mormot.net.sock is the way to go.

If the error is async-related, could I switch to useHttpSocket for testing (we use nginx as a reverse proxy in front of our service)?

->Do you have more info?
These are the only errors I have had in the log file over the last few weeks:
20241015 15564437 EXC EThreadError {Message:"Thread error"} [R12:root]
20241015 15564446 EXC EThreadError {Message:"Thread error"} [R14:root]
20241015 15564448 EXC EThreadError {Message:"Thread error"} [R13:root]
20241015 16171856 EXC EThreadError {Message:"Thread error"} [R13:root]
20241015 16171859 EXC EThreadError {Message:"Thread error"} [R8:root]
But it seems to be transparent from the client side, because we have no error responses related to those errors.

At the moment I use two services as upstreams behind nginx (Linux as the main service and a Windows http.sys service as backup). The Windows service seems to be more stable than the Linux one with the async server.


->Where/when is it killed?
It was killed by the OS today. Here are the lines from syslog:
Oct 15 16:09:53 mssql SOneSrv2[358268]: malloc(): unaligned tcache chunk detected
Oct 15 16:09:53 mssql systemd[1]: SOneSrv2.service: Main process exited, code=killed, status=6/ABRT
Oct 15 16:09:53 mssql systemd[1]: SOneSrv2.service: Failed with result 'signal'.
Oct 15 16:09:53 mssql systemd[1]: SOneSrv2.service: Consumed 28min 29.400s CPU time.
Oct 15 16:10:23 mssql systemd[1]: SOneSrv2.service: Scheduled restart job, restart counter is at 1.

->What is the load?
That's difficult to answer.
Per day we have about 400-500 thousand requests with an average request time of 20 ms, but there are also requests with text search in MongoDB that can run for 10 s.
We are using an LXC container with 32 cores, 100G of memory (60G free) and ZFS. Our service has 32 worker threads and there are about 100-200 concurrent users.
The CPU usage is mostly under 10%, but for some seconds it goes over 100% because MongoDB is heavily used in front of Postgres. But I would say we don't have any overload here.
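
As a rough cross-check of these figures, here is a minimal sketch in C that turns the daily totals into averages using Little's law (average requests in flight = arrival rate × average duration). The function names are made up for illustration, and the result only describes the average: it says nothing about bursts or about the 10 s MongoDB searches.

```c
/* Hypothetical helpers: back-of-the-envelope load figures from daily totals. */

/* Average request rate in requests per second, given a daily total. */
double avg_requests_per_second(double requests_per_day)
{
    return requests_per_day / 86400.0;   /* 86400 seconds in a day */
}

/* Little's law: average requests in flight = rate * average duration. */
double avg_in_flight(double requests_per_day, double avg_seconds)
{
    return avg_requests_per_second(requests_per_day) * avg_seconds;
}

/* Fraction of the worker pool that is busy on average. */
double avg_worker_utilization(double requests_per_day, double avg_seconds,
                              int workers)
{
    return avg_in_flight(requests_per_day, avg_seconds) / workers;
}
```

With the upper figure of 500,000 requests per day, 0.020 s per request and 32 workers, this gives a little under 6 requests per second and well below one request in flight on average, so the averages alone do not point to overload.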

#5 Yesterday 20:49:04

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,579

I don't know where this EThreadError comes from.

On FPC, this EThreadError comes from the fpc_threaderror RTL function.
It seems to be triggered only by the cthreads.pp unit, when a TRtlCriticalSection is used or when RTL/Basic Events are created.

Since events are created once per thread, and the exception is raised while existing threads are running, my guess is that it is a failing pthread_mutex_lock/trylock/unlock.
I will review the TRtlCriticalSection involved with the async server, but AFAICT there are very few of them.
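
To illustrate what a "failing" pthread mutex call looks like at the C level, here is a minimal sketch. It is not the FPC RTL code, and the helper names are hypothetical. It uses an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), for which POSIX specifies that misuse is reported through the return value instead of being undefined behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Create an error-checking mutex so that misuse returns an error code. */
static int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Unlock a mutex the calling thread never locked; returns the error code. */
int unlock_unowned_result(void)
{
    pthread_mutex_t m;
    int rc = init_errorcheck_mutex(&m);
    if (rc != 0)
        return rc;
    rc = pthread_mutex_unlock(&m);   /* fails with EPERM: caller is not the owner */
    pthread_mutex_destroy(&m);
    return rc;
}

/* Lock the same mutex twice from one thread; returns the second lock's code. */
int relock_result(void)
{
    pthread_mutex_t m;
    int rc = init_errorcheck_mutex(&m);
    if (rc != 0)
        return rc;
    pthread_mutex_lock(&m);
    rc = pthread_mutex_lock(&m);     /* fails with EDEADLK instead of deadlocking */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

A non-zero return from calls like these is the kind of failure meant above. With a default (non error-checking) mutex the same misuse is undefined, so an unbalanced lock/unlock pair or a mutex whose memory was overwritten may fail only sporadically.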
