You are not logged in.
The Github issue was closed. But I still think there are some race conditions.
https://github.com/synopse/mORMot2/issues/321
Last edited by mormot.coremy_only_lonely (2024-12-09 15:34:36)
Offline
TL;WR: some potential race conditions have been identified in the async web server.
The most obvious have been identified and fixed.
But there still may be some potential dangling pointers access, so some GPF on production, in very specific cases (e.g. if the server is bloated and has no CPU power any more).
I would like to rewrite the whole connection object instances lifetime, and get rid of the "dual generation GC" trick we use.
The idea may be to use a single lock per connection (not only read or write), then let each thread acquire the connection instance for a while. Then release it with no dangling pointer.
A fully deterministic web server is better and safer.
My current issue is about performance: we store the connection pointer with the socket-tracking API, so a safer approach may be to use the "connection id sequence" instead - whereas it may be slower because we need to search for it in the connection list... we use a branchless O(log(n)) binary search for that, my guess is that a hash table may be slower. Using directly the connection pointer is O(1) and with no performance penalty... we have to ensure that no socket is still in the socket-tracking API before the connection (and its socket) are freed (and closed). And on some systems, the API can't allow that in a lockless manner.
We may need to monitor and measure if O(log(n)) is really too slow. And store the connection according to the API abilities: e.g. for epoll, which is lockless, store pointers; but use connection ID otherwise. It may be the safest and easiest path...
Any input is welcome!
Offline
That's great!
My thought is always, correctness first, performance later.
This could also happen when only one or some cores are 100% usage, not all cores.
Offline
But switching cores might not be an instant behavior.
Offline
Do we have any progress on this? I see some related commits, but the issue we discussed is not fixed.
Last edited by mormot.coremy_only_lonely (2024-12-31 09:44:24)
Offline