In some of our production systems we use several external services by making HTTP requests to them (from different server threads), and we discovered some problems with TCurlHTTP.
When the number of requests increases (>300 RPS), very strange things happen:
- we got banned by the DNS server because of the huge DNS request count
- when we solved the DNS problem by adding the URL to a local resolver, we ran out of ephemeral ports
- tuning the kernel (here is a tuning example) helps, but when RPS increases to >500 the problems start again
The solution is to reuse an existing TCurlHTTP instance (per thread, per URL): in this case a single easy handle is used internally, and it holds per-handle DNS, TLS and TCP caches.
But this solution requires massive changes in the sources (we currently apply it to the most critical parts, and this solves our problems).
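For illustration, here is a minimal sketch of this per-thread reuse, assuming mORMot 1's SynCrtSock unit; the host name is a placeholder, and the Request() parameter order is from memory, so check it against your sources:

uses
  SynCrtSock;

threadvar
  Client: TCurlHTTP; // one reused instance per thread (and per target host)

function FetchBody(const Url: SockString): SockString;
var
  status: integer;
  respHeaders: SockString;
begin
  if Client = nil then
    // the easy handle inside this instance keeps its DNS, TLS session
    // and TCP connection caches alive between requests
    Client := TCurlHTTP.Create('api.example.com', '443', {aHttps=}true);
  status := Client.Request(Url, 'GET', {KeepAlive=}30000, '', '', '',
    respHeaders, result);
  if status <> 200 then
    result := '';
end;

Freeing the per-thread instance at thread shutdown is left out of this sketch.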
Today I found a good solution - the libcurl share interface - and implemented it in pull request #376.
@ab, please review it, and if everything is OK I can do the same changes for mORMot 2.
The only question: maybe enable the share interface by default? From my tests there is no performance penalty on a single thread, or on a reused TCurlHTTP instance, with or without the share interface.
The current implementation requires enabling it manually, in the main program unit:
initialization
  if CurlIsAvailable then
    CurlEnableGlobalShare;
end.
P.S.
It looks like TWinHTTP does some caching at the library or kernel level (at least it reuses the TLS connection, judging by test performance; we lack something like strace on Windows to analyze deeper).
Last edited by mpv (2021-01-31 15:45:10)
Seems like your local resolver doesn't have caching enabled. It's very rare to get banned by a DNS server (even for invalid queries); most probably you got rate-limited because of your requests/sec, and that kind of soft ban only lasts for a short time. If you can't get your caching fixed, then you should make sure you don't exceed 5 requests per second; that's kind of the default norm, and anything more than that will make your requests look suspicious.
Last edited by pvn0 (2021-02-01 07:25:36)
@mpv
I have merged - with some typo fixes - your pull request.
If you can, please pull it to mORMot 2 too.
About enabling it by default: it could be a good idea.
I have enabled it in https://synopse.info/fossil/info/9c8ad67c4a
@ab - thanks - I will prepare a patch for mORMot 2.
@pvn0 - you are right - the local DNS resolver cache is misconfigured. But configuring it properly is not a trivial task. To get a local resolver cache we should either use systemd-resolved, or install a local BIND (or something similar). Both options are not applicable inside a Docker container, for example. Even in stock Ubuntu (20.04), in /etc/nsswitch.conf we see the record
hosts: files mdns4_minimal [NOTFOUND=return] dns mymachines
so mdns4_minimal is used instead of systemd-resolved.
And such things can happen in any environment, so it is better to have a DNS cache at the application level (in our case, in libcurl).
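For reference, if systemd-resolved is actually running, one possible fix is to list the resolve entry before dns in /etc/nsswitch.conf:

hosts: files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns mymachines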
Last edited by mpv (2021-02-01 17:34:49)
I have just added a thread-safe TSynDictionary-based cache of DNS names for NewSocket().
Check https://github.com/synopse/mORMot2/commit/61b5381058
It will work for the raw TNetSocket connections.
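To illustrate the idea, here is a rough unit fragment (a sketch only, not the actual NewSocket() code; OsResolve is a hypothetical stand-in for the real resolver call):

uses
  mormot.core.base,
  mormot.core.json; // TSynDictionary

var
  DnsCache: TSynDictionary;

function CachedResolve(const Host: RawUtf8): RawUtf8;
begin
  if DnsCache.FindAndCopy(Host, result) then
    exit; // cache hit: no DNS traffic at all
  result := OsResolve(Host); // hypothetical OS/DNS resolver call
  DnsCache.Add(Host, result);
end;

initialization
  // keys/values are stored as dynamic arrays; the timeout parameter
  // makes entries expire (1 hour here - see the discussion below)
  DnsCache := TSynDictionary.Create(
    TypeInfo(TRawUtf8DynArray), TypeInfo(TRawUtf8DynArray),
    {caseinsensitive=}true, {timeoutseconds=}60 * 60);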
Just one important remark: an application-level DNS cache should have an expiration. Either a constant one, as in libcurl (60 seconds by default, see CURLOPT_DNS_CACHE_TIMEOUT), or one based on the TTL from the DNS response (too complex to implement). A constant (as a property, with the ability to change it) is enough in 99% of cases.
This is because many services use DNS-based load balancing, returning a different IP for the same name with a short TTL - for example, solutions based on HashiCorp Consul, etc.
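For reference, setting this timeout on a raw libcurl easy handle could look like the following sketch (the varargs import is simplified, the library name may differ per distribution, and the option value 92 is taken from curl.h):

function curl_easy_setopt(handle: pointer; option: longint): longint;
  cdecl; varargs; external 'libcurl.so';

const
  CURLOPT_DNS_CACHE_TIMEOUT = 92; // a 'long' option in curl.h

procedure SetDnsCacheTimeout(handle: pointer; seconds: PtrInt);
begin
  // overrides the 60 s default; -1 would cache forever, which is
  // dangerous with DNS-based load balancing
  curl_easy_setopt(handle, CURLOPT_DNS_CACHE_TIMEOUT, seconds);
end;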
Thanks for the improvements!
Just to confirm, about the libcurl share interface: I checked the code twice, and cookies are not being shared, correct?
It is important that this behavior does not change.
@mpv
There was a 1-hour timeout for our DNS name cache.
Perhaps it is too big...
But 1 minute may really be too small to be of much interest, no?
What I have just added is:
- set the default timeout to 10 minutes;
- a method to change the timeout value;
- flush the local cache entry on connection failure, to force calling the DNS resolver again (see the sketch below).
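A conceptual sketch of that last point, reusing the hypothetical CachedResolve/DnsCache names from the earlier fragment (TryConnect is also a stand-in):

function ConnectCached(const Host: RawUtf8; Port: integer): THandle;
var
  ip: RawUtf8;
begin
  ip := CachedResolve(Host);
  result := TryConnect(ip, Port);
  if result = THandle(-1) then
  begin
    DnsCache.Delete(Host);     // flush the possibly stale cached IP
    ip := CachedResolve(Host); // forces a fresh DNS lookup
    result := TryConnect(ip, Port);
  end;
end;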
@ab, now everything looks fine, IMHO.
@macfly, about the cookie cache in the libcurl shared cache:
Yes, the cookie cache is not enabled. And I do not see any reason to enable it for mORMot.
Whoever needs cookies can easily read/add a header.
I think the cookie cache was added to curl for apps which can't store their own context, maybe for PHP in CGI mode (curl is the primary HTTP library in PHP).
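For example, with the reused per-thread Client from the sketch earlier in this thread, a cookie is just another request header (hypothetical path and value):

procedure GetCartWithCookie;
var
  status: integer;
  respHeaders, respBody: SockString;
begin
  status := Client.Request('/cart', 'GET', {KeepAlive=}30000,
    'Cookie: sessionid=abc123', // caller-managed cookie, nothing shared
    '', '', respHeaders, respBody);
  if status <> 200 then
    respBody := ''; // error handling elided
end;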
Last edited by mpv (2021-02-02 17:35:34)
This is a wonderful thread.
Thanks @mpv.
I agree that the cookie cache must be disabled.
I would have problems in some requests if cookies were passed between requests.
Out of curiosity, why do you compile with -O1 in the script?
I usually compile at -O3 on Linux at least, with no problem.
BTW I still had trouble with the mORMot 2 curl initialization - there was incorrect naming and an endless recursive call.
Check https://github.com/synopse/mORMot2/commit/2d3830967d
Is it worth going past -O2? Did you see improvements in benchmarks? I recall reading the fpc-devel mailing list, where it was said it's not worth it because in some cases it could produce different results.
Edit: Or maybe that was said for -O4, I'm unsure atm.
Last edited by pvn0 (2021-02-10 13:13:00)
There is a noticeable difference in terms of code generation with -O3.
IIRC -O1 still uses a lot of variables on the stack, and is less good at inlining.
I optimized the Pascal source code so that it generates the most aggressive asm with -O3, and still passes the regression tests.
-O4 is unsafe for sure, and documented as such.
I set the -O1 optimization level while investigating why, in my case, the tests passed with errors in SynCurl, and forgot to change it back.
PR #11 updated:
- optimization level set to -O3
- added ExitCode := 1 in mormot.core.test.pas in case some tests fail, so a CI or shell script can detect test failures (see the sketch below)
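The ExitCode change boils down to something like this (illustrative only - the failure counter name is hypothetical):

// at the end of the test run, in mormot.core.test.pas
if FailedTestCount <> 0 then
  ExitCode := 1; // a shell can now do: ./tests && ./deploy.sh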
@ab, about commit https://synopse.info/fossil/info/0b2eb7fc77eb4d38, which fixes an AV with concurrent curl cache access - what version of libcurl is on Debian 10? (curl --version)
It looks like it was fixed in libcurl 7.68 - see https://github.com/curl/curl/pull/4557 and https://github.com/curl/curl/issues/4544
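A runtime guard could look like this sketch, parsing the curl_version() string (the minor-version threshold should be whatever release actually carries the fix - see the follow-up below):

uses
  SysUtils;

function curl_version: PAnsiChar; cdecl; external 'libcurl.so';

function LibCurlAtLeast(major, minor: integer): boolean;
var
  v: string;
  maj, min: integer;
begin
  v := curl_version; // e.g. 'libcurl/7.61.1 OpenSSL/1.1.1 ...'
  delete(v, 1, pos('/', v));
  maj := StrToIntDef(copy(v, 1, pos('.', v) - 1), 0);
  delete(v, 1, pos('.', v));
  min := StrToIntDef(copy(v, 1, pos('.', v) - 1), 0);
  result := (maj > major) or ((maj = major) and (min >= minor));
end;

// e.g. only enable the global share on a fixed libcurl:
//   if LibCurlAtLeast(7, 68) then
//     CurlEnableGlobalShare;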
Last edited by mpv (2021-04-08 11:05:13)
About libcurl, I don't know my exact version because I switched to Debian 11 testing, so now I am on 7.74.
It seems to have been resolved in 7.72, not 7.68.
I have added a patch to check the version.
It is weird how poorly libcurl is tested - we identified the issue with mORMot directly thanks to its multi-thread tests.
Sidenote about OpenSSL.
I have tried to enable custom malloc/free for OpenSSL, but it seems that this library has memory leaks, so when I run the tests with fpcx64mm memory leak reporting (which is my default) it reports some leaks.
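For context, 'custom malloc/free' means OpenSSL's documented CRYPTO_set_mem_functions() hook; a rough sketch of wiring it to the Pascal heap (OpenSSL 1.1 prototypes, simplified; the library name may differ):

type
  TOsslMalloc  = function(size: PtrUInt; fil: PAnsiChar; line: integer): pointer; cdecl;
  TOsslRealloc = function(p: pointer; size: PtrUInt; fil: PAnsiChar; line: integer): pointer; cdecl;
  TOsslFree    = procedure(p: pointer; fil: PAnsiChar; line: integer); cdecl;

function CRYPTO_set_mem_functions(m: TOsslMalloc; r: TOsslRealloc;
  f: TOsslFree): integer; cdecl; external 'libcrypto.so';

function OsslMalloc(size: PtrUInt; fil: PAnsiChar; line: integer): pointer; cdecl;
begin
  result := GetMem(size); // route into the Pascal heap (e.g. fpcx64mm)
end;

function OsslRealloc(p: pointer; size: PtrUInt; fil: PAnsiChar; line: integer): pointer; cdecl;
begin
  ReallocMem(p, size);
  result := p;
end;

procedure OsslFree(p: pointer; fil: PAnsiChar; line: integer); cdecl;
begin
  FreeMem(p);
end;

// must be called before OpenSSL allocates anything:
//   CRYPTO_set_mem_functions(@OsslMalloc, @OsslRealloc, @OsslFree);

Any allocation OpenSSL never frees then shows up in fpcx64mm's leak report, which matches what is described above.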
I can confirm that on curl 7.68 (the default in Ubuntu 20 / Debian 11) the problem does not exist.
On Debian 10, and worse, on OEL 8 - which is used for production deployment by most enterprise-level customers - where libcurl is 7.61.1 and can't be updated, the problem exists and I reproduced it.
So the check added by this commit is a good solution.
libcurl is mostly used from PHP (single-threaded), so such cases may not be well tested. But the library itself is good...
As far as I know, OpenSSL is usually patched in paid Linux distributions, so MAYBE the leaks are fixed there.
Last edited by mpv (2021-04-09 10:22:01)