#1 2021-01-31 15:39:25

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

TCurlHTTP problem on Hi-load (and solution)

In some of our production deployments we call several external services over HTTP (from different server threads), and we discovered some problems with TCurlHTTP.

When the number of requests increases (>300 RPS), very strange things happen:
- we got banned by the DNS server because of the huge DNS request count
- when we solved the DNS problem by adding the host to the local resolver, we ran out of ephemeral ports
- tuning the kernel (here is a tuning example) helps, but when the load goes above 500 RPS the problems start again
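
A rough back-of-the-envelope sketch of why the ephemeral ports run out when every request opens a new connection. It is written in Python only to keep it self-contained. The numbers are assumptions, not measurements from this setup: the common Linux default port range of 32768-60999, a hypothetical "tuned" range of 1024-65535, and a 60 second TIME_WAIT period.

```python
# Rough estimate of how many new outgoing connections per second one client
# can sustain to a single destination (IP, port) before it runs out of
# ephemeral ports. All numbers below are assumptions, not measured values.

def max_new_connections_per_second(port_low: int, port_high: int,
                                   time_wait_seconds: float) -> float:
    """Each closed connection keeps its local port busy for time_wait_seconds."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


# Assumed default ephemeral port range and TIME_WAIT length.
default_rate = max_new_connections_per_second(32768, 60999, 60)

# Assumed "tuned" range starting at 1024, TIME_WAIT left unchanged.
tuned_rate = max_new_connections_per_second(1024, 65535, 60)

print(f"default: ~{default_rate:.0f} new connections/s")
print(f"tuned:   ~{tuned_rate:.0f} new connections/s")
```

Under these assumptions the limit is a few hundred to about a thousand new connections per second per destination, which is the same order of magnitude as the request rates mentioned above. Reusing connections removes the problem instead of only moving the limit.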

The solution is to reuse an existing TCurlHTTP instance (per thread, per URL): in this case a single easy handle is used internally, and it holds per-handle DNS, TLS and TCP caches.
But this solution requires massive changes in the sources (we have currently applied it to the most critical parts, and this solves our problems).
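
The "one instance per thread, per URL" idea can be sketched as follows. This is a minimal Python illustration of the pattern, not mORMot code: `PerThreadClientCache` and `FakeClient` are hypothetical names, and `FakeClient` only stands in for a real HTTP client such as TCurlHTTP.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThreadClientCache(Generic[T]):
    """Keeps one client object per (thread, base URL) pair."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory          # creates a new client for a base URL
        self._local = threading.local()  # separate storage for every thread

    def get(self, base_url: str) -> T:
        clients = getattr(self._local, "clients", None)
        if clients is None:
            # First call from this thread: create its private dictionary.
            clients = self._local.clients = {}
        client = clients.get(base_url)
        if client is None:
            # First request to this URL from this thread: create the client once.
            client = self._factory(base_url)
            clients[base_url] = client
        return client


class FakeClient:
    """Hypothetical stand-in for an HTTP client that owns one connection."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


shared_cache = PerThreadClientCache(FakeClient)
first = shared_cache.get("https://service.example")
second = shared_cache.get("https://service.example")  # same object as first
```

No locking is needed because every thread only touches its own dictionary. The cost is exactly the one described above: every call site has to go through the cache instead of creating its own client.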

I have now found a good solution - the libcurl share interface - and implemented it in pull request #376

@ab, please review it, and if everything is OK I can make the same changes for mORMot2.

The only question - maybe enable the share interface by default? In my tests there is no performance penalty, either on a single thread or on a reused TCurlHTTP instance, with or without the share interface.

The current implementation requires enabling it manually in one of the program's main units:

initialization
  if CurlIsAvailable then begin
    CurlEnableGlobalShare;
  end;
end.

P.S.
  Looks like TWinHTTP does some caching at the library or kernel level (at least it reuses a TLS connection, judging by test performance; we lack something like strace on Windows to analyze it more deeply)

Last edited by mpv (2021-01-31 15:45:10)

#2 2021-02-01 07:17:27

pvn0
Member
From: Slovenia
Registered: 2018-02-12
Posts: 211

Re: TCurlHTTP problem on Hi-load (and solution)

Seems like your local resolver doesn't have caching enabled. It's very rare to get banned by a DNS server (even for invalid queries); most probably you got rate-limited because of your requests/sec, and that kind of soft ban only lasts for a short time. If you can't get your caching fixed, then you should make sure you don't exceed 5 requests per second. That's kinda the default norm; anything more than that and your requests will be considered suspicious.

Last edited by pvn0 (2021-02-01 07:25:36)

#3 2021-02-01 16:29:08

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661

Re: TCurlHTTP problem on Hi-load (and solution)

@mpv
I have merged - with some typo fixes - your pull request.
If you can, please port it to mORMot 2 too.

About enabling it by default, it could be a good idea.
I have enabled it: https://synopse.info/fossil/info/9c8ad67c4a

#4 2021-02-01 17:33:45

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

@ab - thanks - I will prepare a patch for mORMot2.

@pvn0 - you are right - the local DNS resolver cache is misconfigured. But configuring it properly is not a trivial task. To get a local resolver cache we should either use systemd-resolved or install a local BIND (or something similar). Neither option is applicable inside a Docker container, for example. Even in stock Ubuntu (20.04), /etc/nsswitch.conf contains the record

hosts:          files mdns4_minimal [NOTFOUND=return] dns mymachines

so mdns4_minimal is used instead of systemd-resolved.
And such things can happen in any environment. So it is better to have a DNS cache at the application level (in our case - in libcurl).

Last edited by mpv (2021-02-01 17:34:49)

#5 2021-02-02 13:45:36

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661

Re: TCurlHTTP problem on Hi-load (and solution)

I have just added a thread-safe TSynDictionary-based cache of DNS names for NewSocket().
Check https://github.com/synopse/mORMot2/commit/61b5381058
It will work for the raw TNetSocket connections.

#6 2021-02-02 15:22:18

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

Just one important remark - an application-level DNS cache should have an expiry. Either a constant one, as in libcurl (60 sec by default, see CURLOPT_DNS_CACHE_TIMEOUT), or one based on the TTL from the DNS response (too complex to implement). A constant (a property that can be changed) is enough in 99% of cases.

This is because many services use DNS-based load balancing, returning a different IP for the same name with a short TTL. For example, solutions based on HashiCorp Consul, etc.

#7 2021-02-02 16:13:01

macfly
Member
From: Brasil
Registered: 2016-08-20
Posts: 374

Re: TCurlHTTP problem on Hi-load (and solution)

Thanks for the improvements!

Just to confirm, about the libcurl share interface: I checked the code twice and the cookies are not being shared, correct?

It is important that this behavior does not change.

#8 2021-02-02 16:22:21

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661

Re: TCurlHTTP problem on Hi-load (and solution)

@mpv
There was a 1 hour timeout of our DNS name cache.
Perhaps it is too big...
But 1 minute may really be too small to be of much benefit, no?

What I have just added is:
- set the default timeout to 10 minutes;
- a method to change the timeout value;
- flush the local cache entry on connection failure, to force a new call to the DNS resolver.

https://github.com/synopse/mORMot2/commit/5140023e92c9
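
The three points above (a default timeout, a way to change it, and a flush on connection failure) can be illustrated with a small sketch. It is written in Python to stay self-contained and is not the code from the commit: `DnsCache` and its methods are hypothetical names, and the injectable `resolver` and `clock` parameters are an assumption made only to keep the example testable.

```python
import socket
import threading
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Thread-safe host name -> IP cache with a fixed expiry time."""

    def __init__(self,
                 timeout_seconds: float = 600.0,  # 10 minutes, as in the list above
                 resolver: Callable[[str], str] = socket.gethostbyname,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._resolver = resolver
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def set_timeout(self, timeout_seconds: float) -> None:
        """Change the expiry applied to entries stored from now on."""
        with self._lock:
            self._timeout = timeout_seconds

    def resolve(self, name: str) -> str:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(name)
            if entry is not None and entry[1] > now:
                return entry[0]          # still fresh: no DNS request is made
        ip = self._resolver(name)        # slow path, done outside the lock
        with self._lock:
            self._entries[name] = (ip, self._clock() + self._timeout)
        return ip

    def flush(self, name: str) -> None:
        """Forget one entry, for example after a failed connection attempt."""
        with self._lock:
            self._entries.pop(name, None)
```

A caller would use `resolve()` before connecting and call `flush()` when the connection fails, so that the next attempt asks the DNS resolver again. Two threads may resolve the same expired name at the same time in this sketch; that only costs one extra lookup.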

#9 2021-02-02 17:34:09

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

@ab, now everything looks fine, IMHO

@macfly, about the cookie cache in the libcurl share interface.
Yes, the cookie cache is not enabled. And I do not see any reason to enable it for mORMot.
Anyone who needs cookies can easily read/add a header.
I think the cookie cache was added to curl for apps which can't store their own context, maybe for PHP in CGI mode (curl is the primary HTTP library in PHP)

Last edited by mpv (2021-02-02 17:35:34)

#10 2021-02-02 17:39:34

Junior/RO
Member
Registered: 2011-05-13
Posts: 210

Re: TCurlHTTP problem on Hi-load (and solution)

This is a wonderful thread.

#11 2021-02-02 18:10:46

macfly
Member
From: Brasil
Registered: 2016-08-20
Posts: 374

Re: TCurlHTTP problem on Hi-load (and solution)

Thanks @mpv.

I agree that the cookie cache must be disabled.

I would have problems in some requests if cookies were passed between requests.

#12 2021-02-08 19:53:07

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

The share interface has been ported to mORMot2, see #10

#13 2021-02-08 20:43:13

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661
Website

Re: TCurlHTTP problem on Hi-load (and solution)

I have merged it, and made a quick fix.

Thanks for sharing!

#14 2021-02-08 20:51:57

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

Oops. Looks like TCurlHTTP is not used in mormot2test. I thought the REST tests worked using curl under Linux (I ran the tests before making the PR)

#15 2021-02-09 07:49:34

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661
Website

Re: TCurlHTTP problem on Hi-load (and solution)

The fix was that it didn't compile at all, not that it wasn't used.

#16 2021-02-09 12:50:52

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

Sorry, my mistake - build_fpc.sh does not clean up the .dcu folder correctly. Fixed by PR #11.
Also added cross-compiling on Linux. The Linux64 -> Win64 cross-compile call:

TARGET=win64 ./build_fpc.sh

BTW, without this fix FPC compiles the tests successfully.

Last edited by mpv (2021-02-09 12:53:27)

#17 2021-02-09 20:59:06

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661

Re: TCurlHTTP problem on Hi-load (and solution)

Out of curiosity, why do you compile with -O1 in the script?
I usually compile at -O3 on Linux at least, with no problem.

BTW I still had trouble with the mORMot 2 curl initialization - there was incorrect naming and an endless recursive call.
Check https://github.com/synopse/mORMot2/commit/2d3830967d

#18 2021-02-10 08:24:32

pvn0
Member
From: Slovenia
Registered: 2018-02-12
Posts: 211

Re: TCurlHTTP problem on Hi-load (and solution)

Is it worth going past -O2? Did you see improvements in benchmarks? I recall reading on the fpc-devel mailing list that it's not worth it because in some cases it could produce different results.

Edit: Or maybe that was said for -O4, I'm unsure atm.

Last edited by pvn0 (2021-02-10 13:13:00)

#19 2021-02-10 15:44:04

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661
Website

Re: TCurlHTTP problem on Hi-load (and solution)

There is a noticeable difference in terms of code generation with -O3.
IIRC -O1 still uses a lot of variables on the stack, and is less good at inlining.
I optimized the pascal source code so that it generates the most aggressive asm with -O3, and passes the regression tests.

-O4 is unsafe for sure, and documented as such.

#20 2021-02-10 17:09:37

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

I set the -O1 optimization level while searching for why, in my case, the tests passed despite the errors in SynCurl, and forgot to change it back.
PR #11 updated:
- optimization level set to -O3
- added ExitCode := 1 in mormot.core.test.pas in case some tests fail, so CI or a shell script can detect a test failure

#21 2021-04-08 11:04:23

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

@ab, about commit https://synopse.info/fossil/info/0b2eb7fc77eb4d38 which fixes an AV with concurrent curl cache access - what version of libcurl is on Debian 10? (curl --version)
It looks like it's fixed in libcurl 7.68 - see https://github.com/curl/curl/pull/4557 and https://github.com/curl/curl/issues/4544

Last edited by mpv (2021-04-08 11:05:13)

#22 2021-04-09 07:25:07

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661
Website

Re: TCurlHTTP problem on Hi-load (and solution)

About libcurl, I don't know my exact version because I switched to Debian 11 testing so now I am on 7.74.
It seems to be resolved on 7.72, not 7.68.
I have added a patch to check the version.
It is weird how poorly libcurl is tested - we identified the issue with mORMot directly thanks to its multi-thread tests.

Sidenote about OpenSSL.
I have tried to enable custom malloc/free for OpenSSL, but it seems that this library has memory leaks so when I run the tests with fpcx64mm memory leak reporting (which is my default) it reports some leaks.

#23 2021-04-09 10:21:44

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571

Re: TCurlHTTP problem on Hi-load (and solution)

I can confirm that on curl 7.68 (default in Ubuntu 20 / Debian 11) no problem exists.
On Debian 10 and, worse, on OEL 8, which is used for production deployments by most enterprise-level customers and where libcurl is 7.61.1 and can't be updated, the problem exists and I reproduced it.
So the check added by this commit is a good solution.
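
A version check of this kind can be sketched as below. This is a Python illustration, not the check from the commit: the function names are hypothetical, and the input is assumed to be a plain "major.minor.patch" string. The only libcurl fact used is that it encodes its version number as 0xXXYYZZ. The first safe release is left as a parameter, since both 7.68 and 7.72 are mentioned in this thread.

```python
def curl_version_num(version: str) -> int:
    """Encode 'major.minor.patch' the way libcurl does: 0xXXYYZZ."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)                  # "7.68" is treated as "7.68.0"
    major, minor, patch = parts[:3]
    return (major << 16) | (minor << 8) | patch


def share_is_safe(version: str, first_safe: str) -> bool:
    """True when the running libcurl is at least the first safe release."""
    return curl_version_num(version) >= curl_version_num(first_safe)


# The libcurl 7.61.1 mentioned above is rejected with either threshold.
old_is_safe = share_is_safe("7.61.1", first_safe="7.68.0")
new_is_safe = share_is_safe("7.74.0", first_safe="7.72.0")
```

Comparing the encoded integers avoids the classic mistake of comparing version strings, where "7.9" would sort after "7.68".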

libcurl is mostly used in PHP (single thread), so such cases may not be well tested. But the library itself is good...

As far as I know, OpenSSL is usually patched in paid Linux distributions, so MAYBE the leaks are fixed there

Last edited by mpv (2021-04-09 10:22:01)

#24 2021-04-09 10:35:21

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,661
Website

Re: TCurlHTTP problem on Hi-load (and solution)

Good to know.

Another remark: I have seen better performance of the mormot libcurl client when the RTL MM is used - at least with fpcx64mm.
