Hello,
How many worker threads does mORMot + http.sys use? And more generally, how are threads managed with mORMot + http.sys?
Thanks in advance
Thanks @AB. I see ServerThreadPoolCount is set to 32. Does that mean 32 threads are created by default? If there are, say, 42 concurrent requests, will the extra 10 wait for a free thread, or am I misunderstanding?
Last edited by loki (Yesterday 21:41:42)
For 42 concurrent processing requests, 10 will have to wait, because there are only 32 threads.
But you can have thousands of concurrent idle requests.
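This queueing behaviour can be sketched with a generic thread pool. The following is a scaled-down Python stand-in, not mORMot code: POOL_SIZE and REQUESTS here play the roles of the 32 worker threads and 42 concurrent requests.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4   # scaled-down stand-in for the default ServerThreadPoolCount of 32
REQUESTS = 6    # scaled-down stand-in for 42 concurrent requests

active = 0      # handlers currently running
peak = 0        # highest number of handlers running at once
lock = threading.Lock()

def handler(i):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.1)  # simulated blocking work (CPU or upstream I/O)
    with lock:
        active -= 1
    return i

with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    list(pool.map(handler, range(REQUESTS)))

# peak never exceeds POOL_SIZE: the extra requests sit in the queue
print(peak)
```

The extra requests are not rejected; they simply wait in the pool's queue until a worker becomes free.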
Anyway, the processing should be CPU-bound.
What do you need?
If you expect requests to take a long time, then you need another kind of server.
The async server is able to delay its answers (as we do in the TFB sample, e.g. while waiting for database answers), but it is tricky.
The socket server uses a thread pool for HTTP/1.0 requests, then creates one thread per HTTP/1.1 client, so it allows as many concurrent requests as you need - but at the cost of one thread per client.
If your requests take a long time, create your own processing thread, then poll from the client side to see if it is finished, or use a WebSockets callback.
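The "create your own processing thread, then poll from the client side" pattern can be sketched generically as follows. This is a Python illustration, not mORMot code; `start_job` and `poll_job` are made-up names for the two endpoints such a design would expose.

```python
import threading
import time
import uuid

# job id -> {'done': bool, 'result': ...}, shared between worker and poller
jobs = {}
jobs_lock = threading.Lock()

def start_job(work):
    """Server side: launch the slow work in its own thread, return a ticket."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {'done': False, 'result': None}
    def run():
        result = work()
        with jobs_lock:
            jobs[job_id] = {'done': True, 'result': result}
    threading.Thread(target=run, daemon=True).start()
    return job_id

def poll_job(job_id):
    """Server side: cheap status endpoint the client polls."""
    with jobs_lock:
        return dict(jobs[job_id])

# client side: start a slow job, then poll until it is finished
def slow_work():
    time.sleep(0.2)  # simulated long-running processing
    return 'ok'

jid = start_job(slow_work)
while not poll_job(jid)['done']:
    time.sleep(0.05)
print(poll_job(jid)['result'])
```

Each poll request is short, so it only occupies a pool worker for an instant, while the long work runs outside the HTTP thread pool entirely.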
I get the point about 32 processing threads and thousands of idle connections being fine when the workload is truly CPU-bound. In my case, though, most endpoints perform I/O-bound work (e.g., calling external HTTP APIs or a DB over TCP). The latency is primarily waiting on upstream, not burning CPU. So a fixed cap of 32 worker threads means that, with 42 in-flight requests waiting on upstream, 10 will indeed queue up even though the machine is mostly idle.
I have a few questions:
* Handling I/O-bound work: How do you recommend handling these cases? Can the thread pool auto-scale when many handlers are waiting on external I/O? And can mORMot use IOCP when running with HTTP.SYS?
* Default thread count: Why is the default 32 threads? If the workload were purely CPU-bound (no blocking I/O—which is uncommon since there’s almost always at least a DB call), wouldn’t it make more sense to default to number-of-processors instead?
Thanks!
Last edited by loki (Today 07:59:06)
"Can the thread pool auto-scale"
> I don't think that is a good idea - a thread pool is a pool, so auto-adjusting its size would defeat its purpose
"can mORMot use IOCP when running with HTTP.SYS"
> No, the HTTP.SYS server is blocking
"How do you recommend handling these cases"
> Use a regular THttpServer with one thread per client - simple and easy
> Another option may be to use THttpAsyncServer (which uses IOCP on Windows) and its async/delayed feature - see the TFB sample - which would scale perfectly, but it is not simple to implement, and currently only our direct PostgreSQL client unit supports async DB access.
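The async/delayed-answer idea can be illustrated with a generic event-loop sketch. This is Python asyncio, not mORMot code; `fake_db_query` is a made-up stand-in for an asynchronous database call.

```python
import asyncio

async def fake_db_query(q):
    # simulated upstream latency: the coroutine suspends here,
    # freeing the event loop to serve other requests meanwhile
    await asyncio.sleep(0.05)
    return f'rows for {q!r}'

async def handler(request):
    # the handler's answer is delayed until the database replies
    return await fake_db_query(request)

async def main():
    # many in-flight requests share one event-loop thread
    return await asyncio.gather(*(handler(f'q{i}') for i in range(5)))

results = asyncio.run(main())
print(len(results))
```

All five requests overlap their waits on one thread, which is the scaling benefit - at the price of every blocking call (including the DB driver) having to be async-aware, which is why only an async-capable database client fits this model.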
"Why is the default 32 threads"
> Because it is a default, which consumes only a few system resources and scales well for mixed CPU/IO processing - but users are expected to adapt it to their workload, e.g. using SystemInfo.dwNumberOfProcessors as we do in the TFB sample
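The sizing rule of thumb above can be sketched generically; here `os.cpu_count()` plays the role of SystemInfo.dwNumberOfProcessors, and the constants are illustrative, not mORMot defaults being computed.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# For purely CPU-bound handlers, one worker per core is enough:
# more threads than cores just adds context-switching overhead.
cpu_bound_workers = os.cpu_count() or 1

# For mixed CPU/IO workloads, a larger fixed default such as 32
# leaves headroom for workers that are blocked on upstream I/O.
mixed_workload_workers = max(32, cpu_bound_workers)

pool = ThreadPoolExecutor(max_workers=cpu_bound_workers)
pool.shutdown()
print(cpu_bound_workers, mixed_workload_workers)
```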
Thanks @AB. Do you think it would be hard for me to update your code so that your http.sys implementation works with IOCP? Honestly, I can’t imagine any API server that won’t at least make a TCP request (most often a database request), so updating your implementation to IOCP could be a great improvement (I think). Do you see any problems I might run into, or any good reason not to do this?
In practice, most DB requests should take less than 1 millisecond, unless there is something wrong with the context.
mORMot is used with this pattern with no problem, and adding async responses to http.sys would not help - it would only introduce more confusion into the code. Just look at the code.
And in practice, from the TFB results themselves, the blocking code is as fast as, or even faster than, the non-blocking code, once you define enough threads for the mORMot async servers on Linux.
Windows will always be behind, especially http.sys, which is not such a good pattern from my tests.
The Pascal language does not make it easy to work with delegates and delayed execution.
The TFB sample shows how we could achieve this.
I see, thanks @AB!