Fast graceful shutdown of ThreadPerTaskExecutor (when expected WAITING Threads)
Jens Lideström
jens.lidestrom at fripost.org
Tue Dec 20 13:54:13 UTC 2022
Hello,
Random passer-by here.
I don't think this has anything to do with Loom and virtual threads.
The way to interrupt a thread blocked on a socket has always been to close the socket. The blocking method then throws an exception (for java.net.Socket, a SocketException).
See this, for example: https://stackoverflow.com/questions/4425350/how-to-terminate-a-thread-blocking-on-socket-io-operation-instantly
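A minimal sketch of that behaviour, using plain java.net sockets over loopback (the class and method names are illustrative, nothing from Nima): one thread blocks in read() on an idle connection, and another thread closes the socket, which makes the read fail with an exception.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CloseUnblocksRead {

    /** Returns the exception thrown by a read() that was blocked when its socket was closed. */
    public static Throwable blockThenClose() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            AtomicReference<Throwable> failure = new AtomicReference<>();
            CountDownLatch done = new CountDownLatch(1);

            // Reader thread: blocks because the client never sends anything,
            // like a keep-alive connection waiting for the next request.
            Thread reader = new Thread(() -> {
                try (InputStream in = accepted.getInputStream()) {
                    in.read();
                } catch (IOException e) {
                    failure.set(e);
                } finally {
                    done.countDown();
                }
            });
            reader.start();

            Thread.sleep(200);   // give the reader time to block in read()
            accepted.close();    // closing the socket unblocks the blocked read()

            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("read() was not unblocked");
            }
            return failure.get();
        }
    }

    public static void main(String[] args) throws Exception {
        Throwable t = blockThenClose();
        System.out.println("read failed with SocketException: " + (t instanceof SocketException));
    }
}
```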
Nima seems to do that for the server socket, but it FIRST waits for the ExecutorService to shut down:
https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/ServerListener.java#L137
I think Nima should close the socket FIRST, then shut down the ExecutorService.
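Roughly what I mean, as a sketch only: CloseFirstListener below is a hypothetical stand-in, not Nima's ServerListener, and it assumes Java 21 for Executors.newVirtualThreadPerTaskExecutor(). The listener tracks its open client sockets, and stop() closes the server socket and client sockets before calling shutdown()/awaitTermination(), so blocked accept() and read() calls fail immediately instead of running out the timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical listener: one virtual thread per connection. */
public class CloseFirstListener {
    private final ServerSocket serverSocket;
    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    // Open client sockets, tracked so that stop() can close them.
    private final Set<Socket> clients = ConcurrentHashMap.newKeySet();

    public CloseFirstListener() throws IOException {
        serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        executor.submit(this::acceptLoop);
    }

    public int port() {
        return serverSocket.getLocalPort();
    }

    private void acceptLoop() {
        try {
            while (true) {
                Socket socket = serverSocket.accept();
                clients.add(socket);
                executor.submit(() -> handle(socket));
            }
        } catch (IOException e) {
            // accept() fails once serverSocket is closed: stop accepting
        }
    }

    // Simulated keep-alive connection: keep reading until EOF or an error.
    private void handle(Socket socket) {
        try (socket; InputStream in = socket.getInputStream()) {
            while (in.read() != -1) {
                // a real server would parse and answer the request here
            }
        } catch (IOException e) {
            // expected when stop() closes the socket under a blocked read()
        } finally {
            clients.remove(socket);
        }
    }

    /** Close the sockets FIRST so blocked calls fail fast, THEN shut down the executor. */
    public boolean stop(long timeout, TimeUnit unit) throws InterruptedException {
        try { serverSocket.close(); } catch (IOException ignored) { }
        for (Socket s : clients) {
            try { s.close(); } catch (IOException ignored) { }
        }
        executor.shutdown();
        if (!executor.awaitTermination(timeout, unit)) {
            executor.shutdownNow();
            return false;
        }
        return true;
    }
}
```

Note that this closes every client socket, including ones in the middle of a request. A graceful variant would close only the connections that are idle between requests and let the others finish.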
(I have not checked how Nima handles the client sockets. They should be treated in the same way.)
Having written the above, it strikes me that maybe you are aware of this already? Maybe you are suggesting the addition of some new kind of application-independent interruption mechanism that is honoured by InputStreams from Sockets? I don't think this is related to Loom, either.
Best regards,
Jens Lideström
On 2022-12-19 22:51, Rob Bygrave wrote:
> I have been looking at the shutdown process of Helidon Nima web server which makes use of:
>
> Executors.newThreadPerTaskExecutor(Thread.ofVirtual()
> .allowSetThreadLocals(true)
> .inheritInheritableThreadLocals(false)
> .factory());
>
> Firstly, there is no issue with Nima web server shutdown when using HTTP 1.0 no keepalive. The virtual threads for this case live for the length of a single request/response only.
>
> Where I am hitting a question/issue is with Nima web server shutdown when using HTTP 1.1 keepalive true (and there is at least 1 connection being kept alive). With HTTP 1.1 keepalive true there is 1 virtual thread per connection (slight simplification). After a request/response has been processed, the virtual thread goes on to read the next request. When no more requests are coming into the web server, we see this thread in the WAITING state, trying to read the first part of the next request.
>
> The thread is WAITING "while reading the prologue" here:
> https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/http1/Http1Connection.java#L115
>
> Conceptually, when the Nima web server is "idle", we expect to be able to shut it down both gracefully (allowing active requests to complete) and quickly. In this "idle" state with HTTP 1.1 keepalive connections, the ThreadPerTaskExecutor contains live threads that are in the WAITING state while "reading the prologue of the next request".
>
> Currently webserver.stop() ends up doing the usual:
>
>     executorService.shutdown();
>     if (!executorService.awaitTermination(...)) {
>         executorService.shutdownNow();
>     }
>
> Here the executorService is the ThreadPerTaskExecutor. This shutdown is not timely: it is bounded by the timeout passed to executorService.awaitTermination(...), so if that timeout is 10 seconds, for example, a conceptually idle web server takes about 10 seconds to shut down.
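A small reproduction of that cost, as a sketch: a task blocked in CountDownLatch.await() stands in for the read of the next request's prologue (an assumption, not real socket I/O), and Executors.newVirtualThreadPerTaskExecutor() assumes Java 21. Because the idle task never finishes on its own, awaitTermination(...) always runs out its full timeout before shutdownNow() gets a chance to interrupt the task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SlowIdleShutdown {

    /** Runs the usual shutdown idiom and returns how long it took, in milliseconds. */
    public static long timeShutdown(ExecutorService executor, long timeoutMillis)
            throws InterruptedException {
        long start = System.nanoTime();
        executor.shutdown();
        if (!executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
            executor.shutdownNow();                        // interrupts the idle task
            executor.awaitTermination(5, TimeUnit.SECONDS); // let it exit
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
        CountDownLatch nextRequest = new CountDownLatch(1); // never counted down
        CountDownLatch waiting = new CountDownLatch(1);

        // Stand-in for a keep-alive connection waiting for a request that never comes.
        executor.submit(() -> {
            waiting.countDown();
            nextRequest.await(); // WAITING until interrupted by shutdownNow()
            return null;
        });
        waiting.await();

        long ms = timeShutdown(executor, 500);
        System.out.println("idle shutdown took ~" + ms + " ms");
    }
}
```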
>
>
> *Some thoughts*
> One approach would be for [ThreadPerTaskExecutor].shutdown() to first interrupt threads where state == WAITING && some-application-specific-logic-that-says-the-task-is-interruptible (for Nima, that the task is in readPrologue() at Http1Connection.java#L115).
>
> e.g. Perhaps have an interface InterruptableTask extends Runnable ... and on shutdown() first iterate over the active threads that are WAITING and, if their task is an InterruptableTask and the application-specific-logic-that-says-the-task-is-interruptible returns true, interrupt that thread.
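An application-level approximation of that idea, sketched under assumptions: InterruptableTask, isInterruptible() and IdleAwareExecutor are hypothetical names, not JDK API, and Java 21 is assumed. Since threads() is not public, the wrapper records its own thread-to-task map, which is exactly the extra bookkeeping described below. It also relies only on the task's own flag rather than checking Thread.getState() == WAITING, because the flag already says "blocked waiting for the next request".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IdleAwareExecutor {

    /** Hypothetical: a task that can report when it is safe to interrupt. */
    public interface InterruptableTask extends Runnable {
        boolean isInterruptible();
    }

    private final ExecutorService delegate = Executors.newVirtualThreadPerTaskExecutor();
    // Running thread -> its task, kept by the application because threads() is not public.
    private final Map<Thread, InterruptableTask> running = new ConcurrentHashMap<>();

    public void execute(InterruptableTask task) {
        delegate.execute(() -> {
            Thread self = Thread.currentThread();
            running.put(self, task);
            try {
                task.run();
            } finally {
                running.remove(self);
            }
        });
    }

    /** Like shutdown(), but first interrupts threads whose task reports it is idle. */
    public boolean shutdown(long timeout, TimeUnit unit) throws InterruptedException {
        delegate.shutdown();
        running.forEach((thread, task) -> {
            if (task.isInterruptible()) {
                thread.interrupt();
            }
        });
        if (!delegate.awaitTermination(timeout, unit)) {
            delegate.shutdownNow();
            return false;
        }
        return true;
    }
}
```

One caveat with this shape: a task that becomes idle just after the interrupt pass is missed and still waits out the timeout, so a real implementation would have to close that gap (for example by having tasks check a shutdown flag before blocking).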
>
> I note there is some non-public API like Stream<Thread> threads() which applications can't use (at least not yet). My current hacking around with this shutdown in Helidon hasn't produced any elegant application-level solution, because the application now needs to additionally hold on to the Runnable tasks (which is not ideal in the HTTP 1.0 no-keepalive case).
>
>
> Does anyone have some thoughts or suggestions on this?
>
>
> I have raised this as an issue in Helidon as: https://github.com/helidon-io/helidon/issues/5717
> with a related PR with failing tests. I'm raising this here because my current thought is that an elegant solution might be with ThreadPerTaskExecutor on shutdown or some exposure of the waiting threads accessible to application code.
>
>
> Thanks, Rob.
>
More information about the loom-dev mailing list