Fast graceful shutdown of ThreadPerTaskExecutor (when expected WAITING Threads)
Rob Bygrave
robin.bygrave at gmail.com
Tue Dec 20 21:01:21 UTC 2022
*> The way to interrupt sockets has always been to close them*
We are looking to achieve "Graceful shutdown" which means it allows
in-flight requests to complete before shutting down. We don't want to just
nuke em or said differently, it is the job of
executorService.awaitTermination(...) executorService.shutdownNow() to nuke
anything still active after the timeout.
... yes we want to interrupt Virtual Threads that are WAITING and known to
be *not in-flight* *BEFORE* executorService.awaitTermination(...)
*> anything to do with Loom and virtual threads.*
Maybe. There are a few things that make me ask though and I'm in particular
looking at ThreadPerTaskExecutor. One is that all the existing Java http
servers that I know of use NIO (as you would expect) and specifically that
means "getting the next request" for http 1.1 keepalive true is an *event*
where as with Virtual Threads and Nima "getting the next request" is a
Virtual Thread blocking on IO waiting for those next bytes to arrive (which
to me is the "new normal" approach that Virtual Threads brings).
That is, with Virtual Threads I am guessing that it's potentially a "new
normal" pattern where we are looking to shutdown gracefully a
ThreadPerTaskExecutor which is holding active Virtual Threads that are
blocking waiting on IO (but not yet deemed in-flight). I'm suggesting that
ThreadPerTaskExecutor currently does not make it nice / easy / friendly to
shutdown gracefully allowing for in-flight requests.
I see ThreadPerTaskExecutor.shutdownNow() always returns an empty list of
Runnable (this is too late in the shutdown process anyway so not useful per
say but we are trying to access the Stream<Runnable>). I see the internal
ThreadContainer and the internal Stream<Thread> threads() method but I
don't see anything available to application code to assist the shutdown of
a ThreadPerTaskExecutor for this case which I feel might become a common
pattern going forward.
Cheers, Rob.
On Wed, 21 Dec 2022 at 05:38, Jens Lideström <jens.lidestrom at fripost.org>
wrote:
> Hello,
>
> Random passer-by here.
>
> I don't think this has anything to do with Loom and virtual threads.
>
> The way to interrupt sockets has always been to close them. Then blocking
> methods will throw an exception.
>
> See this, for example:
> https://stackoverflow.com/questions/4425350/how-to-terminate-a-thread-blocking-on-socket-io-operation-instantly
>
> Nima seems to do that for the server socket, but it FIRST waits for the
> ExecutorService to shutdown:
>
>
> https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/ServerListener.java#L137
>
> I think Nima should close the socket FIRST, then shutdown the
> ExecutorService.
>
> (I have not checked how Nima handles the client sockets. They should be
> treated in the same way.)
>
> Having written the above it strikes me that maybe you aware of this
> already? Maybe you are suggesting the addition of some new kind of
> application independent interruption mechanism that are honoured by
> InputStreams from Sockets? I don't think this is related to Loop, either.
>
> Best regards,
> Jens Lideström
>
>
> On 2022-12-19 22:51, Rob Bygrave wrote:
> > I have been looking at the shutdown process of Helidon Nima web server
> which makes use of:
> >
> > Executors.newThreadPerTaskExecutor(Thread.ofVirtual()
> > .allowSetThreadLocals(true)
> > .inheritInheritableThreadLocals(false)
> > .factory());
> >
> > Firstly, there is no issue with Nima web server shutdown when using HTTP
> 1.0 no keepalive. The virtual threads for this case live for the length of
> a single request/response only.
> >
> > Where I am hitting a question/issue is with Nima web server shutdown
> when using HTTP 1.1 keepalive true (and there is at least 1 connection
> being kept alive). What happens with HTTP 1.1 with keepalive true is that
> there is 1 virtual thread per connection (slight simplification). After the
> request/response has been processed the virtual thread then looks to read
> the next request. When there are no more requests coming into the web
> server we see this thread is WAITING looking to read the first part of the
> next request.
> >
> > The thread is WAITING "while reading the prologue" here:
> >
> https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/http1/Http1Connection.java#L115
> <
> https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/http1/Http1Connection.java#L115
> >
> >
> > Conceptually when the Nima web server is "idle" we expect to be able to
> shutdown the web server gracefully (allowing for active requests to
> complete) and quickly. In this "idle" state with HTTP 1.1 keepalive
> connections the ThreadPerTaskExecutor contains alive threads that are in
> WAITING state while "reading the prologue of the next request".
> >
> > Currently the webserver.stop() ends up as the usual:
> >
> > executorService.shutdown();
> > if (!executorService.awaitTermination(...)) {
> > executorService.shutdownNow()
> > }
> >
> > Where the executorService in question is ThreadPerTaskExecutor. The
> above shutdown does not execute in a timely manner based on the timeout
> used for executorService.awaitTermination(...) - for example if this
> timeout is 10 seconds we get a pretty slow shutdown on a conceptually idle
> web server.
> >
> >
> > *Some thoughts*
> > An approach would be as part of [ThreadPerTaskExecutor].shutdown() to
> firstly look to interrupt threads that are state == WAITING &&
> some-application-specific-logic-that-says-the-task-is-interruptible (which
> for Nima is that the task is readPrologue() at Http1Connection.java#L115).
> >
> > e.g. Perhaps have a interface InterruptableTask extends Runnable ... and
> on shutdown() firstly iterate the active threads that are WAITING and if
> their tasks are InterruptableTask and
> application-specific-logic-that-says-the-task-is-interruptible true then
> interrupt that thread.
> >
> > I note there is some non-public API like Stream<Thread> threads() which
> applications can't use (at least yet). My current hacking around with this
> shutdown in Helidon hasn't given me any elegant solutions in application
> logic in that we need to now have the application additionally hold the
> Runnable tasks (which is not ideal in the http 1.0 no keepalive case).
> >
> >
> > Does anyone have some thoughts or suggestions on this?
> >
> >
> > I have raised this as an issue in Helidon as:
> https://github.com/helidon-io/helidon/issues/5717 <
> https://github.com/helidon-io/helidon/issues/5717>
> > with a related PR with failing tests. I'm raising this here because my
> current thought is that an elegant solution might be with
> ThreadPerTaskExecutor on shutdown or some exposure of the waiting threads
> accessible to application code.
> >
> >
> > Thanks, Rob.
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20221221/06ddaba8/attachment-0001.htm>
More information about the loom-dev
mailing list