Fast graceful shutdown of ThreadPerTaskExecutor (when expected WAITING Threads)

Tue Dec 20 22:51:50 UTC 2022

*> Rob, why are you not using StructuredTaskScope instead of
Executors.newThreadPerTaskExecutor?*

I'm not on the Helidon team, nor do I work for Oracle, so I'm not the correct
person to ask. I have seen commented-out *StructuredTaskScope* code in Nima
though, so there is possibly a commit comment to find. I suspect it's just
the case that Nima is building against the released JDK, which does not
include *StructuredTaskScope* yet?

*> Basically, is there something you know that I don't know?*

Unlikely :)  To me the use of *Executors.newThreadPerTaskExecutor* does
look appropriate (with just this issue around graceful shutdown, which is
important to me due to how it impacts Nima running in Kubernetes).

*> who has been playing with Nima*

I don't really want to sidetrack this topic too much. I'll add my Nima
comments afterwards.

Cheers, Rob.

On Wed, 21 Dec 2022 at 10:10, Eric Kolotyluk <eric at kolotyluk.net> wrote:

> As another curious casual observer, who has been playing with Nima...
>
> Rob, why are you not using StructuredTaskScope instead of
> Executors.newThreadPerTaskExecutor?
>
> Basically, is there something you know that I don't know?
>
> Cheers, Eric
> On 2022-12-20 1:01 p.m., Rob Bygrave wrote:
>
> *> The way to interrupt sockets has always been to close them*
>
> We are looking to achieve "graceful shutdown", which means allowing
> in-flight requests to complete before shutting down.  We don't want to just
> nuke them; said differently, it is the job of
> executorService.awaitTermination(...) followed by
> executorService.shutdownNow() to nuke anything still active after the
> timeout.
>
> ... yes we want to interrupt Virtual Threads that are WAITING and known to
> be *not in-flight* *BEFORE* executorService.awaitTermination(...)
>
>
> *> anything to do with Loom and virtual threads.*
>
> Maybe. There are a few things that make me ask though and I'm in
> particular looking at ThreadPerTaskExecutor. One is that all the existing
> Java http servers that I know of use NIO (as you would expect) and
> specifically that means "getting the next request" for http 1.1 keepalive
> true is an *event*, whereas with Virtual Threads and Nima "getting the
> next request" is a Virtual Thread blocking on IO waiting for those next
> bytes to arrive (which to me is the "new normal" approach that Virtual
> Threads brings).
>
> That is, with Virtual Threads I am guessing that it's potentially a "new
> normal" pattern where we are looking to gracefully shut down a
> ThreadPerTaskExecutor which is holding active Virtual Threads that are
> blocking waiting on IO (but not yet deemed in-flight).  I'm suggesting that
> ThreadPerTaskExecutor currently does not make it nice / easy / friendly to
> shut down gracefully while allowing in-flight requests to complete.
>
> I see ThreadPerTaskExecutor.shutdownNow() always returns an empty list of
> Runnable (this is too late in the shutdown process anyway so not useful per
> se but we are trying to access the Stream<Runnable>). I see the internal
> ThreadContainer and the internal Stream<Thread> threads() method but I
> don't see anything available to application code to assist the shutdown of
> a ThreadPerTaskExecutor for this case which I feel might become a common
> pattern going forward.
>
>
>
> Cheers, Rob.
>
>
> On Wed, 21 Dec 2022 at 05:38, Jens Lideström <jens.lidestrom at fripost.org>
> wrote:
>
>> Hello,
>>
>> Random passer-by here.
>>
>> I don't think this has anything to do with Loom and virtual threads.
>>
>> The way to interrupt sockets has always been to close them. Then blocking
>> methods will throw an exception.
>>
>> See this, for example:
>> https://stackoverflow.com/questions/4425350/how-to-terminate-a-thread-blocking-on-socket-io-operation-instantly
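To illustrate the point about closing sockets, here is a minimal sketch using plain java.net sockets on the loopback interface. The class and method names (CloseToUnblockSketch, closeWhileReading) are hypothetical and not taken from Nima, and it uses a platform thread so that it does not depend on any virtual-thread API. It only shows that a thread blocked in read() is released with an exception once another thread closes the socket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

final class CloseToUnblockSketch {

    /**
     * Blocks a reader thread on an idle connection, closes the socket from
     * another thread, and returns whatever exception the reader observed.
     */
    static Throwable closeWhileReading() throws Exception {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            Thread reader = new Thread(() -> {
                try {
                    InputStream in = accepted.getInputStream();
                    in.read();            // blocks: the client never sends anything
                } catch (IOException e) {
                    seen.set(e);          // raised once the socket has been closed
                }
            });
            reader.start();

            Thread.sleep(200);            // crude: give the reader time to block in read()
            accepted.close();             // closing the socket releases the blocked reader
            reader.join(5_000);
        }
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reader saw: " + closeWhileReading());
    }
}
```

Note that closing the socket does not distinguish between an idle connection and one that is in the middle of a request, so on its own it is not a graceful shutdown.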
>>
>> Nima seems to do that for the server socket, but it FIRST waits for the
>> ExecutorService to shutdown:
>>
>>
>> https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/ServerListener.java#L137
>>
>> I think Nima should close the socket FIRST, then shutdown the
>> ExecutorService.
>>
>> (I have not checked how Nima handles the client sockets. They should be
>> treated in the same way.)
>>
>> Having written the above, it strikes me that maybe you are aware of this
>> already? Maybe you are suggesting the addition of some new kind of
>> application-independent interruption mechanism that is honoured by
>> InputStreams from Sockets? I don't think this is related to Loom, either.
>>
>> Best regards,
>> Jens Lideström
>>
>>
>> On 2022-12-19 22:51, Rob Bygrave wrote:
>> > I have been looking at the shutdown process of Helidon Nima web server
>> which makes use of:
>> >
>> > Executors.newThreadPerTaskExecutor(Thread.ofVirtual()
>> >    .allowSetThreadLocals(true)
>> >    .inheritInheritableThreadLocals(false)
>> >    .factory());
>> >
>> > Firstly, there is no issue with Nima web server shutdown when using
>> HTTP 1.0 no keepalive. The virtual threads for this case live for the
>> length of a single request/response only.
>> >
>> > Where I am hitting a question/issue is with Nima web server shutdown
>> when using HTTP 1.1 keepalive true (and there is at least 1 connection
>> being kept alive). What happens with HTTP 1.1 with keepalive true is that
>> there is 1 virtual thread per connection (slight simplification). After the
>> request/response has been processed the virtual thread then looks to read
>> the next request. When there are no more requests coming into the web
>> server we see this thread is WAITING looking to read the first part of the
>> next request.
>> >
>> > The thread is WAITING "while reading the prologue" here:
>> >
>> > https://github.com/helidon-io/helidon/blob/main/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/http1/Http1Connection.java#L115
>> >
>> > Conceptually when the Nima web server is "idle" we expect to be able to
>> shutdown the web server gracefully (allowing for active requests to
>> complete) and quickly. In this "idle" state with HTTP 1.1 keepalive
>> connections the ThreadPerTaskExecutor contains alive threads that are in
>> WAITING state while "reading the prologue of the next request".
>> >
>> > Currently the webserver.stop() ends up as the usual:
>> >
>> > executorService.shutdown();
>> > if (!executorService.awaitTermination(...)) {
>> >    executorService.shutdownNow();
>> > }
>> >
>> > Where the executorService in question is ThreadPerTaskExecutor. The
>> above shutdown does not execute in a timely manner based on the timeout
>> used for executorService.awaitTermination(...) - for example if this
>> timeout is 10 seconds we get a pretty slow shutdown on a conceptually idle
>> web server.
>> >
>> >
>> > *Some thoughts*
>> > An approach would be as part of [ThreadPerTaskExecutor].shutdown() to
>> firstly look to interrupt threads that are state == WAITING &&
>> some-application-specific-logic-that-says-the-task-is-interruptible (which
>> for Nima is that the task is readPrologue() at Http1Connection.java#L115).
>> >
>> > e.g. Perhaps have an interface InterruptableTask extends Runnable ...
>> and on shutdown() firstly iterate the active threads that are WAITING and
>> if their tasks are InterruptableTask and
>> application-specific-logic-that-says-the-task-is-interruptible true then
>> interrupt that thread.
>> >
>> > I note there is some non-public API like Stream<Thread> threads() which
>> applications can't use (at least yet). My current hacking around with this
>> shutdown in Helidon hasn't given me any elegant solutions in application
>> logic in that we need to now have the application additionally hold the
>> Runnable tasks (which is not ideal in the http 1.0 no keepalive case).
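The idea of interrupting only the idle threads before awaitTermination(...) could be sketched roughly as below. This is an illustration under stated assumptions, not Nima or JDK code: IdleAwareShutdownSketch, IDLE, connection, handle and stop are all hypothetical names, a BlockingQueue.take() stands in for the blocking readPrologue() call, and Executors.newCachedThreadPool() stands in for the thread-per-task executor so that the sketch runs on older JDKs (on a JDK that has it, Executors.newThreadPerTaskExecutor(...) could be substituted).

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class IdleAwareShutdownSketch {

    /** Hypothetical application-held registry of threads waiting for a next request. */
    static final Set<Thread> IDLE = ConcurrentHashMap.newKeySet();

    /** Stand-in for a keep-alive connection loop; take() plays the role of readPrologue(). */
    static Runnable connection(BlockingQueue<String> requests) {
        return () -> {
            Thread self = Thread.currentThread();
            try {
                while (true) {
                    String request;
                    IDLE.add(self);                 // idle: safe to interrupt
                    try {
                        request = requests.take();  // blocks until a "request" arrives
                    } finally {
                        IDLE.remove(self);          // in-flight from here on
                    }
                    handle(request);
                }
            } catch (InterruptedException e) {
                // Interrupted while idle: end the connection quietly.
            }
        };
    }

    static void handle(String request) {
        // Placeholder for real request processing.
    }

    /** Interrupts idle threads first, then applies the usual shutdown sequence. */
    static boolean stop(ExecutorService executor, long timeout, TimeUnit unit)
            throws InterruptedException {
        executor.shutdown();                        // accept no new tasks
        IDLE.forEach(Thread::interrupt);            // release threads known not to be in-flight
        if (executor.awaitTermination(timeout, unit)) {
            return true;                            // everything finished within the timeout
        }
        executor.shutdownNow();                     // nuke whatever is still active
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newCachedThreadPool();
        for (int i = 0; i < 3; i++) {
            executor.execute(connection(new LinkedBlockingQueue<>()));
        }
        while (IDLE.size() < 3) {
            Thread.sleep(10);                       // wait until all connections are idle
        }
        long start = System.nanoTime();
        boolean clean = stop(executor, 10, TimeUnit.SECONDS);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("clean=%b after %d ms%n", clean, millis);
    }
}
```

There is a small window between take() returning and the thread leaving IDLE in which an interrupt could land on a thread that has just picked up a request, so a real implementation would need a per-connection state check under a lock or similar. It also shows the drawback mentioned above: the application has to hold this extra bookkeeping itself.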
>> >
>> >
>> > Does anyone have some thoughts or suggestions on this?
>> >
>> >
>> > I have raised this as an issue in Helidon as:
>> https://github.com/helidon-io/helidon/issues/5717
>> > with a related PR with failing tests. I'm raising this here because my
>> current thought is that an elegant solution might be with
>> ThreadPerTaskExecutor on shutdown or some exposure of the waiting threads
>> accessible to application code.
>> >
>> >
>> > Thanks, Rob.
>> >
>>
>