Re: Virtual thread memory leak because TrackingRootContainer keeps threads

Thu Jul 25 12:21:08 UTC 2024

Hmm, I'm starting to think I may have fallen into the same trap here as
Michal.

I've been using virtual threads much like platform threads, to perform
IO tasks asynchronously.

Background: originally, I was using
Executors#newVirtualThreadPerTaskExecutor to run these tasks, which was
fine because the Executor tracks the virtual threads it creates in order to
support shutdown. However, from an observability point of view I've found
the executor to be a bit frustrating because it is impossible to set the
thread name before the thread is scheduled to run[1]. This means that under
heavy load where the FJ pool is busy a thread dump shows many unnamed
threads, which are waiting to be scheduled for the first time. Admittedly,
this is only a minor annoyance because I'm more interested in the threads
that are clogging up the FJ pool than the ones which are waiting to use it.
Even so, it'd be nice to have an overall picture of what's active and what's
queued (note also thread names are included in JFR events, which is super
helpful).
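
To make the limitation concrete, here is a minimal sketch (Java 21+) of the
only naming option I'm aware of with the executor: the task renames its own
thread once it starts running. The helper "named" and the thread name are
hypothetical, made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NamedExecutorTasks {
    // Hypothetical helper: wraps a task so that it renames its thread on entry.
    static Runnable named(String name, Runnable task) {
        return () -> {
            // Only takes effect once the task is actually running, so a thread
            // that is still queued for the scheduler has no name yet.
            Thread.currentThread().setName(name);
            task.run();
        };
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<?> done = executor.submit(named("conn-42-reader", () ->
                    System.out.println("running as " + Thread.currentThread().getName())));
            done.get(); // wait for the task to finish
        }
    }
}
```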

To remedy this, I've switched away from using an Executor and now I just
use "Thread.ofVirtual().name(initialName).start(task)". However, I don't
think all of the tasks are strongly reachable - some are "fire and forget"
tasks (e.g. async resource cleanup), so I may be inadvertently relying on
the JVM's observability support to keep these tasks alive until they
complete, which seems a bit brittle. In fact, now that I think of it, it
may not even be limited to fire and forget tasks. A VT that is reading
messages from a Socket could be GC'd IIUC, since I'm not maintaining
references to the reader VT itself:

* application creates/accepts Socket
* application creates reader VT referencing the socket and starts the
thread but doesn't keep a reference to the thread
* reader VT terminates when either client closes the socket (EOS) or when
the application closes the socket triggering an exception in the reader VT.
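
If it turns out that I do need to keep these threads strongly reachable
myself, one option would be something along these lines. This is only a
sketch (Java 21+): the class name, the "startTracked" helper and the LIVE set
are made up for illustration, and whether any of it is necessary depends on
the answer to my question below.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TrackedVirtualThreads {
    // Strong references to every thread that has been started but has not yet finished.
    private static final Set<Thread> LIVE = ConcurrentHashMap.newKeySet();

    // Starts a named virtual thread and keeps it strongly reachable until the task returns.
    static Thread startTracked(String initialName, Runnable task) {
        Thread thread = Thread.ofVirtual().name(initialName).unstarted(() -> {
            try {
                task.run();
            } finally {
                LIVE.remove(Thread.currentThread()); // drop the reference on completion
            }
        });
        LIVE.add(thread); // register before start() so the removal cannot run first
        thread.start();
        return thread;
    }

    static int liveCount() {
        return LIVE.size();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread cleanup = startTracked("cleanup-1", () ->
                System.out.println("running as " + Thread.currentThread().getName()));
        cleanup.join();
        System.out.println("still tracked: " + liveCount());
    }
}
```

This keeps the naming benefit of Thread.ofVirtual().name(...), since the name
is set before the thread is first scheduled, at the cost of one extra set
entry per live thread.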

In other words, there are strong references to the task's state (e.g.
Socket), but there is no strong reference to the thread itself AFAICS, nor do
I need one really, since the thread lifecycle is managed indirectly. I fear
I may have a similar problem for VTs that are executing client requests - I
maintain strong references to the task's state (e.g. request), but not the
VT itself.

Am I misunderstanding?

Thanks :-)

[1] VTs make thread naming very lightweight, which is great for debugging.
For example, I've included connection information, a summary of request
parameters and even updated the thread name to include internal routing
information, DB index names, records and keys (making it easy to identify
contended DB keys).

On Tue, 23 Jul 2024 at 16:57, Ron Pressler <ron.pressler at oracle.com> wrote:

>
>
> > On 22 Jul 2024, at 22:51, Michal Domagala <outsider404 at gmail.com> wrote:
> >
> > Thanks for the explanation.
> >
> > I understand that paragraph
> >
> > "Unlike platform thread stacks, virtual thread stacks are not GC roots.
> Thus the references they contain are not traversed in a stop-the-world
> pause by garbage collectors, such as G1, that perform concurrent heap
> scanning"
> >
> > can be rewritten as
> >
> > "Some GCs, such as G1, mark GC roots in a stop-the-world pause. Unlike
> platform thread stacks, virtual thread stacks are not GC roots, therefore
> they do not impact the stop-the-world pause."
> >
> > In my opinion, the current paragraph in JEP 444 requires readers to have
> a deep GC background. Usually, developers are not aware of GC root cost (at
> least I was not aware). Developers could tune the number of GC roots by
> changing the number of platform threads. Others, like static variables, are
> rather not tunable.
> > But usually, the OS limits the number of platform threads much more
> strictly than GC performance.
> >
> > To sum up, JEP 444's message is: "Do not be afraid of the G1 initial
> mark phase when using virtual threads". But I think most developers, like
> me, never heard about it. Others, more advanced, could also never care about
> it, because Oracle docs say about the initial mark phase: "This phase is
> piggybacked on a normal (STW) young garbage collection.". I understand this
> sentence as the phase is "for free".
> >
> > To sum up again: when a developer like me reads that VT is not GC root,
> he does not see G1 profit behind. He reads: VT is GC'able. And the current
> state, when behavior is different, is misleading.
>
> A virtual thread *is* GCable, just like a String, when it is not strongly
> referenced. However, by default virtual threads will have a strong
> reference for observability, but you can turn that off.
>
> But, yes, the very notion of GC roots requires an advanced understanding
> of how the Java platform’s GCs work. I *think* that the very notion of GC
> roots is not part of the spec, but an implementation detail. No inference
> can be made, for any object, from whether or not it is a root, to when it
> will be collected. In fact, the platform’s GCs make no guarantees as to when
> objects are collected, regardless of whether they’re roots or not. The only
> guarantee is that an object will not be collected if it is strongly
> reachable.
>
> — Ron