EA builds with changes to object monitor implementation to avoid pinning with virtual threads

Sat Feb 17 12:32:03 UTC 2024

Thanks Alan for the detailed explanation. This reminds me of the Java 21
deadlock scenarios in which more threads than the machine has cores obtain
the same lock both outside and inside a synchronized block, which can
potentially cause a deadlock.

I still have one question though. This deadlock scenario (a burst of
virtual threads at startup with a mix of class loading (which comes with
pinning) and resource loading from the same JAR files) is not happening on
21 and the latest 23-ea. I suspect that is because ForkJoinPool creates more
threads to compensate for pinning? Would you please shed some light on this?

And is there any way to have that behaviour reintroduced into loom only
for class loading scenarios, as a temporary workaround until the
fundamental work for class loading is done? Or is there anything that can
be done on the developer's side to avoid this? The thing is that with 21
and 23-ea, we could at least opt for implementations that favour
ReentrantLock over synchronized (even though that is a very painful and
time-consuming approach) and have a working setup, but with your build it
is almost impossible to survive the load.
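
To make the ReentrantLock workaround mentioned above concrete, here is a
minimal sketch of the kind of change I mean. The class name
LockGuardedCache and the resource key are hypothetical and only for
illustration; this is not Spring's or the JDK's code. It assumes JDK 21 or
later for the virtual thread executor. The point is only that a virtual
thread waiting on a ReentrantLock can unmount from its carrier, whereas on
21 a thread blocked entering a synchronized block stays pinned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical cache guarded by a ReentrantLock instead of synchronized.
public class LockGuardedCache<K, V> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<K, V> entries = new HashMap<>();

    // Returns the cached value, invoking the loader only if the key is absent.
    public V computeIfAbsent(K key, Function<K, V> loader) {
        lock.lock(); // a virtual thread can unmount while waiting here
        try {
            V value = entries.get(key);
            if (value == null) {
                value = loader.apply(key);
                entries.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockGuardedCache<String, String> cache = new LockGuardedCache<>();
        AtomicInteger loads = new AtomicInteger();
        // One virtual thread per task; close() waits for all tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    cache.computeIfAbsent("static/index.html", k -> {
                        loads.incrementAndGet();
                        return "resolved:" + k;
                    });
                });
            }
        }
        System.out.println("loader invocations: " + loads.get());
    }
}
```

This only helps for locks in code we control. It does nothing for the
synchronized blocks inside the class loader or ZipFile, which is exactly
the case I am stuck on.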

I am not undermining the loom team's work at all. It's a great milestone.
We definitely want to help the team by testing it, and I personally would
love to have your work in the 23-ea builds sooner rather than later. But
the testing itself can't be accomplished because of the deadlock, so it
sounds like a chicken-and-egg problem to me. I am looking forward to
hearing your opinion on this.

Kind regards,
Masoud

On Sat, Feb 17, 2024 at 11:20 AM Alan Bateman <Alan.Bateman at oracle.com>
wrote:

> On 16/02/2024 19:58, masoud parvari wrote:
>
> Hi Alan,
>
> About the deadlock on Java 21 while serving static content (which is
> resolved on your build), I dug a bit deeper. You are right. The culprit is
> most probably *not File I/O*. What *Spring-MVC* does is that it *caches*
> the location (out of multiple available candidates) from which it
> eventually manages to resolve the static resource, and then it proceeds to
> call *ClassLoader.getResourceAsStream()* to get the file. The cache
> implementation is backed by *ConcurrentHashMap* and it calls the
> *put(k,v)* method on the map, which involves a *synchronized block*. I just
> didn't understand how it can happen even with very few concurrent requests.
>
> Thanks for instructing me to use *jcmd*, and yes, it's a *12 core* machine.
> I ran the test again and got 2 thread dumps, one from *jvisualvm* and one
> from *jcmd*, so you can correlate them. Please find them attached.
> It's a deadlock on the class loader. 11 out of 12 carrier threads are
> blocked on a *synchronized block* at
> *java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:651)*
> and the other one, which is *virtual thread #120 (forkjoinpool-1-worker-14)*,
> is stuck in a *synchronized block* at
> *java.base/java.util.zip.ZipFile.getEntry(ZipFile.java:339)*.
>
>
> Thanks for sharing the thread dumps. We can see 300 virtual threads. 12
> are blocked trying to enter a monitor but are pinned due to native frames
> on the stack. No other threads can run as a result. You won't see these
> native frames in the stack traces but essentially all 12 are in
> nl.trifork.qti.model.processing.expression.general.BaseValue's constructor
> and triggering a class load, which goes through the VM. Of the 12, 11 are
> blocked at BuiltinClassLoader.loadClassOrNull as you pointed out.  The
> built-in class loaders are "parallel capable" but they do contend when
> several threads are attempting to load the same class at the same time. As
> you found, one of the 12, #120, has got further but it blocks at a later
> point due to other threads (#119 and #123) trying to locate resources in
> the same JAR file. I think we can assume that one of these two has been
> unblocked, meaning scheduled to continue, but can't continue as there are
> no carriers available. If you run `jcmd <pid> Thread.vthread_scheduler` a
> few times when this happens then you'll see the counters stall.
>
> I agree this is unfortunate, and not easy to avoid. It's essentially a
> burst of virtual threads at startup with a mix of class loading (which
> comes with pinning) and resource loading from the same JAR files. Right
> now, the focus is the pain point of object monitors but class loading is
> something that does need attention too.
>
> -Alan.
>
>
>
>