RFR: 8333005: Deadlock when setting or updating the inline cache

Y. Srinivas Ramakrishna ysr at openjdk.org
Sat Jun 1 00:00:06 UTC 2024


On Wed, 29 May 2024 07:32:55 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> In our collectors that unload classes concurrently (ZGC, Generational ZGC, Shenandoah), there is a per-nmethod lock. This lock protects the nmethod oops, and is used by nmethod entry barriers and for lazily computing the value of is_unloading, for example. It also protects some other unrelated state, including the state machine of inline caches, which is otherwise completely orthogonal to any of the GC stuff.
> 
> Because the lock is used to protect the inline caches (it is taken by the CompiledICLocker), you are not allowed to call is_unloading() on *other* nmethods while holding it: computing is_unloading needs access to the oops, which means taking that nmethod's lock. So if two nmethods have inline caches pointing at each other, and calls are resolved at the same time while concurrent class unloading is going on, we can sometimes get a deadlock.
> 
> I accidentally introduced such a bug when I removed the ICStubs (https://bugs.openjdk.org/browse/JDK-8322630), where this has indeed been observed.
> 
> The intention with this patch is to make the system less fragile. While it's possible to move around the call to is_unloading() to get rid of the deadlocks, I think I will sleep better at night knowing that you can call is_unloading() anywhere, at least in the shared runtime code, without knowing the GC implementation details. So I'm adding a per-nmethod inline cache lock that protects the completely orthogonal inline cache state for the CompiledICLocker. This way these deadlocks can't happen.
> 
> Tested ZGC tests tier1-7, and it looks green. The reproducer that caught the problem has also stopped reproducing.

Thanks for the fix! Would it be possible to add to the associated ticket a stack trace of the deadlocked process from the reproducer before the fix, along with some information about the reproducer and any stress flags used to induce the deadlock?
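
For readers following the thread, here is a minimal sketch of the locking scheme described in the quoted text. It is not HotSpot code: `ToyNMethod`, `gc_lock`, `ic_lock`, `resolve_call` and `run_two_resolvers` are hypothetical names invented for illustration, and `std::mutex` stands in for the real lock types. It assumes the pre-fix problem was the classic pattern of holding one nmethod's lock while acquiring another's, and shows that with a separate inline cache lock, the lock taken by `is_unloading()` is never acquired while another such lock is held.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an nmethod; not HotSpot's real class.
struct ToyNMethod {
    std::mutex gc_lock;  // models the per-nmethod lock guarding the oops
    std::mutex ic_lock;  // models the separate inline cache lock from the patch
    bool unloading = false;

    // Models is_unloading(): it must take the GC-side lock to inspect the oops.
    bool is_unloading() {
        std::lock_guard<std::mutex> guard(gc_lock);
        return unloading;
    }
};

// Models resolving a call in `caller` whose inline cache targets `callee`.
// Only caller.ic_lock is held while callee.gc_lock is taken, so no thread
// ever holds one gc_lock while waiting for another gc_lock.
bool resolve_call(ToyNMethod& caller, ToyNMethod& callee) {
    std::lock_guard<std::mutex> ic_guard(caller.ic_lock);
    return !callee.is_unloading();
}

// Two threads resolve calls in opposite directions (a -> b and b -> a),
// mirroring two nmethods with inline caches pointing at each other.
int run_two_resolvers(int iterations) {
    ToyNMethod a, b;
    std::atomic<int> resolved{0};
    auto worker = [&resolved, iterations](ToyNMethod& caller, ToyNMethod& callee) {
        for (int i = 0; i < iterations; ++i) {
            if (resolve_call(caller, callee)) {
                resolved.fetch_add(1);
            }
        }
    };
    std::thread t1(worker, std::ref(a), std::ref(b));
    std::thread t2(worker, std::ref(b), std::ref(a));
    t1.join();
    t2.join();
    return resolved.load();
}
```

If `resolve_call` instead guarded the inline cache with `caller.gc_lock`, the two threads could each hold their own `gc_lock` while waiting for the other's, which is the cycle the quoted description attributes to sharing one lock for both purposes.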

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19446#issuecomment-2143131720


More information about the shenandoah-dev mailing list