RFR: 8298377: JfrVframeStream causes deadlocks in ZGC
Markus Grönlund
mgronlun at openjdk.org
Mon Dec 12 14:08:20 UTC 2022
On Thu, 8 Dec 2022 11:23:57 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:
> The JfrVFrameStream is used while generating stack traces for events. One of these events is the ZPage allocation event, which is sometimes sent while ZGC is relocating. The current implementation of JfrVFrameStream uses WalkContinuation::include, which causes JFR to walk the continuation and perform GC barriers. This is problematic, since ZGC requires that we never perform load barriers while running the relocation code. If we do, we might end up performing other relocations from within the relocation code, and in some cases that causes deadlocks.
>
> I propose that JFR doesn't walk the continuations when sending events. An alternative could be to limit this to ZGC, but I'd like to get some feedback around that from JFR / Loom devs.
>
> We've been testing this patch in the Generational ZGC repository.
It is a complicated problem to solve; I remember spending a long time resolving the recursion issues that came with the introduction of Virtual Threads.
JFR previously selected a few paths based on the event type, but that was removed long ago because it requires maintaining a list of sensitive event types that every event must be checked against. The "skip" value could perhaps be made conditional on UseZGC and applied to the ZGC event types only. That requires passing the event type through, along with said list of sensitive types.
Or maybe we can introduce an additional attribute in metadata.xml. That way we only need to check the individual event type, not a set of event types.
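
To make the difference between the two options concrete, here is a rough standalone sketch. It is not JFR code: EventId, the Event* structs, skipContinuationWalk and both walk_continuation functions are hypothetical names invented for illustration, and the real check would live wherever the stack trace is recorded. The per-event constant stands in for whatever an attribute in metadata.xml might generate.

```cpp
#include <cassert>

// Hypothetical event ids, for illustration only (not the real JFR ids).
enum class EventId { ZPageAllocation, ThreadPark };

// Option A: a central list of sensitive event types.
// Every stack trace request has to pass its event id and consult the list.
static const EventId sensitive_events[] = { EventId::ZPageAllocation };

inline bool walk_continuation_via_list(EventId id, bool use_zgc) {
  if (!use_zgc) {
    return true;  // only ZGC has the "no load barriers in relocation" rule
  }
  for (EventId sensitive : sensitive_events) {
    if (sensitive == id) {
      return false;  // skip the continuation walk for this event
    }
  }
  return true;
}

// Option B: a per-event attribute. The constant below is an assumption
// about what a new metadata.xml attribute could be turned into.
struct EventZPageAllocation {
  static constexpr bool skipContinuationWalk = true;
};

struct EventThreadPark {
  static constexpr bool skipContinuationWalk = false;
};

// Only the individual event type is inspected; no shared list is needed.
template <typename Event>
inline bool walk_continuation(bool use_zgc) {
  return !(use_zgc && Event::skipContinuationWalk);
}

// Small self-check showing that both options agree for these two events.
inline void check_options_agree() {
  assert(walk_continuation_via_list(EventId::ZPageAllocation, true) ==
         walk_continuation<EventZPageAllocation>(true));
  assert(walk_continuation_via_list(EventId::ThreadPark, true) ==
         walk_continuation<EventThreadPark>(true));
}
```

With option B the decision is resolved per event type at compile time, so adding a new sensitive event would mean touching only that event's definition rather than a shared list.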
-------------
PR: https://git.openjdk.org/jdk/pull/11586