RFR: 8345493: JFR: JVM.flush hangs intermittently
Erik Gahlin
egahlin at openjdk.org
Tue Jan 14 16:12:56 UTC 2025
On Tue, 14 Jan 2025 14:35:07 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:
> Greetings,
>
> This is a hypothetical fix for JDK-8345493, because the issue seems impossible to reproduce, even with instrumentation and extra debug information.
>
> Debugging the .mdmp state indicates that a message request thread is not woken up from waiting on a condition variable, even though the message it sent has been processed. Instead, both the message request thread and the consumer are waiting on the condition variable, so the message request thread never wakes up to check that its message has been processed.
>
> There is a bit of designed asymmetry in that only a single message thread should be waiting for a message to be processed. The consumer, therefore, signals it using notify().
>
> Let's say we have a broken invariant somewhere (not yet found) that allows two threads to post messages. In that case, notify() will wake up only a single thread from the associated condition variable, and the other thread can remain waiting.
>
> A safer, intermediate "fix" is to let the consumer issue a notify_all() to wake all potential waiters.
>
> We will continue to investigate the underlying cause but suggest this as an intermediate fix.
>
> Testing: jdk_jfr, stress testing.
>
> Thanks
> Markus
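
For illustration only, the notify() versus notify_all() difference described above can be sketched outside HotSpot. This is not the JFR code: it uses Python's threading.Condition as a stand-in for the monitor, and the names count_woken and requester are hypothetical. It assumes two requester threads are already waiting when the consumer signals, which is the suspected broken invariant.

```python
import threading
import time


def count_woken(use_notify_all, waiters=2, timeout=1.0):
    """Return how many waiting threads were woken by a single signal.

    Hypothetical model: 'requester' threads stand in for message request
    threads, and the calling thread plays the consumer.
    """
    cond = threading.Condition()
    state = {"waiting": 0, "woken": 0}

    def requester():
        with cond:
            state["waiting"] += 1
            # wait() releases the lock while blocked and returns True only
            # if the thread was notified before the timeout expired.
            if cond.wait(timeout):
                state["woken"] += 1

    threads = [threading.Thread(target=requester) for _ in range(waiters)]
    for t in threads:
        t.start()

    # Consumer: poll until every requester is parked in wait(), then signal once.
    while True:
        with cond:
            if state["waiting"] == waiters:
                if use_notify_all:
                    cond.notify_all()
                else:
                    cond.notify()
                break
        time.sleep(0.01)

    for t in threads:
        t.join()
    return state["woken"]


if __name__ == "__main__":
    print("notify():    ", count_woken(use_notify_all=False))
    print("notify_all():", count_woken(use_notify_all=True))
```

With two waiters, the notify() run is expected to report one woken thread (the other only returns because of the timeout used here, which the real waiter may not have), while the notify_all() run is expected to report both.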
Marked as reviewed by egahlin (Reviewer).
-------------
PR Review: https://git.openjdk.org/jdk/pull/23105#pullrequestreview-2550239729
More information about the hotspot-jfr-dev mailing list