RFR: 8345493: JFR: JVM.flush hangs intermittently

Markus Grönlund mgronlun at openjdk.org
Tue Jan 14 14:40:11 UTC 2025


Greetings,

This is a speculative fix for JDK-8345493, speculative because the issue seems impossible to reproduce, even with instrumentation and extra debug information.

Inspecting the state in the .mdmp dump indicates that a message request thread is not woken up from waiting on a condition variable, even though the message it sent has been processed. Instead, both the message request thread and the consumer are waiting on the condition variable, so the message request thread never wakes up to check that its message has been processed.

There is a deliberate asymmetry in the design: only a single message request thread should be waiting for a message to be processed at any time. The consumer, therefore, signals it using notify().

Let's say we have a broken invariant somewhere (not yet found) that allows two threads to post messages. In that case, notify() will wake up only one of the threads waiting on the associated condition variable, and the other may keep waiting indefinitely.

A safer, interim "fix" is to let the consumer issue a notify_all() to wake all potential waiters.
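
To illustrate the idea only, here is a minimal sketch in Java rather than the HotSpot C++ code touched by the PR. The class PostBoxSketch and its methods postAndWait, consume, stop and runWithPosters are hypothetical names invented for this example, and the serial-number bookkeeping is an assumption, not the actual JFR post box logic. It shows a consumer that wakes every waiter with notifyAll(), so that two posting threads both get to re-check whether their message has been processed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the JFR implementation.
class PostBoxSketch {
    private final Object lock = new Object();
    private long posted = 0;      // serial number of the latest posted message
    private long processed = 0;   // serial number of the latest processed message
    private boolean stopped = false;

    // Message request thread: post a message and block until it is processed.
    void postAndWait() throws InterruptedException {
        synchronized (lock) {
            long mine = ++posted;
            lock.notifyAll();             // let the consumer know there is work
            while (processed < mine) {    // re-check the condition after every wakeup
                lock.wait();
            }
        }
    }

    // Consumer: process whatever has been posted, then wake ALL waiters.
    void consume() throws InterruptedException {
        synchronized (lock) {
            while (!stopped) {
                if (processed < posted) {
                    processed = posted;   // "process" everything posted so far
                    // With notify() only one waiter would be woken here; if two
                    // threads had posted, the other could stay blocked.
                    lock.notifyAll();
                } else {
                    lock.wait();
                }
            }
        }
    }

    void stop() {
        synchronized (lock) {
            stopped = true;
            lock.notifyAll();
        }
    }

    // Starts one consumer and the given number of posters; returns true if
    // every poster returned from postAndWait() within the timeout.
    static boolean runWithPosters(int posters, long timeoutMillis) throws InterruptedException {
        PostBoxSketch box = new PostBoxSketch();
        CountDownLatch done = new CountDownLatch(posters);
        Thread consumer = new Thread(() -> {
            try {
                box.consume();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();
        for (int i = 0; i < posters; i++) {
            Thread poster = new Thread(() -> {
                try {
                    box.postAndWait();
                    done.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            poster.setDaemon(true);
            poster.start();
        }
        boolean allReturned = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        box.stop();
        return allReturned;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("all posters returned: " + runWithPosters(2, 5000));
    }
}
```

Waking all waiters costs a few extra wakeups, but each waiter re-checks its own condition in a loop, so a waiter whose message is not yet processed simply goes back to waiting.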

We will continue to investigate the underlying cause but suggest this as an interim fix.

Testing: jdk_jfr, stress testing.

Thanks
Markus

-------------

Commit messages:
 - 8345493

Changes: https://git.openjdk.org/jdk/pull/23105/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23105&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8345493
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23105.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23105/head:pull/23105

PR: https://git.openjdk.org/jdk/pull/23105

