RFR: 8345493: JFR: JVM.flush hangs intermittently
Markus Grönlund
mgronlun at openjdk.org
Tue Jan 14 14:40:11 UTC 2025
Greetings,
This is a hypothetical fix for JDK-8345493, because the issue seems impossible to reproduce, even with instrumentation and extra debug information.
Debugging the .mdmp state indicates that a message request thread is not woken up from waiting on a condition variable, even though the message it sent has already been processed. Instead, both the message request thread and the consumer are left waiting on the condition variable. This means the message request thread never wakes up to check that its message has been processed.
There is a bit of designed asymmetry in that only a single message thread should be waiting for a message to be processed. The consumer, therefore, signals it using notify().
Let's say we have a broken invariant somewhere (not yet found) that allows two threads to post messages. In that case, notify() will only wake up a single thread waiting on the associated condition variable.
A safer, intermediate "fix" is to let the consumer issue a notify_all() to wake all potential waiters.
We will continue to investigate the underlying cause but suggest this as an intermediate fix.
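To illustrate the reasoning, here is a minimal Java sketch of the pattern, not the HotSpot code itself. The names `PostBoxSketch`, `postAndWait`, `consume`, `stop`, and `run` are hypothetical, and the sketch assumes posters and the consumer share one monitor, as described above. The consumer uses notifyAll(), so every poster re-checks its own predicate even if more than one happens to be waiting.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: posters and the consumer wait on the same monitor.
class PostBoxSketch {
    private final Object lock = new Object();
    private int posted = 0;      // number of messages submitted so far
    private int processed = 0;   // number of messages the consumer has handled
    private boolean shutdown = false;

    // Poster side: submit a message, then block until it has been processed.
    void postAndWait() throws InterruptedException {
        synchronized (lock) {
            int ticket = ++posted;
            lock.notifyAll();              // wake the consumer
            while (processed < ticket) {   // guard against spurious wakeups
                lock.wait();
            }
        }
    }

    // Consumer side: handle pending messages, then wake ALL waiters.
    void consume() throws InterruptedException {
        synchronized (lock) {
            while (!shutdown) {
                while (processed == posted && !shutdown) {
                    lock.wait();
                }
                processed = posted;
                // With notify() only one waiter would be woken; if two
                // posters are waiting, the other could sleep forever.
                lock.notifyAll();
            }
        }
    }

    void stop() {
        synchronized (lock) {
            shutdown = true;
            lock.notifyAll();
        }
    }

    // Starts one consumer and `posters` posting threads; returns how many
    // posters were released within the join timeout.
    static int run(int posters) {
        PostBoxSketch box = new PostBoxSketch();
        AtomicInteger done = new AtomicInteger();
        Thread consumer = new Thread(() -> {
            try { box.consume(); } catch (InterruptedException ignored) { }
        });
        consumer.setDaemon(true);
        consumer.start();
        Thread[] threads = new Thread[posters];
        for (int i = 0; i < posters; i++) {
            threads[i] = new Thread(() -> {
                try {
                    box.postAndWait();
                    done.incrementAndGet();
                } catch (InterruptedException ignored) { }
            });
            threads[i].setDaemon(true);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join(5000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        box.stop();
        return done.get();
    }
}
```

Calling `PostBoxSketch.run(2)` models the suspected broken invariant of two concurrent posters. Because every waiter loops on its own predicate, the extra wakeups from notifyAll() are harmless, which is why it is the safer choice while the root cause is unknown.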
Testing: jdk_jfr, stress testing.
Thanks
Markus
-------------
Commit messages:
- 8345493
Changes: https://git.openjdk.org/jdk/pull/23105/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23105&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8345493
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/23105.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23105/head:pull/23105
PR: https://git.openjdk.org/jdk/pull/23105