RFR: 8326012: JFR: Event for safepoint timeout
Denghui Dong
ddong at openjdk.org
Fri Feb 16 13:51:04 UTC 2024
Several JFR events related to safepoints already exist. When time-to-safepoint (aka ttsp) is too long, however, these events are of limited help, because they do not tell us which threads caused the delay or what those threads were doing.
Users can use `-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=100` to see the threads that don't reach the safepoint in time, but without stack traces. Using `-XX:+AbortVMOnSafepointTimeout` can capture the stack traces, but it crashes the process, so it's not sensible to enable that flag in production.
This patch adds a new JFR event `EventSafepoint` to record the threads that cause a long ttsp.
This event includes two fields:
- safepointId: the relevant safepoint id
- timeExceeded: the time the thread took to reach the safepoint beyond `SafepointTimeoutDelay`
In the current version, the event records the stack of each problematic thread at the point where it finally reaches the safepoint. The recorded stack is therefore biased relative to where the thread actually spent its time, but it is still helpful for narrowing down the root cause.
A better implementation would record a more accurate stack, but that would increase complexity. The native stack may also be important for this problem, but JFR does not currently support it.
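To illustrate how the event might be consumed, here is a minimal sketch that reads a recording with the standard `jdk.jfr.consumer` API and prints the two fields plus the recorded stack. Several things here are assumptions rather than part of the patch: the event type name `jdk.SafepointTimeout` is hypothetical (check the patch for the real name), `timeExceeded` is assumed to be stored as a timespan, the event is assumed to be committed by the problematic thread itself, and the class and helper names are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordedThread;
import jdk.jfr.consumer.RecordingFile;

final class SafepointTimeoutReport {
    // Assumption: hypothetical event type name, not taken from the patch.
    static final String EVENT_NAME = "jdk.SafepointTimeout";

    // Formats one event as a single summary line.
    static String describe(long safepointId, Duration timeExceeded, String threadName) {
        return "safepoint " + safepointId + ": thread " + threadName
                + " exceeded SafepointTimeoutDelay by " + timeExceeded.toMillis() + " ms";
    }

    // Reads the recording and returns a summary line per matching event,
    // followed by one indented line per recorded stack frame.
    static List<String> report(Path recording) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RecordingFile file = new RecordingFile(recording)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (!EVENT_NAME.equals(event.getEventType().getName())) {
                    continue;
                }
                // Assumption: the event thread is the thread that was late.
                RecordedThread thread = event.getThread();
                String name = (thread == null || thread.getJavaName() == null)
                        ? "<unknown>" : thread.getJavaName();
                // Assumption: timeExceeded is a timespan field, so getDuration applies.
                lines.add(describe(event.getLong("safepointId"),
                        event.getDuration("timeExceeded"), name));
                RecordedStackTrace stack = event.getStackTrace();
                if (stack != null) {
                    for (RecordedFrame frame : stack.getFrames()) {
                        lines.add("    at " + frame.getMethod().getType().getName()
                                + "." + frame.getMethod().getName()
                                + " (line " + frame.getLineNumber() + ")");
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SafepointTimeoutReport.java <recording.jfr>");
            return;
        }
        report(Path.of(args[0])).forEach(System.out::println);
    }
}
```

As noted above, the printed stack would be the one taken when the thread finally reached the safepoint, so it should be read as a hint rather than the exact location of the delay.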
Any input would be greatly appreciated.
-------------
Commit messages:
- update
- 8326012: JFR: Event for safepoint timeout
Changes: https://git.openjdk.org/jdk/pull/17888/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17888&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8326012
Stats: 143 lines in 7 files changed: 136 ins; 0 del; 7 mod
Patch: https://git.openjdk.org/jdk/pull/17888.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17888/head:pull/17888
PR: https://git.openjdk.org/jdk/pull/17888