RFR: 8326012: JFR: Event for time to safepoint [v9]

Wed Feb 21 02:26:55 UTC 2024

On Tue, 20 Feb 2024 05:27:22 GMT, Denghui Dong <ddong at openjdk.org> wrote:

>> There are now some JFR events related to safepoints. When time-to-safepoint (aka ttsp) is too long, these events are not very helpful, because they do not tell us which threads caused the delay or what those threads were doing.
>> 
>> Users can use `-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=100` to see the threads that don't reach the safepoint in time, but without stack traces. Using `-XX:+AbortVMOnSafepointTimeout` can capture the stack traces, but it crashes the process, so it's not sensible to enable the flag in production.
>> 
>> ~~This patch adds a new JFR event `EventSafepointTimeout` to record the threads that cause ttsp too long.~~
>> 
>> ~~This event includes two fields:~~
>> 
>> ~~- safepointId: the relevant safepoint id~~
>> ~~- timeExceeded: the amount of time exceeding `SafepointTimeoutDelay` used by the thread to reach safepoint~~
>> 
>> ~~In the current version, this event records the stack of those problematic threads when they finally reach safepoint. Hence, there is a bias, but it's still helpful to deduce the root place.~~
>> 
>> A better implementation would record a more accurate stack, but this would increase complexity. The native stack may also be important for this problem, but JFR does not currently support it.
>> 
>> Any input would be greatly appreciated.
>> 
>> Testing: jdk/jdk/jfr
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   delete _entries when disabled

src/hotspot/share/jfr/support/jfrTimeToSafepoint.hpp line 42:

> 40:     JavaThread* thread;
> 41:     JfrTicks end;
> 42:     int iterations;

Maybe we can think about putting them into JfrThreadLocal.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17888#discussion_r1496788017