RFR: 8351146: JFR: Reconsider thresholds for monitor-related events

Erik Gahlin egahlin at openjdk.org
Tue Mar 4 22:32:56 UTC 2025


On Tue, 4 Mar 2025 11:03:03 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> There are two issues with current thresholds for monitor-related events:
> 
>  1. The threshold for enter/wait is too high for practical use. @coleenp separately said during related CSR review that 5ms is a more practical value: https://bugs.openjdk.org/browse/JDK-8348833?focusedId=14744853#comment-14744853 -- I think we should trim those even further.
>  
>  2. There is a threshold for the inflate event. But the operation that is covered by that event is very fast, and often lock-free, so the threshold would filter many events. This would be important as we add the deflation event ([JDK-8351142](https://bugs.openjdk.org/browse/JDK-8351142)), which should match inflations.
> 
> The PR changes the `locking-threshold` default, so it affects a few other events as well. I am picking 1ms as the practical value that we usually deal with in production environments, as we chase down single-digit-ms latency hogs. I am open to suggestions about the values, though. Maybe `default.jfc` values should be higher?
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `jdk_jfr` still passes

We looked into this when we created the initial configurations and are aware that the values are rather high.

The default configuration should be able to handle pathological applications. A user should never have to learn how to disable an event to use JFR, or risk running into more than 1% CPU overhead.

Users can always opt in, e.g. -XX:StartFlightRecording:locking-threshold=1ms.
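
The same opt-in can also be done programmatically. The following is a minimal sketch, assuming the two monitor events discussed in this thread (`jdk.JavaMonitorEnter` and `jdk.JavaMonitorWait`) are the ones of interest; the class name `LockingThresholdOptIn` and the helper `newRecording` are made up for illustration, and only the `jdk.jfr.Recording` API is used:

```java
import java.time.Duration;
import jdk.jfr.Recording;

public class LockingThresholdOptIn {
    // Hypothetical helper: creates a recording in which the monitor
    // enter/wait events are enabled with the given threshold.
    static Recording newRecording(Duration threshold) {
        Recording recording = new Recording();
        recording.enable("jdk.JavaMonitorEnter").withThreshold(threshold);
        recording.enable("jdk.JavaMonitorWait").withThreshold(threshold);
        return recording;
    }

    public static void main(String[] args) {
        // 1 ms mirrors the value proposed in the PR; it is not the default.
        try (Recording recording = newRecording(Duration.ofMillis(1))) {
            recording.start();
            // ... run the workload to be observed here ...
            recording.stop();
            // Print the effective settings for inspection.
            recording.getSettings().forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

Unlike the command-line option, this only touches the two events it names; other events controlled by locking-threshold in the .jfc files are left alone.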

The profile configuration is more targeted to what can be expected to work in typical applications.

That said, our plan is to introduce rate-limited sampling, so we can set an upper bound, e.g. 300/s, similar to what we have for the ObjectAllocationSample event. We probably want to have both outliers/thresholds and samples at the same time, so there are design issues that need to be sorted out. For example, should we sample only above the threshold, or should the threshold and the sampling be independent? There is also another feature coming, contextual events, that could impact the design. We have similar issues with socket events.
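
To make the design question concrete, here is a rough sketch of the "independent" variant: events at or above the threshold are always kept as outliers, and shorter events are sampled up to a fixed number per second. This is purely illustrative and is not how JFR implements throttling; the class `ThresholdPlusSampling`, its fields, and the fixed one-second window are all assumptions made for the example.

```java
public class ThresholdPlusSampling {
    private static final long WINDOW_NANOS = 1_000_000_000L;

    private final long thresholdNanos;
    private final int maxSamplesPerSecond;
    private long windowStartNanos;
    private int samplesInWindow;

    ThresholdPlusSampling(long thresholdNanos, int maxSamplesPerSecond) {
        this.thresholdNanos = thresholdNanos;
        this.maxSamplesPerSecond = maxSamplesPerSecond;
    }

    // Decides whether an event of the given duration, observed at nowNanos,
    // should be recorded. The clock is passed in to keep the sketch testable.
    synchronized boolean shouldRecord(long durationNanos, long nowNanos) {
        if (durationNanos >= thresholdNanos) {
            return true; // outlier: kept regardless of the sampling budget
        }
        if (nowNanos - windowStartNanos >= WINDOW_NANOS) {
            windowStartNanos = nowNanos; // start a new one-second window
            samplesInWindow = 0;
        }
        if (samplesInWindow < maxSamplesPerSecond) {
            samplesInWindow++;
            return true; // still within the per-second budget
        }
        return false; // budget exhausted: drop the short event
    }

    public static void main(String[] args) {
        // 20 ms threshold and a budget of 300 short events per second.
        ThresholdPlusSampling s = new ThresholdPlusSampling(20_000_000L, 300);
        int kept = 0;
        for (int i = 0; i < 1000; i++) {
            // 1000 short (1 ms) events spread over the first millisecond
            if (s.shouldRecord(1_000_000L, i * 1_000L)) {
                kept++;
            }
        }
        System.out.println("short events kept: " + kept);
    }
}
```

Taking the first N events in each window is biased toward the start of the window; a real sampler would need to spread the samples out. In the "sample above the threshold" variant, the early return for outliers would go away and the budget check would apply only to events that pass the threshold.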

I think we should keep things as-is until we implement rate-limited sampling, to avoid flipping settings back and forth between releases.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23891#issuecomment-2699108603

