RFR: 8348402: PerfDataManager stalls shutdown for 1ms

Wed Jan 29 05:42:47 UTC 2025

On Fri, 24 Jan 2025 09:27:07 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> Found this when studying Leyden performance. [JDK-8049304](https://bugs.openjdk.org/browse/JDK-8049304) added 1ms sleep on destruction path to catch up with threads updating the counters. 
> 
> I was not able to confidently prove the deletion race is benign. Even though `sleep` is not really a fix for that race either, I think it is safer to use GlobalCounter to coordinate deletions.
> 
> This improves startup (roundtrip) tests for the expected 1ms:
> 
> 
> $ hyperfine -w 10 -r 100 ...
> 
> # Before
> Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m Hello
>   Time (mean ± σ):      20.0 ms ±   0.3 ms    [User: 11.7 ms, System: 16.0 ms]
>   Range (min … max):    19.0 ms …  21.7 ms    1000 runs
> 
> # After
> Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m Hello
>   Time (mean ± σ):      19.0 ms ±   0.3 ms    [User: 11.9 ms, System: 15.8 ms]
>   Range (min … max):    18.2 ms …  20.8 ms    1000 runs
> 
> 
> Additional testing:
>  - [x] Linux AArch64 server fastdebug, `all`
>  - [x] Linux x86_64 server fastdebug, `all`

Thanks for fixing this @shipilev , it looks good to me from a sync perspective. I'm sure if `GlobalCounter` had existed in JDK 9 then it would have been looked at to fix this race. My main concerns here are the efficiency of it - the `release_store_fence` in the middle of low-level OM code does concern me a little (it tends to follow an existing fence however so I expect the impact is not great). Also the iteration through all threads in `GlobalCounter::write_synchronize` does make me wonder how long it actually takes to execute the exit path now, compared to the 1ms sleep? Pathological case would be on a uniprocessor of course, but we're not really concerned about that.

There are two comment blocks in objectMonitor.cpp that should be updated now - here is one:

    // Keep a tally of the # of futile wakeups.
    // Note that the counter is not protected by a lock or updated by atomics.
    // That is by design - we trade "lossy" counters which are exposed to
    // races during updates for a lower probe effect.
    // This PerfData object can be used in parallel with a safepoint.
    // See the work around in PerfDataManager::destroy().
    OM_PERFDATA_OP(FutileWakeups, inc());

src/hotspot/share/runtime/synchronizer.cpp line 66:

> 64: #include "utilities/dtrace.hpp"
> 65: #include "utilities/events.hpp"
> 66: #include "utilities/globalCounter.inline.hpp"

Why is this necessary?

-------------

Changes requested by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23293#pullrequestreview-2580002103
PR Review Comment: https://git.openjdk.org/jdk/pull/23293#discussion_r1933270104