RFR: 8348402: PerfDataManager stalls shutdown for 1ms
David Holmes
dholmes at openjdk.org
Wed Jan 29 05:42:47 UTC 2025
On Fri, 24 Jan 2025 09:27:07 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> Found this when studying Leyden performance. [JDK-8049304](https://bugs.openjdk.org/browse/JDK-8049304) added 1ms sleep on destruction path to catch up with threads updating the counters.
>
> I was not able to confidently prove the deletion race is benign. Even though `sleep` is not really a fix for that race either, I think it is safer to use GlobalCounter to coordinate deletions.
>
> This improves startup (roundtrip) tests for the expected 1ms:
>
>
> $ hyperfine -w 10 -r 100 ...
>
> # Before
> Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m Hello
> Time (mean ± σ): 20.0 ms ± 0.3 ms [User: 11.7 ms, System: 16.0 ms]
> Range (min … max): 19.0 ms … 21.7 ms 1000 runs
>
> # After
> Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m Hello
> Time (mean ± σ): 19.0 ms ± 0.3 ms [User: 11.9 ms, System: 15.8 ms]
> Range (min … max): 18.2 ms … 20.8 ms 1000 runs
>
>
> Additional testing:
> - [x] Linux AArch64 server fastdebug, `all`
> - [x] Linux x86_64 server fastdebug, `all`
Thanks for fixing this @shipilev , it looks good to me from a sync perspective. I'm sure if `GlobalCounter` had existed in JDK 9 then it would have been looked at to fix this race. My main concerns here are the efficiency of it - the `release_store_fence` in the middle of low-level OM code does concern me a little (it tends to follow an existing fence however so I expect the impact is not great). Also the iteration through all threads in `GlobalCounter::write_synchronize` does make me wonder how long it actually takes to execute the exit path now, compared to the 1ms sleep? Pathological case would be on a uniprocessor of course, but we're not really concerned about that.
There are two comment blocks in objectMonitor.cpp that should be updated now - here is one:
// Keep a tally of the # of futile wakeups.
// Note that the counter is not protected by a lock or updated by atomics.
// That is by design - we trade "lossy" counters which are exposed to
// races during updates for a lower probe effect.
// This PerfData object can be used in parallel with a safepoint.
// See the work around in PerfDataManager::destroy().
OM_PERFDATA_OP(FutileWakeups, inc());
src/hotspot/share/runtime/synchronizer.cpp line 66:
> 64: #include "utilities/dtrace.hpp"
> 65: #include "utilities/events.hpp"
> 66: #include "utilities/globalCounter.inline.hpp"
Why is this necessary?
-------------
Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23293#pullrequestreview-2580002103
PR Review Comment: https://git.openjdk.org/jdk/pull/23293#discussion_r1933270104
More information about the hotspot-runtime-dev
mailing list