RFR: 8336911: ZGC: Division by zero in heuristics after JDK-8332717
Axel Boldt-Christmas
aboldtch at openjdk.org
Wed Oct 2 15:52:41 UTC 2024
On Wed, 2 Oct 2024 12:00:19 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:
> When running with UBSan-enabled binaries, the following issue is reported,
> e.g. in test
> compiler/uncommontrap/TestDeoptOOM_ZGenerational.jtr
> and also in gc/z/TestSmallHeap.jtr
>
>
> jdk/src/hotspot/share/gc/z/zDirector.cpp:537:84: runtime error: division by zero
> #0 0x7f422495bd1f in calculate_young_to_old_worker_ratio src/hotspot/share/gc/z/zDirector.cpp:537
> #1 0x7f422495bd1f in select_worker_threads src/hotspot/share/gc/z/zDirector.cpp:694
> #2 0x7f42282a0d97 in select_worker_threads src/hotspot/share/gc/z/zDirector.cpp:689
> #3 0x7f42282a0d97 in initial_workers src/hotspot/share/gc/z/zDirector.cpp:784
> #4 0x7f42282a2485 in initial_workers src/hotspot/share/gc/z/zDirector.cpp:795
> #5 0x7f42282a2485 in start_minor_gc src/hotspot/share/gc/z/zDirector.cpp:797
> #6 0x7f42282a2485 in start_gc src/hotspot/share/gc/z/zDirector.cpp:826
> #7 0x7f42282a2485 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
> #8 0x7f422840bdd8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
> #9 0x7f4225ab6979 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
> #10 0x7f4227e1137a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
> #11 0x7f42274619b1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
> #12 0x7f422c8d36e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 9a146bd267419cb6a8cf08d7c602953a0f2e12c5)
> #13 0x7f422c1dc58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: f2d1cb1ef49f8c47d43a4053910ba6137673ccce)
>
>
> The division by 0 yields 'infinity' on most of our platforms. So instead of relying on this behavior, we can add a small check and use 'infinity' when the divisor is 0.
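The check described in the PR can be sketched as a standalone function. Guarding before dividing matters because the C++ standard leaves division by zero undefined even for floating-point operands, which is why UBSan reports it; the infinity result is only what IEEE 754 hardware typically produces. The function and parameter names below are hypothetical, and the non-negativity precondition is an assumption, not something stated in the review.

```cpp
#include <cassert>
#include <limits>

// Hypothetical standalone version of the guard proposed in the PR; the
// function and parameter names are illustrative, not HotSpot code.
double guarded_efficiency_ratio(double old_bytes_freed_per_gc_time,
                                double young_bytes_freed_per_gc_time) {
  // Assumption: both rates are bytes reclaimed per unit of GC time,
  // so they are never negative.
  assert(old_bytes_freed_per_gc_time >= 0.0);
  assert(young_bytes_freed_per_gc_time >= 0.0);

  if (young_bytes_freed_per_gc_time == 0.0) {
    // The division is skipped, so no undefined behavior occurs. Note that
    // this branch is also taken when the old rate is 0.0, i.e. when no
    // collection has reclaimed anything; that case is reported as infinity too.
    return std::numeric_limits<double>::infinity();
  }
  return old_bytes_freed_per_gc_time / young_bytes_freed_per_gc_time;
}
```

The sketch makes the open question visible: "only old reclaimed memory" and "nothing was reclaimed at all" both map to the same infinite ratio.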
I do not think `infinity` is the solution here. There are more problems with the heuristics when no young collection has reclaimed any memory.
I added a comment about this in an earlier PR (JDK-8339648 / #20888) https://github.com/openjdk/jdk/pull/20888#discussion_r1758502503.
I proposed a solution to this specific issue that makes more sense to me and avoids the NaN issues here, but I will have to talk it over with my team.
Regardless, I think we need to overhaul this code to handle the extreme case where no GC has reclaimed any memory.
_Wasn't this also an issue before JDK-8332717?_
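The NaN concern can be illustrated with a few downstream steps an infinite ratio might pass through. These helpers are invented for illustration and are not taken from zDirector.cpp; they only show standard IEEE 754 behavior: infinity times zero and infinity minus infinity are NaN, and NaN compares false against everything, so it slips past an ordinary cap.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical downstream steps, invented for illustration (not taken
// from zDirector.cpp), showing how an infinite ratio can turn into NaN.

// Weighting a ratio by a fraction: inf * 0.0 is NaN under IEEE 754.
double weight(double ratio, double fraction) {
  return ratio * fraction;
}

// Comparing two ratios by subtraction: inf - inf is NaN.
double ratio_delta(double current, double previous) {
  return current - previous;
}

// Capping a value: every comparison with NaN is false, so a NaN input
// falls through and is returned unchanged instead of being capped.
double cap(double value, double limit) {
  assert(std::isfinite(limit));
  return value > limit ? limit : value;
}

// A check downstream code could apply before trusting a ratio.
bool is_usable_ratio(double ratio) {
  return std::isfinite(ratio);
}
```

For example, `weight(std::numeric_limits<double>::infinity(), 0.0)` is NaN, and `cap` then returns that NaN untouched, whereas an infinite input would have been capped. Producing a finite ratio in the first place avoids having to guard every such step.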
src/hotspot/share/gc/z/zDirector.cpp line 539:
> 537: const double current_old_bytes_freed_per_gc_time = double(reclaimed_per_old_gc) / double(old_gc_time);
> 538: const double old_vs_young_efficiency_ratio = current_young_bytes_freed_per_gc_time == 0 ? std::numeric_limits<double>::infinity()
> 539: : current_old_bytes_freed_per_gc_time / current_young_bytes_freed_per_gc_time;
I think returning infinity here will cause problems with NaN down the line. It is also unclear what it means if both are `0`. To me, something like the following makes sense, but I will discuss this with my team.
Suggestion:
  if (current_young_bytes_freed_per_gc_time == 0.0) {
    if (current_old_bytes_freed_per_gc_time == 0.0) {
      // Neither young nor old collections have reclaimed any memory.
      // Give them equal priority.
      return 1.0;
    }
    // Only old collections have reclaimed memory.
    // Prioritize old.
    return ZOldGCThreads;
  }
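For experimentation, the suggestion can be restated as a self-contained function. This is a sketch under assumptions: the function name is hypothetical, the `old_gc_threads` parameter stands in for the `ZOldGCThreads` flag, and falling through to the plain division when the young rate is non-zero is an assumption about how the suggested lines combine with the existing computation.

```cpp
#include <cassert>

// Hypothetical standalone version of the suggested edge-case handling.
// `old_gc_threads` stands in for the ZOldGCThreads flag.
double old_vs_young_efficiency_ratio(double old_bytes_freed_per_gc_time,
                                     double young_bytes_freed_per_gc_time,
                                     unsigned old_gc_threads) {
  // Assumption: rates are bytes reclaimed per unit of GC time, never negative.
  assert(old_bytes_freed_per_gc_time >= 0.0);
  assert(young_bytes_freed_per_gc_time >= 0.0);

  if (young_bytes_freed_per_gc_time == 0.0) {
    if (old_bytes_freed_per_gc_time == 0.0) {
      // Neither young nor old collections have reclaimed any memory:
      // give them equal priority.
      return 1.0;
    }
    // Only old collections have reclaimed memory: prioritize old.
    return static_cast<double>(old_gc_threads);
  }
  // The divisor is non-zero here, so no division by zero can occur.
  return old_bytes_freed_per_gc_time / young_bytes_freed_per_gc_time;
}
```

Both degenerate cases return finite values, so nothing downstream sees infinity or NaN; as the review notes, the exact values chosen are still open for discussion.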
-------------
Changes requested by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21304#pullrequestreview-2343363648
PR Review Comment: https://git.openjdk.org/jdk/pull/21304#discussion_r1784803080
More information about the hotspot-gc-dev mailing list