RFR: 8336911: ZGC: Division by zero in heuristics after JDK-8332717

Axel Boldt-Christmas aboldtch at openjdk.org
Wed Oct 2 15:52:41 UTC 2024


On Wed, 2 Oct 2024 12:00:19 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> When running with ubsan enabled binaries, the following issue is reported,
> e.g. in test
> compiler/uncommontrap/TestDeoptOOM_ZGenerational.jtr
> also in gc/z/TestSmallHeap.jtr
> 
> 
> jdk/src/hotspot/share/gc/z/zDirector.cpp:537:84: runtime error: division by zero
>     #0 0x7f422495bd1f in calculate_young_to_old_worker_ratio src/hotspot/share/gc/z/zDirector.cpp:537
>     #1 0x7f422495bd1f in select_worker_threads src/hotspot/share/gc/z/zDirector.cpp:694
>     #2 0x7f42282a0d97 in select_worker_threads src/hotspot/share/gc/z/zDirector.cpp:689
>     #3 0x7f42282a0d97 in initial_workers src/hotspot/share/gc/z/zDirector.cpp:784
>     #4 0x7f42282a2485 in initial_workers src/hotspot/share/gc/z/zDirector.cpp:795
>     #5 0x7f42282a2485 in start_minor_gc src/hotspot/share/gc/z/zDirector.cpp:797
>     #6 0x7f42282a2485 in start_gc src/hotspot/share/gc/z/zDirector.cpp:826
>     #7 0x7f42282a2485 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #8 0x7f422840bdd8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #9 0x7f4225ab6979 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #10 0x7f4227e1137a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #11 0x7f42274619b1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
>     #12 0x7f422c8d36e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 9a146bd267419cb6a8cf08d7c602953a0f2e12c5)
>     #13 0x7f422c1dc58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: f2d1cb1ef49f8c47d43a4053910ba6137673ccce)
> 
> 
> The division by 0 leads to 'infinity' on most of our platforms. So instead of relying on this behavior, we can add a small check and set 'infinity' for divisor == 0.

I do not think `infinity` is the solution here. There are more problems with the heuristics when no young collection has reclaimed any memory. 

I added a comment about this in an earlier PR (JDK-8339648 / #20888) https://github.com/openjdk/jdk/pull/20888#discussion_r1758502503.

I proposed a solution to this specific issue that makes more sense to me and avoids the NaN issues here, but I will have to talk it over with my team first.
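To make the NaN concern concrete, here is a generic IEEE 754 illustration. It is a minimal sketch, not the HotSpot code: `ratio_with_infinity` and `weighted` are hypothetical names, and I have not traced which downstream expressions in zDirector.cpp would actually be affected.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the ratio in the patch under review:
// returns infinity when the young rate is zero.
double ratio_with_infinity(double old_rate, double young_rate) {
  assert(old_rate >= 0.0 && young_rate >= 0.0);
  return young_rate == 0.0 ? std::numeric_limits<double>::infinity()
                           : old_rate / young_rate;
}

// Hypothetical downstream use (an assumption for illustration):
// scaling the ratio by some weight that may itself be zero.
double weighted(double ratio, double weight) {
  return ratio * weight;
}

// True if later arithmetic on the ratio has degenerated into NaN.
bool turned_into_nan(double old_rate, double young_rate, double weight) {
  return std::isnan(weighted(ratio_with_infinity(old_rate, young_rate), weight));
}
```

With these definitions, `turned_into_nan(5.0, 0.0, 0.0)` is true because infinity times zero is NaN, and subtracting two such infinite ratios from each other is NaN as well. Every later comparison against a NaN is then false, which is how a heuristic can silently go wrong.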

Regardless, I think we need to overhaul this code to handle the extreme case where no GC has reclaimed any memory.

_Also, this must have been an issue before JDK-8332717 as well?_

src/hotspot/share/gc/z/zDirector.cpp line 539:

> 537:   const double current_old_bytes_freed_per_gc_time = double(reclaimed_per_old_gc) / double(old_gc_time);
> 538:   const double old_vs_young_efficiency_ratio = current_young_bytes_freed_per_gc_time == 0 ? std::numeric_limits<double>::infinity()
> 539:                                                                                           : current_old_bytes_freed_per_gc_time / current_young_bytes_freed_per_gc_time;

I think returning infinity here will cause problems with NaN down the line. It is also unclear what the ratio should mean if both values are `0`. To me, something like the following makes sense, but I will discuss this with my team.
Suggestion:


  if (current_young_bytes_freed_per_gc_time == 0.0) {
    if (current_old_bytes_freed_per_gc_time == 0.0) {
      // Neither young nor old collections have reclaimed any memory.
      // Give them equal priority.
      return 1.0;
    }

    // Only old collections have reclaimed memory.
    // Prioritize old.
    return ZOldGCThreads;
  }
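
For anyone who wants to try the idea outside HotSpot, the guard above can be sketched as a standalone function. This is only an illustration under stated assumptions: `kOldGCThreads` is a hypothetical constant standing in for the `ZOldGCThreads` flag, the function name is made up, and the sketch returns the efficiency ratio itself, which may not match where the early returns sit in the real function.

```cpp
#include <cassert>

// Hypothetical stand-in for the ZOldGCThreads flag (value chosen arbitrarily).
static const unsigned int kOldGCThreads = 4;

// Sketch of the guarded ratio: never divides by zero and never
// produces infinity or NaN for non-negative inputs.
double guarded_old_vs_young_ratio(double old_bytes_freed_per_gc_time,
                                  double young_bytes_freed_per_gc_time) {
  assert(old_bytes_freed_per_gc_time >= 0.0);
  assert(young_bytes_freed_per_gc_time >= 0.0);

  if (young_bytes_freed_per_gc_time == 0.0) {
    if (old_bytes_freed_per_gc_time == 0.0) {
      // Neither young nor old collections have reclaimed any memory.
      // Give them equal priority.
      return 1.0;
    }

    // Only old collections have reclaimed memory.
    // Prioritize old, using a large but finite value.
    return double(kOldGCThreads);
  }

  // Normal case: the divisor is known to be non-zero.
  return old_bytes_freed_per_gc_time / young_bytes_freed_per_gc_time;
}
```

The point of returning a finite value in the "only old" case is that later arithmetic on the result stays finite, unlike with infinity.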

-------------

Changes requested by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21304#pullrequestreview-2343363648
PR Review Comment: https://git.openjdk.org/jdk/pull/21304#discussion_r1784803080

