RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate

Axel Boldt-Christmas aboldtch at openjdk.org
Fri Sep 6 12:51:55 UTC 2024


On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled 
> 
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
>     #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
>     #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
>     #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
>     #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
>     #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
>     #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>     #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

src/hotspot/share/gc/z/zDirector.cpp line 524:

> 522:     const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
> 523:     old_garbage_is_cheaper = current_old_gc_time_per_bytes_freed < current_young_gc_time_per_bytes_freed;
> 524:   }

Ending up with `old_garbage_is_cheaper == true` when  `reclaimed_per_old_gc == 0` seems wrong to me. 

Division by 0.0 is weird in C++. Do we even build for systems where it would not be supported. But regardless to me I feel like the change here should be more like:

-  const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
+  const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity : double(old_gc_time) / double(reclaimed_per_old_gc);

Which is the behaviour I expect us to currently have, given that `old_gc_time` should be a positive number (`>0.0`).  The `!stats._old_stats._cycle._is_time_trustable` check above should protect against `0.0`.

I expect that this division we see happens when we have run a warmup major collection which did no reclaim any memory.  And this change would trigger us to try and promote a minor collection to a major collection.

I am no expert on our supported platforms matrix w.r.t. floating numbers and `std::numeric_limits<T>::has_infinity`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1747065752


More information about the hotspot-gc-dev mailing list