RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate
Axel Boldt-Christmas
aboldtch at openjdk.org
Fri Sep 6 12:51:55 UTC 2024
On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:
> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled
>
> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
> #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
src/hotspot/share/gc/z/zDirector.cpp line 524:
> 522: const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
> 523: old_garbage_is_cheaper = current_old_gc_time_per_bytes_freed < current_young_gc_time_per_bytes_freed;
> 524: }
Ending up with `old_garbage_is_cheaper == true` when `reclaimed_per_old_gc == 0` seems wrong to me.
Division by 0.0 is weird in C++. Do we even build for systems where it would not be supported. But regardless to me I feel like the change here should be more like:
- const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
+ const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity : double(old_gc_time) / double(reclaimed_per_old_gc);
Which is the behaviour I expect us to currently have, given that `old_gc_time` should be a positive number (`>0.0`). The `!stats._old_stats._cycle._is_time_trustable` check above should protect against `0.0`.
I expect that this division we see happens when we have run a warmup major collection which did no reclaim any memory. And this change would trigger us to try and promote a minor collection to a major collection.
I am no expert on our supported platforms matrix w.r.t. floating numbers and `std::numeric_limits<T>::has_infinity`.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1747065752
More information about the hotspot-gc-dev
mailing list