RFR: 8332506: SIGFPE In ObjectSynchronizer::is_async_deflation_needed()

Fredrik Bredberg fbredberg at openjdk.org
Wed Dec 18 22:27:36 UTC 2024


On Wed, 18 Dec 2024 21:02:21 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> This PR solves a division by zero problem, that according to the bug report happened in `ObjectSynchronizer::is_async_deflation_needed()`. As it turns out it really happened in the inlined  `monitors_used_above_threshold()` function. The problematic line looked like this:
>> 
>> `size_t monitor_usage = (monitors_used * 100LL) / ceiling;`
>> 
>> Unfortunately the `ceiling` value was increased every time there where too many deflations without any progress. This whould eventually lead to an overflow in the `ceiling` value, and in unlucky circumstances, it would become zero. Thus causing a division by zero crash.
>> 
>> This PR makes sure not to increase the `ceiling` value if `monitor_usage` is below the `MonitorUsedDeflationThreshold`.
>> It also makes sure the ceiling value is never zero, and does not wrap around.
>> 
>> Tested okay in tier1-5 and `test/hotspot/jtreg/runtime/Monitor/MonitorUsedDeflationThresholdTest.java`
>
> src/hotspot/share/runtime/synchronizer.cpp line 1289:
> 
>> 1287: 
>> 1288:       if (new_ceiling < old_ceiling) {
>> 1289:         new_ceiling = SIZE_MAX; // Wrap around, let's clamp new_ceiling.
> 
> Should we clamp new_ceiling to old_ceiling in this case?   With ceiling always being set to list->max() every time and list->max() never decrements, this will be true forever even if the value isn't SIZE_MAX.  I think we shouldn't use list->max() as a ceiling above and keep new_ceiling = old_ceiling in this case.  Then it seems like the thresholds will more gradually increase rather than have a huge increase with a lot of monitors that are later deflated.

@coleenp 
If we start with your "I have question about the math". Well, so do I. I don't understand the math on line 1286 that calculates `new_ceiling`. I would expect that if `monitor_usage` is above the `MonitorUsedDeflationThreshold` the `new_ceiling` value should be set high enough so that `monitor_usage` will become lower than `MonitorUsedDeflationThreshold` next time. But looking at the trace output from `MonitorUsedDeflationThresholdTest.java` this is not the case. Instead it increases more slowly until it reaches a value where the `monitor_usage` is below the threshold. It's hard to call that a bug, but it doesn't seem like the optimal way of doing it.

What value to use for clamping `new_ceiling` is probably only an academic question, since I now only increase the ceiling value when the `monitor_usage` really is above the threshold. This basically means that we will not see any wrap around. I only added the wrap around code (clamp code) to show that ceiling can never be zero and thus the bug I'm fixing is gone.

Discussed the math stuff with @fisk in the office today. We concluded that this PR fixes the divide by zero problem which has been in the code since 2021, and it would be nice to have that fixed in the 24 release. Then we can create a new feature request to fix more optimal math stuff in time for the 25 release.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22815#discussion_r1890907858


More information about the hotspot-runtime-dev mailing list