RFR: 8352422: [ubsan] Out-of-range reported in ciMethod.cpp:917:20: runtime error: 2.68435e+09 is outside the range of representable values of type 'int'
Dean Long
dlong at openjdk.org
Thu Apr 24 07:31:59 UTC 2025
On Thu, 24 Apr 2025 06:48:06 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:
>> src/hotspot/share/ci/ciMethod.cpp line 919:
>>
>>> 917: double count_d = (double)count * prof_factor * method_life / counter_life + 0.5;
>>> 918: if (count_d >= static_cast<double>(INT_MAX)) {
>>> 919: count = INT_MAX;
>>
>> INT_MAX is probably the best choice, but could cause a change in behavior if the compiler previously returned a negative number here on overflow. It's not clear to me if we want to preserve existing behavior or not. In various places we use different limits for saturated counters and may use a negative number to represent overflow. I don't know if this is by design or accident, but it can happen when, for example, the taken() or not_taken() value of a BranchData overflows and gets clamped to (uint)-1. In Parse::dynamic_branch_prediction() we assign those uint values to int, resulting in a negative number, which will be rejected by counters_are_meaningful().
>> By clamping an overflow here to INT_MAX instead of a UB negative number, we could allow taken == INT_MAX, not_taken == 0 to sneak past, which might be harmless. It's not clear to me what the best way to handle saturated values is, but you might find it useful to look at the history in JDK-8306331 and JDK-8306481.
>
> There was still
>
> count = (count > 0) ? count : 1;
>
> so values that became negative through overflow were actually pushed up to 1 (which seems like a legitimate value to me): it was already impossible to distinguish overflows from good results.
>
> But even if I return a (new) special value for overflows, there is the question of what to do with it from the caller's perspective. In my understanding, the result is used for profiling purposes: to decide how often a branch is taken, and thus whether it's worth [doing something]. If a branch is taken so often that it causes an overflow here, I think the reasonable thing to do is to make it pass all the comparisons that decide whether it's taken often enough. If they are all of the form `scale > threshold`, then returning `INT_MAX` guarantees that we satisfy all such tests. Another way would be to return a special value meaning "the result was very high: between INT_MAX and infinity" and handle it as such a value would be handled, but I suspect that's exactly how `INT_MAX` is already handled.
>
> Also, I don't think an overflow here signifies a programming error: we have a product of multiple `int`s and a `float`. It's easy to overflow an int: `46341 * 46341` already exceeds `INT_MAX`.
>
> Indeed, it can change the behavior in some overflow cases, but then I'd argue that the former behavior was basically random, so it's probably not worth trying to preserve.
You're right, I missed that a negative value would have been changed to 1 in the old code. Returning INT_MAX seems better than 1 here: it means the inputs did not overflow on their own (or they would already have been negative) and only overflowed when scaled, so the true value is probably closer to INT_MAX than to infinity or "unknown".
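For illustration, here is a minimal standalone sketch (not the HotSpot sources; the helper function and the values in main() are made up) of the two behaviors discussed above: the clamp-before-cast fix, and the separate uint-saturation path that turns negative on assignment to int.

    #include <climits>
    #include <cstdio>

    // Hypothetical stand-in for the scaling logic under review, using the
    // same expression as the quoted ciMethod.cpp line 917 plus the clamp.
    int scaled_count(int count, double prof_factor,
                     double method_life, double counter_life) {
      double count_d = (double)count * prof_factor * method_life / counter_life + 0.5;
      // Converting a double whose truncated value doesn't fit in int is
      // undefined behavior (what ubsan reported), so clamp while still in
      // the double domain; clamping at exactly INT_MAX is harmless since
      // the result is the same either way.
      if (count_d >= static_cast<double>(INT_MAX)) {
        return INT_MAX;
      }
      count = (int)count_d;
      return (count > 0) ? count : 1;  // the old floor of 1 still applies
    }

    int main() {
      // 46341 * 46341 > INT_MAX: the scaled count saturates at INT_MAX
      // instead of hitting UB on the cast.
      printf("%d\n", scaled_count(46341, 46341.0, 1.0, 1.0));

      // The separate saturation pattern mentioned above: a uint counter
      // clamped to (uint)-1 turns negative when assigned to int (the
      // out-of-range conversion is implementation-defined before C++20
      // and yields -1 on mainstream ABIs), which a sign check such as
      // counters_are_meaningful() would then reject.
      unsigned int taken_u = (unsigned int)-1;
      int taken = (int)taken_u;
      printf("%d\n", taken);  // typically -1
      return 0;
    }

Built with UBSan's float-cast-overflow check enabled, the unclamped version of that cast is the kind of conversion that produces a report like the one in the subject line.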
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24824#discussion_r2057734700