RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code

Mon Mar 10 10:23:01 UTC 2025

On Fri, 7 Mar 2025 18:03:14 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> `Math.*Exact` intrinsics can cause many deopts when used repeatedly with problematic arguments.
>> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>> 
>> Benchmarks show that this issue affects every `Math.*Exact` function, and this fix improves them all.
>> 
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>> 
>> Before the fix:
>> 
>> Benchmark                                           (SIZE)  Mode  Cnt     Score      Error  Units
>> MathExact.C1_1.loopAddIInBounds                    1000000  avgt    3     1.272 ±    0.048  ms/op
>> MathExact.C1_1.loopAddIOverflow                    1000000  avgt    3   641.917 ±   58.238  ms/op
>> MathExact.C1_1.loopAddLInBounds                    1000000  avgt    3     1.402 ±    0.842  ms/op
>> MathExact.C1_1.loopAddLOverflow                    1000000  avgt    3   671.013 ±  229.425  ms/op
>> MathExact.C1_1.loopDecrementIInBounds              1000000  avgt    3     3.722 ±   22.244  ms/op
>> MathExact.C1_1.loopDecrementIOverflow              1000000  avgt    3   653.341 ±  279.003  ms/op
>> MathExact.C1_1.loopDecrementLInBounds              1000000  avgt    3     2.525 ±    0.810  ms/op
>> MathExact.C1_1.loopDecrementLOverflow              1000000  avgt    3   656.750 ±  141.792  ms/op
>> MathExact.C1_1.loopIncrementIInBounds              1000000  avgt    3     4.621 ±   12.822  ms/op
>> MathExact.C1_1.loopIncrementIOverflow              1000000  avgt    3   651.608 ±  274.396  ms/op
>> MathExact.C1_1.loopIncrementLInBounds              1000000  avgt    3     2.576 ±    3.316  ms/op
>> MathExact.C1_1.loopIncrementLOverflow              1000000  avgt    3   662.216 ±   71.879  ms/op
>> MathExact.C1_1.loopMultiplyIInBounds               1000000  avgt    3     1.402 ±    0.587  ms/op
>> MathExact.C1_1.loopMultiplyIOverflow               1000000  avgt    3   615.836 ±  252.137  ms/op
>> MathExact.C1_1.loopMultiplyLInBounds               1000000  avgt    3     2.906 ±    5.718  ms/op
>> MathExact.C1_1.loopMultiplyLOverflow               1000000  avgt    3   655.576 ±  147.432  ms/op
>> MathExact.C1_1.loopNegateIInBounds                 1000000  avgt    3     2.023 ±    0.027  ms/op
>> MathExact.C1_1.loopNegateIOverflow                 1000000  avgt    3   639.136 ±   30.841  ms/op
>> MathExact.C1_1.loop...
>
> src/hotspot/share/opto/library_call.cpp line 1963:
> 
>> 1961:     set_i_o(i_o());
>> 1962: 
>> 1963:     uncommon_trap(Deoptimization::Reason_intrinsic,
> 
> What about using `builtin_throw` here? (Requires some tuning on `builtin_throw` side.) How much does it affect performance? Also, passing `must_throw = true` into `uncommon_trap` may help a bit here as well.

Using `builtin_throw` sounds nice! But indeed, it won't work directly. I want to avoid the intrinsic in case of `too_many_traps`, and that is also the only case in which `builtin_throw` does anything. If I rely only on `builtin_throw`, then whenever the built-in throw is not possible (that is, when `treat_throw_as_hot && method()->can_omit_stack_trace()` is false), we get the repeated deopts again.
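For context, the kind of loop that keeps hitting the overflow path looks roughly like the sketch below. It is only an illustration, not the JMH benchmark from the PR; the class and method names are hypothetical.

```java
public class ExactOverflowLoop {
    // Counts how many of `iterations` calls to Math.addExact overflow.
    static int countOverflows(int iterations) {
        int overflows = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                // Integer.MAX_VALUE + 1 always overflows, so the
                // overflow path is taken on every iteration.
                Math.addExact(Integer.MAX_VALUE, 1);
            } catch (ArithmeticException e) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        System.out.println(countOverflows(1_000_000));
    }
}
```

Every iteration throws `ArithmeticException`, so any strategy that handles overflow by deoptimizing pays that cost on each call.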

There is also the matter of throwing the right exception, which is currently determined only by the reason (which adapts poorly to this case).

I guess that's what you meant by tuning: being able to know whether we would built-in throw and, if so, doing it; otherwise, preventing infinitely repeated deopts.

The way I see doing that is twofold. First, (maybe optionally) pass the preallocated exception to throw as a parameter, so that we don't have to rely on the "reason to exception" decision (or can override it). Second, factor out the decision of whether we can take the nice branch of `builtin_throw`, so that we can bail out of the intrinsic when we can't fast throw, before we start setting up the intrinsic (which we would otherwise need to undo). Does that match what you had in mind, or do you have another suggestion?
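The three-way decision described above can be modeled roughly as follows. This is a toy Java model, not HotSpot code: `Plan`, `choose`, and the two boolean parameters are hypothetical names, and treating `treat_throw_as_hot` as equivalent to `too_many_traps` is a simplifying assumption.

```java
public class ExactIntrinsicDecision {
    // Hypothetical outcomes for compiling one Math.*Exact call site.
    enum Plan {
        INTRINSIC_WITH_UNCOMMON_TRAP, // overflow deoptimizes (current behavior)
        INTRINSIC_WITH_BUILTIN_THROW, // overflow throws a preallocated exception
        NO_INTRINSIC                  // fall back to a regular call
    }

    // tooManyTraps: stands in for too_many_traps() (assumed to imply
    //   treat_throw_as_hot).
    // canOmitStackTrace: stands in for method()->can_omit_stack_trace().
    static Plan choose(boolean tooManyTraps, boolean canOmitStackTrace) {
        if (!tooManyTraps) {
            // Overflow has been rare so far: keep the intrinsic and deopt on overflow.
            return Plan.INTRINSIC_WITH_UNCOMMON_TRAP;
        }
        if (canOmitStackTrace) {
            // Overflow is hot and a fast throw is allowed.
            return Plan.INTRINSIC_WITH_BUILTIN_THROW;
        }
        // Overflow is hot but a fast throw is not possible: decide this
        // up front, before any intrinsic setup that would need undoing.
        return Plan.NO_INTRINSIC;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false));
    }
}
```

The point of deciding first is that the `NO_INTRINSIC` case never has to roll back partially built intrinsic state.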

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r1986999005