RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code
Marc Chevalier
duke at openjdk.org
Mon Mar 10 10:23:01 UTC 2025
On Fri, 7 Mar 2025 18:03:14 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
>> `Math.*Exact` intrinsics can cause many deoptimizations when used repeatedly with problematic arguments.
>> This fix proposes not relying on the intrinsics once `too_many_traps()` reports that the trap limit has been reached.
>>
>> Benchmarks show that this issue affects every `Math.*Exact` function, and that this fix improves them all.
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark                               (SIZE)  Mode  Cnt    Score     Error  Units
>> MathExact.C1_1.loopAddIInBounds        1000000  avgt    3    1.272 ±   0.048  ms/op
>> MathExact.C1_1.loopAddIOverflow        1000000  avgt    3  641.917 ±  58.238  ms/op
>> MathExact.C1_1.loopAddLInBounds        1000000  avgt    3    1.402 ±   0.842  ms/op
>> MathExact.C1_1.loopAddLOverflow        1000000  avgt    3  671.013 ± 229.425  ms/op
>> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt    3    3.722 ±  22.244  ms/op
>> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt    3  653.341 ± 279.003  ms/op
>> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt    3    2.525 ±   0.810  ms/op
>> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt    3  656.750 ± 141.792  ms/op
>> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt    3    4.621 ±  12.822  ms/op
>> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt    3  651.608 ± 274.396  ms/op
>> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt    3    2.576 ±   3.316  ms/op
>> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt    3  662.216 ±  71.879  ms/op
>> MathExact.C1_1.loopMultiplyIInBounds   1000000  avgt    3    1.402 ±   0.587  ms/op
>> MathExact.C1_1.loopMultiplyIOverflow   1000000  avgt    3  615.836 ± 252.137  ms/op
>> MathExact.C1_1.loopMultiplyLInBounds   1000000  avgt    3    2.906 ±   5.718  ms/op
>> MathExact.C1_1.loopMultiplyLOverflow   1000000  avgt    3  655.576 ± 147.432  ms/op
>> MathExact.C1_1.loopNegateIInBounds     1000000  avgt    3    2.023 ±   0.027  ms/op
>> MathExact.C1_1.loopNegateIOverflow     1000000  avgt    3  639.136 ±  30.841  ms/op
>> MathExact.C1_1.loop...
>
> src/hotspot/share/opto/library_call.cpp line 1963:
>
>> 1961: set_i_o(i_o());
>> 1962:
>> 1963: uncommon_trap(Deoptimization::Reason_intrinsic,
>
> What about using `builtin_throw` here? (Requires some tuning on `builtin_throw` side.) How much does it affect performance? Also, passing `must_throw = true` into `uncommon_trap` may help a bit here as well.
Using `builtin_throw` sounds nice! But indeed, it won't work directly. I want to prevent the intrinsic in case of `too_many_traps`, but that is also the only case in which `builtin_throw` will do something. And if I rely only on `builtin_throw`, then whenever the built-in throw is not possible (that is, when `treat_throw_as_hot && method()->can_omit_stack_trace()` is false), we will get the repeated deopt again.
There is also the matter of throwing the right exception, which is currently determined only by the reason (and that adapts poorly to this case).
I guess that's what you meant by tuning: being able to know whether we would built-in throw and, if so, doing it; otherwise, preventing the infinitely repeated deopt.
The way I see doing that is by (maybe optionally) passing the preallocated exception to throw as a parameter, so that we don't have to rely on the "reason to exception" decision (or so that we can override it), and by factoring out the decision of whether we can take the nice branch of `builtin_throw`. That way, if we can't fast throw, we can bail out of the intrinsic before we start setting it up (which we would otherwise need to undo). Does that match what you had in mind, or do you have another suggestion?
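To make the proposed decision concrete, here is a toy model of it in Java. This is only an illustration and not HotSpot code: the class `IntrinsicDecision`, the enum `Action`, and the method `decide` are hypothetical names, and the three booleans merely stand in for `too_many_traps`, `treat_throw_as_hot`, and `method()->can_omit_stack_trace()`. It assumes the decision is taken once, before any intrinsic setup begins.

```java
// Toy model (hypothetical, not HotSpot code) of the factored-out decision:
// fast throw when possible, otherwise avoid the intrinsic once traps pile up.
public class IntrinsicDecision {

    public enum Action {
        INTRINSIC_WITH_UNCOMMON_TRAP, // normal case: intrinsic, deopt on overflow
        INTRINSIC_WITH_FAST_THROW,    // intrinsic, overflow path throws a preallocated exception
        NO_INTRINSIC                  // bail out before setting up the intrinsic
    }

    // The parameters are stand-ins for the conditions named in the discussion.
    public static Action decide(boolean tooManyTraps,
                                boolean treatThrowAsHot,
                                boolean canOmitStackTrace) {
        // The "nice branch" of builtin_throw: a fast throw is possible.
        if (treatThrowAsHot && canOmitStackTrace) {
            return Action.INTRINSIC_WITH_FAST_THROW;
        }
        // No fast throw available and too many traps already taken:
        // using the intrinsic again would repeat the deopt cycle.
        if (tooManyTraps) {
            return Action.NO_INTRINSIC;
        }
        // Otherwise keep the intrinsic with an uncommon trap on overflow.
        return Action.INTRINSIC_WITH_UNCOMMON_TRAP;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false, true));
        System.out.println(decide(true, true, true));
        System.out.println(decide(true, true, false));
    }
}
```

The point of the sketch is the ordering: the check for whether a fast throw is possible happens first, so the bail-out happens before there is any intrinsic state to undo.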
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r1986999005