RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v2]
Vladimir Ivanov
vlivanov at openjdk.org
Fri Mar 28 21:49:25 UTC 2025
On Wed, 26 Mar 2025 08:33:58 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:
>> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
>> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>>
>> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all.
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>> - with intrinsics:
>> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>> - without overflow: no problem, no change
>> - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark (SIZE) Mode Cnt Score Error Units
>> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op
>> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op
>> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op
>> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op
>> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op
>> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op
>> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op
>> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op
>> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ± 12.822 ms/op
>> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op
>> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op
>> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op
>> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op
>> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op
>> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op
>> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op
>> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op
>> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op
>> MathExact.C1_1.loop...
>
> Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Use builtin_throw
> - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code
> - More exhaustive bench
> - Limit inlining of math Exact operations in case of too many deopts
Thanks, Marc.
It looks a bit too convoluted to me. IMO an unconditional call to `builtin_throw`, plus `too_many_traps` check should do the job. Do I miss something important here?
src/hotspot/share/opto/graphKit.hpp line 279:
> 277: // The JVMS must allow the bytecode to be re-executed via an uncommon trap.
> 278: // If `exception_object` is nullptr, the exception to throw will be guessed based on `reason`
> 279: void builtin_throw(Deoptimization::DeoptReason reason, ciInstance* exception_object = nullptr);
Please, introduce a new overload instead.
I suggest to extract Deoptimization::DeoptReason -> ciInstance mapping into a helper method and turn `void builtin_throw(Deoptimization::DeoptReason reason)` into a wrapper:
void GraphKit::builtin_throw(Deoptimization::DeoptReason reason) {
builtin_throw(reason, exception_on_deopt(reason));
}
src/hotspot/share/opto/library_call.cpp line 2035:
> 2033:
> 2034: if (use_builtin_throw) {
> 2035: builtin_throw(Deoptimization::Reason_intrinsic, env()->ArithmeticException_instance());
I suggest to unconditionally call `builtin_throw()`. It should handle `uncommon_trap` case as well.
What makes sense is to ensure that `builtin_throw()` doesn't change deoptimization reason. It can be implemented with an extra argument to new `GraphKit::builtin_throw` overload (e.g., `bool allow_deopt_reason_none`).
src/hotspot/share/opto/library_call.cpp line 2054:
> 2052: // instead of bailing out on intrinsic or potentially deopting, let's do that!
> 2053: use_builtin_throw = true;
> 2054: } else if (too_many_traps(Deoptimization::Reason_intrinsic)) {
Why `too_many_traps(Deoptimization::Reason_intrinsic)` check is not enough here?
-------------
PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2726864135
PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2019432922
PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2019444895
PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2019449996
More information about the hotspot-compiler-dev
mailing list