RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v2]

Wed Mar 26 08:39:09 UTC 2025

On Fri, 21 Mar 2025 22:34:43 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> I think adapting and re-using `builtin_throw` like you described is reasonable but I let @iwanowww confirm :slightly_smiling_face:
>
> Yes, that's basically what I had in mind.
> 
> Currently, the focus of the intrinsic is on the well-behaved case (overflows are **very** rare). `builtin_throw()` covers more ground and optimizes for scenarios where exceptions are thrown. But it depends on `ciMethod::can_omit_stack_trace()`, so the `-XX:-OmitStackTraceInFastThrow` mode will suffer from the original problem (continuous deoptimizations), plus a round of recompilations before giving up.
> 
> I suggest to improve and reuse `builtin_throw` here and add additional checks in the intrinsic to guard against problematic scenario with continuous deoptimizations. IMO it improves performance model for a wide range of use cases while addressing pathological scenarios.

So, I have done something like that: the exception object to throw is now passed in as a parameter, and the logic deciding whether `builtin_throw` is possible is factored out, so we can bail out of the intrinsics instead of cycling again. Tests seem to pass in the various cases I wrote. As for the benchmark, it's quite a change. I am posting only the new part; the rest is pretty much the same. `C2_no_builtin_throw` does what the original C2 did (no `builtin_throw`, just bailing out of the intrinsics to cut our losses), and the new C2 is with `builtin_throw`. tl;dr: `builtin_throw` brings the overflow case to the same order as the in-bounds cases (1-4 ms) instead of being about 100 times slower (600-700 ms with C1, C2 without intrinsics, and C2 with bailing out).

MathExact.C2.loopAddIInBounds                         1000000  avgt    3    1.657 ±   11.994  ms/op
MathExact.C2.loopAddIOverflow                         1000000  avgt    3    1.313 ±    4.188  ms/op
MathExact.C2.loopAddLInBounds                         1000000  avgt    3    0.980 ±    0.396  ms/op
MathExact.C2.loopAddLOverflow                         1000000  avgt    3    2.474 ±    3.473  ms/op
MathExact.C2.loopDecrementIInBounds                   1000000  avgt    3    3.733 ±   13.709  ms/op
MathExact.C2.loopDecrementIOverflow                   1000000  avgt    3    2.792 ±   23.724  ms/op
MathExact.C2.loopDecrementLInBounds                   1000000  avgt    3    2.761 ±   24.744  ms/op
MathExact.C2.loopDecrementLOverflow                   1000000  avgt    3    2.730 ±   23.065  ms/op
MathExact.C2.loopIncrementIInBounds                   1000000  avgt    3    3.134 ±   20.980  ms/op
MathExact.C2.loopIncrementIOverflow                   1000000  avgt    3    3.271 ±    8.876  ms/op
MathExact.C2.loopIncrementLInBounds                   1000000  avgt    3    2.756 ±   22.912  ms/op
MathExact.C2.loopIncrementLOverflow                   1000000  avgt    3    4.549 ±    9.543  ms/op
MathExact.C2.loopMultiplyIInBounds                    1000000  avgt    3    1.268 ±    0.574  ms/op
MathExact.C2.loopMultiplyIOverflow                    1000000  avgt    3    1.572 ±   11.171  ms/op
MathExact.C2.loopMultiplyLInBounds                    1000000  avgt    3    1.021 ±    1.054  ms/op
MathExact.C2.loopMultiplyLOverflow                    1000000  avgt    3    3.167 ±   20.666  ms/op
MathExact.C2.loopNegateIInBounds                      1000000  avgt    3    3.575 ±   29.997  ms/op
MathExact.C2.loopNegateIOverflow                      1000000  avgt    3    4.222 ±    9.041  ms/op
MathExact.C2.loopNegateLInBounds                      1000000  avgt    3    4.452 ±    6.680  ms/op
MathExact.C2.loopNegateLOverflow                      1000000  avgt    3    4.739 ±   34.662  ms/op
MathExact.C2.loopSubtractIInBounds                    1000000  avgt    3    1.087 ±    0.539  ms/op
MathExact.C2.loopSubtractIOverflow                    1000000  avgt    3    3.027 ±    9.709  ms/op
MathExact.C2.loopSubtractLInBounds                    1000000  avgt    3    1.197 ±    5.763  ms/op
MathExact.C2.loopSubtractLOverflow                    1000000  avgt    3    1.765 ±   10.037  ms/op
MathExact.C2_no_builtin_throw.loopAddIInBounds        1000000  avgt    3    2.310 ±    2.990  ms/op
MathExact.C2_no_builtin_throw.loopAddIOverflow        1000000  avgt    3  594.036 ±  500.000  ms/op
MathExact.C2_no_builtin_throw.loopAddLInBounds        1000000  avgt    3    1.577 ±   14.053  ms/op
MathExact.C2_no_builtin_throw.loopAddLOverflow        1000000  avgt    3  631.345 ±   75.836  ms/op
MathExact.C2_no_builtin_throw.loopDecrementIInBounds  1000000  avgt    3    2.090 ±    0.937  ms/op
MathExact.C2_no_builtin_throw.loopDecrementIOverflow  1000000  avgt    3  618.080 ±   38.047  ms/op
MathExact.C2_no_builtin_throw.loopDecrementLInBounds  1000000  avgt    3    4.164 ±    6.184  ms/op
MathExact.C2_no_builtin_throw.loopDecrementLOverflow  1000000  avgt    3  596.031 ±  584.159  ms/op
MathExact.C2_no_builtin_throw.loopIncrementIInBounds  1000000  avgt    3    2.383 ±   11.729  ms/op
MathExact.C2_no_builtin_throw.loopIncrementIOverflow  1000000  avgt    3  626.425 ±  134.612  ms/op
MathExact.C2_no_builtin_throw.loopIncrementLInBounds  1000000  avgt    3    2.345 ±   13.927  ms/op
MathExact.C2_no_builtin_throw.loopIncrementLOverflow  1000000  avgt    3  630.535 ±   99.348  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIInBounds   1000000  avgt    3    1.419 ±    4.289  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIOverflow   1000000  avgt    3  587.796 ±   52.215  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLInBounds   1000000  avgt    3    0.934 ±    0.272  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLOverflow   1000000  avgt    3  589.736 ±  347.848  ms/op
MathExact.C2_no_builtin_throw.loopNegateIInBounds     1000000  avgt    3    2.236 ±    5.749  ms/op
MathExact.C2_no_builtin_throw.loopNegateIOverflow     1000000  avgt    3  618.711 ±  725.158  ms/op
MathExact.C2_no_builtin_throw.loopNegateLInBounds     1000000  avgt    3    2.605 ±   17.373  ms/op
MathExact.C2_no_builtin_throw.loopNegateLOverflow     1000000  avgt    3  627.055 ±  184.767  ms/op
MathExact.C2_no_builtin_throw.loopSubtractIInBounds   1000000  avgt    3    1.006 ±    0.584  ms/op
MathExact.C2_no_builtin_throw.loopSubtractIOverflow   1000000  avgt    3  588.062 ±  403.116  ms/op
MathExact.C2_no_builtin_throw.loopSubtractLInBounds   1000000  avgt    3    0.978 ±    0.193  ms/op
MathExact.C2_no_builtin_throw.loopSubtractLOverflow   1000000  avgt    3  611.004 ±  456.779  ms/op
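
For readers without the benchmark source at hand, here is a rough sketch of the kind of loop the `*Overflow` cases presumably exercise. It is not the actual JMH code: the class name `MathExactOverflowSketch` and the helper `countAddOverflows` are made up for illustration, and only `Math.addExact` and `ArithmeticException` come from the JDK. The idea is just that nearly every iteration overflows, so the exception path is the hot path.

```java
public class MathExactOverflowSketch {
    // Hypothetical helper (not from the PR): calls Math.addExact in a loop
    // and counts how many calls throw ArithmeticException on overflow.
    static int countAddOverflows(int iterations, int start) {
        int overflows = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                // With start == Integer.MAX_VALUE, every i >= 1 overflows.
                Math.addExact(start, i);
            } catch (ArithmeticException e) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        // Frequent-overflow case: the throwing path dominates.
        System.out.println(countAddOverflows(1_000_000, Integer.MAX_VALUE));
        // In-bounds case: addExact never throws for these inputs.
        System.out.println(countAddOverflows(1_000_000, 0));
    }
}
```

Going by the discussion quoted above, running such a loop with `-XX:-OmitStackTraceInFastThrow` should be one way to hit the configuration where `builtin_throw` is not applicable and the intrinsic has to bail out instead.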

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2013625437