RFR: 8345219: C2: Avoid bailing to interpreter stubs for signalling NaNs on x86_64

Aleksey Shipilev shade at openjdk.org
Thu Nov 28 18:46:14 UTC 2024


On Thu, 28 Nov 2024 18:22:24 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> Found this while cleaning up x86_32 code for removal.
> 
> In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373):
> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473
> 
> Ostensibly, that block is for x86_32 handling of signalling NaNs -- the x87 FPU has a peculiarity with them. See other funky bugs we have seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991).
> 
> But the way the current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32:
> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502
> 
> This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove the x86_32 port. Meanwhile, we can make it right by x86_64, and make the eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments.
> 
> Additional testing:
>  - [ ] Linux x86_64 server fastdebug, `all`
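
For context on the NaN payloads involved, here is a minimal Java sketch. It is not part of the patch; the class name `SNaNRoundTrip`, its helpers, and the chosen bit pattern are illustrative assumptions. It builds a signalling NaN bit pattern and pushes it through the raw conversion methods. On SSE-based x86_64 the payload is expected to survive the round trip, while the Javadoc for `Double.longBitsToDouble` warns that some platforms may not return the exact same NaN bit pattern, which is the kind of x87 peculiarity mentioned above.

```java
public class SNaNRoundTrip {
    // Illustrative signalling NaN: exponent all ones, quiet bit (bit 51) clear, non-zero payload.
    static final long SNAN_BITS = 0x7ff0000000000001L;
    static final long EXP_MASK  = 0x7ff0000000000000L;
    static final long MANT_MASK = 0x000fffffffffffffL;
    static final long QUIET_BIT = 0x0008000000000000L;

    // True if the bit pattern encodes a NaN whose quiet bit is clear.
    static boolean isSignalling(long bits) {
        boolean expAllOnes = (bits & EXP_MASK) == EXP_MASK;
        boolean mantissaNonZero = (bits & MANT_MASK) != 0;
        return expAllOnes && mantissaNonZero && (bits & QUIET_BIT) == 0;
    }

    // long -> double -> long through the raw (non-canonicalizing) conversions.
    static long roundTrip(long bits) {
        return Double.doubleToRawLongBits(Double.longBitsToDouble(bits));
    }

    public static void main(String[] args) {
        long out = roundTrip(SNAN_BITS);
        // Whether the payload is preserved depends on the platform and on which
        // code path (interpreter, native stub, intrinsic) performs the conversion.
        System.out.printf("in=%016x out=%016x preserved=%b%n",
                          SNAN_BITS, out, out == SNAN_BITS);
    }
}
```

Running this with and without `-XX:-InlineMathNatives` is one way to compare the code paths by hand; no particular output is claimed here.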

As expected, none of this matters when C2 intrinsics work:


Benchmark                                     Mode  Cnt  Score    Error  Units

# Baseline

DoubleBitConversion.doubleToLongBits_NaN      avgt    9  0.542 ±  0.001  ns/op
DoubleBitConversion.doubleToLongBits_one      avgt    9  0.542 ±  0.001  ns/op
DoubleBitConversion.doubleToLongBits_zero     avgt    9  0.542 ±  0.001  ns/op
DoubleBitConversion.doubleToRawLongBits_NaN   avgt    9  0.420 ±  0.041  ns/op
DoubleBitConversion.doubleToRawLongBits_one   avgt    9  0.413 ±  0.012  ns/op
DoubleBitConversion.doubleToRawLongBits_zero  avgt    9  0.412 ±  0.020  ns/op
DoubleBitConversion.longBitsToDouble_NaN      avgt    9  0.413 ±  0.007  ns/op
DoubleBitConversion.longBitsToDouble_one      avgt    9  0.409 ±  0.007  ns/op
DoubleBitConversion.longBitsToDouble_zero     avgt    9  0.414 ±  0.012  ns/op

FloatBitConversion.floatToIntBits_NaN         avgt    9  0.542 ±  0.001  ns/op
FloatBitConversion.floatToIntBits_one         avgt    9  0.542 ±  0.001  ns/op
FloatBitConversion.floatToIntBits_zero        avgt    9  0.542 ±  0.001  ns/op
FloatBitConversion.floatToRawIntBits_NaN      avgt    9  0.410 ±  0.005  ns/op
FloatBitConversion.floatToRawIntBits_one      avgt    9  0.412 ±  0.008  ns/op
FloatBitConversion.floatToRawIntBits_zero     avgt    9  0.413 ±  0.004  ns/op
FloatBitConversion.intBitsToFloat_NaN         avgt    9  0.412 ±  0.008  ns/op
FloatBitConversion.intBitsToFloat_one         avgt    9  0.413 ±  0.009  ns/op
FloatBitConversion.intBitsToFloat_zero        avgt    9  0.421 ±  0.022  ns/op

# Patched

DoubleBitConversion.doubleToLongBits_NaN      avgt    9  0.542 ±  0.001  ns/op
DoubleBitConversion.doubleToLongBits_one      avgt    9  0.542 ±  0.001  ns/op
DoubleBitConversion.doubleToLongBits_zero     avgt    9  0.542 ±  0.001  ns/op
DoubleBitConversion.doubleToRawLongBits_NaN   avgt    9  0.425 ±  0.036  ns/op
DoubleBitConversion.doubleToRawLongBits_one   avgt    9  0.418 ±  0.009  ns/op
DoubleBitConversion.doubleToRawLongBits_zero  avgt    9  0.416 ±  0.017  ns/op
DoubleBitConversion.longBitsToDouble_NaN      avgt    9  0.412 ±  0.004  ns/op
DoubleBitConversion.longBitsToDouble_one      avgt    9  0.412 ±  0.010  ns/op
DoubleBitConversion.longBitsToDouble_zero     avgt    9  0.414 ±  0.005  ns/op

FloatBitConversion.floatToIntBits_NaN         avgt    9  0.542 ±  0.001  ns/op
FloatBitConversion.floatToIntBits_one         avgt    9  0.542 ±  0.001  ns/op
FloatBitConversion.floatToIntBits_zero        avgt    9  0.542 ±  0.001  ns/op
FloatBitConversion.floatToRawIntBits_NaN      avgt    9  0.410 ±  0.005  ns/op
FloatBitConversion.floatToRawIntBits_one      avgt    9  0.408 ±  0.007  ns/op
FloatBitConversion.floatToRawIntBits_zero     avgt    9  0.413 ±  0.015  ns/op
FloatBitConversion.intBitsToFloat_NaN         avgt    9  0.411 ±  0.008  ns/op
FloatBitConversion.intBitsToFloat_one         avgt    9  0.409 ±  0.008  ns/op
FloatBitConversion.intBitsToFloat_zero        avgt    9  0.426 ±  0.011  ns/op



It does matter a lot when the choice is between going through the interpreter native entry (slow) and the compiled native adapter (fast):


# Baseline, -XX:-InlineMathNatives

DoubleBitConversion.doubleToLongBits_NaN      avgt    9   0.604 ±  0.015  ns/op
DoubleBitConversion.doubleToLongBits_one      avgt    9  97.382 ±  1.364  ns/op
DoubleBitConversion.doubleToLongBits_zero     avgt    9  97.636 ±  2.620  ns/op
DoubleBitConversion.doubleToRawLongBits_NaN   avgt    9  96.162 ±  0.513  ns/op
DoubleBitConversion.doubleToRawLongBits_one   avgt    9  98.678 ±  3.378  ns/op
DoubleBitConversion.doubleToRawLongBits_zero  avgt    9  97.374 ±  3.878  ns/op
DoubleBitConversion.longBitsToDouble_NaN      avgt    9  96.753 ±  3.659  ns/op
DoubleBitConversion.longBitsToDouble_one      avgt    9  97.173 ±  2.879  ns/op
DoubleBitConversion.longBitsToDouble_zero     avgt    9  96.375 ±  2.150  ns/op

FloatBitConversion.floatToIntBits_NaN         avgt    9   0.542 ±  0.001  ns/op
FloatBitConversion.floatToIntBits_one         avgt    9  95.868 ±  2.192  ns/op
FloatBitConversion.floatToIntBits_zero        avgt    9  97.377 ±  2.346  ns/op
FloatBitConversion.floatToRawIntBits_NaN      avgt    9  95.947 ±  2.211  ns/op
FloatBitConversion.floatToRawIntBits_one      avgt    9  97.705 ±  3.467  ns/op
FloatBitConversion.floatToRawIntBits_zero     avgt    9  96.052 ±  2.359  ns/op
FloatBitConversion.intBitsToFloat_NaN         avgt    9  98.793 ±  1.997  ns/op
FloatBitConversion.intBitsToFloat_one         avgt    9  97.201 ±  2.327  ns/op
FloatBitConversion.intBitsToFloat_zero        avgt    9  97.515 ±  1.939  ns/op

# Patched, -XX:-InlineMathNatives

DoubleBitConversion.doubleToLongBits_NaN      avgt    9  0.598 ±  0.025  ns/op
DoubleBitConversion.doubleToLongBits_one      avgt    9  4.508 ±  0.318  ns/op
DoubleBitConversion.doubleToLongBits_zero     avgt    9  4.370 ±  0.003  ns/op
DoubleBitConversion.doubleToRawLongBits_NaN   avgt    9  4.285 ±  0.295  ns/op
DoubleBitConversion.doubleToRawLongBits_one   avgt    9  4.281 ±  0.331  ns/op
DoubleBitConversion.doubleToRawLongBits_zero  avgt    9  4.155 ±  0.311  ns/op
DoubleBitConversion.longBitsToDouble_NaN      avgt    9  4.592 ±  0.362  ns/op
DoubleBitConversion.longBitsToDouble_one      avgt    9  4.815 ±  0.038  ns/op
DoubleBitConversion.longBitsToDouble_zero     avgt    9  4.800 ±  0.019  ns/op

FloatBitConversion.floatToIntBits_NaN         avgt    9  0.542 ±  0.001  ns/op
FloatBitConversion.floatToIntBits_one         avgt    9  4.510 ±  0.322  ns/op
FloatBitConversion.floatToIntBits_zero        avgt    9  4.501 ±  0.332  ns/op
FloatBitConversion.floatToRawIntBits_NaN      avgt    9  4.280 ±  0.336  ns/op
FloatBitConversion.floatToRawIntBits_one      avgt    9  4.278 ±  0.320  ns/op
FloatBitConversion.floatToRawIntBits_zero     avgt    9  4.144 ±  0.329  ns/op
FloatBitConversion.intBitsToFloat_NaN         avgt    9  4.551 ±  0.329  ns/op
FloatBitConversion.intBitsToFloat_one         avgt    9  4.549 ±  0.327  ns/op
FloatBitConversion.intBitsToFloat_zero        avgt    9  4.676 ±  0.328  ns/op
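
A plausible reading of the `doubleToLongBits_NaN` and `floatToIntBits_NaN` rows, which stay sub-nanosecond even with `-XX:-InlineMathNatives`: the non-raw methods are plain Java wrappers that return the canonical NaN pattern without ever reaching the native raw conversion. The sketch below mirrors that documented contract of `Double.doubleToLongBits`; `CanonicalNaN` and `toLongBits` are illustrative names, not JDK code.

```java
public class CanonicalNaN {
    // Canonical NaN pattern documented for Double.doubleToLongBits.
    static final long CANONICAL_NAN = 0x7ff8000000000000L;

    // Sketch of the wrapper logic: every NaN collapses to one canonical pattern.
    static long toLongBits(double value) {
        if (Double.isNaN(value)) {
            // NaN path: pure Java, no call to the native raw conversion.
            return CANONICAL_NAN;
        }
        // Non-NaN path: falls through to the raw conversion, which is the
        // native method whose call path the benchmarks above are measuring.
        return Double.doubleToRawLongBits(value);
    }

    public static void main(String[] args) {
        double sNaN = Double.longBitsToDouble(0x7ff0000000000001L);
        System.out.printf("NaN  -> %016x%n", toLongBits(Double.NaN));
        System.out.printf("sNaN -> %016x%n", toLongBits(sNaN));
        System.out.printf("1.0  -> %016x%n", toLongBits(1.0));
    }
}
```

Under that reading, only the `_one` and `_zero` inputs of the non-raw methods reach the native call, which matches the shape of the numbers above.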

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2506638455

