RFR: 8349637: Integer.numberOfLeadingZeros outputs incorrectly in certain cases

Wed Feb 12 09:11:15 UTC 2025

On Wed, 12 Feb 2025 05:47:52 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> Hi all,
> This is a fix for a miscompile in the AVX2 implementation of `CountLeadingZerosV` for int types. Currently, the implementation turns ints into floats, in order to calculating the leading zeros based on the exponent part of the float. Unfortunately, floats can only accurately represent integers up to 2^24. After that, multiple integer values can map onto the same floating point value. The issue manifests when an int is converted to a floating point representation that is higher than it, crossing a bit boundary. As an example, `(float)0x01FFFFFF == (float)0x02000000`, but `lzcnt(0x01FFFFFF) == 7` and `lzcnt(0x02000000) == 6`. The values are incorrectly rounded up.
> 
> This patch fixes the issue by masking the input in the cases where it is larger than 2^24, to set the low bits to 0. Removing these bits prevents the accidental rounding behavior. I've added these cases to`TestNumberOfContinuousZeros`, and removed the set random seed so that it can produce random inputs to test with.
> 
> Reviews would be appreciated!

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6238:

> 6236: 
> 6237:   // As 2^24 is the largest possible integer that can be exactly represented by a float value, special handling has to be
> 6238:   // done to avoid losing precision by potentially rounding up. To avoid that, we construct a mask to remove low set bits

IEEE single precision floating point format has a fixed precision of 24 bits, to accommodate a higher precision number exponent is incremented accordingly, even though this leads to a precision loss, but grants a larger dynamic range to a floating point number in comparison to a fixed point integral format.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6241:

> 6239:   // when the number has the upper 8 bits set. This is a valid transformation as it only removes low bits, and keeps the high bits intact.
> 6240:   vpxor(xtmp2, xtmp2, xtmp2, vec_enc);
> 6241:   vpsrld(xtmp2, src, 24, vec_enc);

There is an output dependency here since both instructions have same destination operand. I agree with @merykitty suggestion to chop off lower 8 bits to neglect any side-effects in exponent on account of precision e.g.

jshell> (Float.floatToRawIntBits((float)(0x01FFFFFE)) >>> 23) - 127
$68 ==> 24

jshell> (Float.floatToRawIntBits((float)(0x01FFFFFF)) >>> 23) - 127
$69 ==> 25

``` 

In both the above case we should return 24.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6242:

> 6240:   vpxor(xtmp2, xtmp2, xtmp2, vec_enc);
> 6241:   vpsrld(xtmp2, src, 24, vec_enc);
> 6242:   vpandn(src, xtmp2, src, vec_enc);

We should not be modifying the 'src' as it may have later use.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6243:

> 6241:   vpsrld(xtmp2, src, 24, vec_enc);
> 6242:   vpandn(src, xtmp2, src, vec_enc);
> 6243: 

There are two occurrences of vblendvps in this sequence, it would be worth using following version of blend which emulates it using cheaper instruction sequence and shows better performance on E-core (avx2) targets.

https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L3553

We could do it separately and not as a part of this bug-fix PR.

test/hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java line 45:

> 43: 
> 44: public class TestNumberOfContinuousZeros {
> 45:     private static final int[] SPECIAL = { 0x01FFFFFF, 0x03FFFFFE, 0x07FFFFFC, 0x0FFFFFF8, 0x1FFFFFF0, 0x3FFFFFE0 };

Can you also check for 0xFFFFFFFF, even though we have special handling for -ve signed numbers not affected by this patch.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23579#discussion_r1952222539
PR Review Comment: https://git.openjdk.org/jdk/pull/23579#discussion_r1952239292
PR Review Comment: https://git.openjdk.org/jdk/pull/23579#discussion_r1952217092
PR Review Comment: https://git.openjdk.org/jdk/pull/23579#discussion_r1952232091
PR Review Comment: https://git.openjdk.org/jdk/pull/23579#discussion_r1952249864