RFR: 8349637: Integer.numberOfLeadingZeros outputs incorrectly in certain cases
Jasmine Karthikeyan
jkarthikeyan at openjdk.org
Wed Feb 12 16:18:13 UTC 2025
On Wed, 12 Feb 2025 08:47:01 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi all,
>> This is a fix for a miscompile in the AVX2 implementation of `CountLeadingZerosV` for int types. Currently, the implementation turns ints into floats, in order to calculating the leading zeros based on the exponent part of the float. Unfortunately, floats can only accurately represent integers up to 2^24. After that, multiple integer values can map onto the same floating point value. The issue manifests when an int is converted to a floating point representation that is higher than it, crossing a bit boundary. As an example, `(float)0x01FFFFFF == (float)0x02000000`, but `lzcnt(0x01FFFFFF) == 7` and `lzcnt(0x02000000) == 6`. The values are incorrectly rounded up.
>>
>> This patch fixes the issue by masking the input in the cases where it is larger than 2^24, to set the low bits to 0. Removing these bits prevents the accidental rounding behavior. I've added these cases to`TestNumberOfContinuousZeros`, and removed the set random seed so that it can produce random inputs to test with.
>>
>> Reviews would be appreciated!
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6242:
>
>> 6240: vpxor(xtmp2, xtmp2, xtmp2, vec_enc);
>> 6241: vpsrld(xtmp2, src, 24, vec_enc);
>> 6242: vpandn(src, xtmp2, src, vec_enc);
>
> We should not be modifying the 'src' as it may have later use, also
> there is an output dependency here as both instruction writes to xtmp2, we can remove the zeroing instruction.
Thanks for noticing that, I've fixed both points.
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6243:
>
>> 6241: vpsrld(xtmp2, src, 24, vec_enc);
>> 6242: vpandn(src, xtmp2, src, vec_enc);
>> 6243:
>
> There are two occurrences of vblendvps in this sequence, it would be worth using following version of blend which emulates it using cheaper instruction sequence and shows better performance on E-core (avx2) targets.
>
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L3553
>
> We could do it separately and not as a part of this bug-fix PR.
I agree that it might be best to look at this as a followup patch. I want to do a followup RFE to explore whether the same floating point trick can be used to vectorize `Long.numberOfLeadingZeros`, and I can make the change there.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23579#discussion_r1952983648
PR Review Comment: https://git.openjdk.org/jdk/pull/23579#discussion_r1952984763
More information about the hotspot-compiler-dev
mailing list