RFR: 8349637: Integer.numberOfLeadingZeros outputs incorrectly in certain cases [v3]
Raffaello Giulietti
rgiulietti at openjdk.org
Mon Feb 17 08:52:11 UTC 2025
On Mon, 17 Feb 2025 03:41:11 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> Here's the pseudo-code for an implementation with 13 vector instructions.
>>
>> Let `fp` denote `float` or `double`.
>> Correspondingly, let `P` = 24, 53 (precision); `L` = 5, 6; `W` = 2^`L` (lane width).
>>
>> The code below is pseudo Java and describes `W`-bit lane operations.
>> Note that each line corresponds to one vector instruction.
>> Further, there's no need for `xtmp3`.
>>
>>
>> // Convert src to floating-point.
>> // First ensure that the bit to the right of the leading 1, if any, is 0.
>> dst = src >>> 1
>> dst = ~dst & src
>> // If available, prefer a conversion instruction that interprets dst as unsigned.
>> // Otherwise, a correction is needed later (see further down the code).
>> dst = fpToRawBits((fp) dst)
>>
>> // Set xtmp1 = -1 (all one-bits) for later use
>> xtmp1 = -1
>>
>> // Extract the biased exponent
>> xtmp2 = xtmp1 >>> P
>> dst = dst >>> (P - 1)
>> dst = xtmp2 & dst
>>
>> // Compute the exponent
>> // Set xtmp2 = BIAS
>> xtmp2 = xtmp1 >>> (P + 1)
>> dst = dst - xtmp2
>>
>> // Set xtmp2 = W - 1
>> xtmp2 = xtmp1 >>> (W - L)
>>
>> // Adjust for special cases.
>>
>> // We have: src == 0 iff dst < 0
>> // When src == 0, we force the exponent to -1
>> dst = dst >= 0 ? dst : xtmp1 // blend
>>
>> // When src < 0, we force the exponent to W - 1.
>> // This is only needed if the conversion to floating-point above interprets its argument as signed.
>> dst = src >= 0 ? dst : xtmp2 // blend
>>
>> // final result
>> dst = xtmp2 - dst
>
> @rgiulietti Shifting by 1 instead of 24 is a really good idea, it makes showing the validity a lot more simple as you mention. I've applied the suggestion in the latest commit. The updated instruction sequence is also very interesting, I'd like to take a look at it in a followup RFE. I was planning on taking a closer look at the long intrinsic after this patch, since it doesn't use the floating point trick that int does and I was very curious to see what the performance would be like with it.
>
> @TobiHartmann I've pushed an adapted version of your test that checks for `numberOfLeadingZeros`/`numberOfTrailingZeros` correctness for int and long. Let me know what you think!
@jaskarth A real Java implementation of the pseudo-code above has been successfully tested on the whole range of `int` by comparing the outcomes with the standard `Integer.numberOfLeadingZeros()`.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23579#issuecomment-2662435302
More information about the hotspot-compiler-dev
mailing list