RFR: 8349637: Integer.numberOfLeadingZeros outputs incorrectly in certain cases [v3]

Mon Feb 17 03:44:11 UTC 2025

On Sun, 16 Feb 2025 09:06:20 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:

>> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Improve explanation of logic
>
> Here's the pseudo-code for an implementation with 13 vector instructions.
> 
> Let `fp` denote `float` or `double`.
> Correspondingly, let `P` = 24, 53 (precision); `L` = 5, 6; `W` = 2^`L` (lane width).
> 
> The code below is pseudo Java and describes `W`-bit lane operations.
> Note that each line corresponds to one vector instruction.
> Further, there's no need for `xtmp3`.
> 
> 
> // Convert src to floating-point.
> // First ensure that the bit to the right of the leading 1, if any, is 0.
> dst = src >>> 1
> dst = ~dst & src
> // If available, prefer a conversion instruction that interprets dst as unsigned.
> // Otherwise, a correction is needed later (see further down the code).
> dst = fpToRawBits((fp) dst)
> 
> // Set xtmp1 = -1 (all one-bits) for later use
> xtmp1 = -1
> 
> // Extract the biased exponent
> xtmp2 = xtmp1 >>> P
> dst = dst >>> (P - 1)
> dst = xtmp2 & dst
> 
> // Compute the exponent
> // Set xtmp2 = BIAS
> xtmp2 = xtmp1 >>> (P + 1)
> dst = dst - xtmp2
> 
> // Set xtmp2 = W - 1
> xtmp2 = xtmp1 >>> (W - L)
> 
> // Adjust for special cases.
> 
> // We have: src == 0 iff dst < 0
> // When src == 0, we force the exponent to -1
> dst = dst >= 0 ? dst : xtmp1  // blend
> 
> // When src < 0, we force the exponent to W - 1.
> // This is only needed if the conversion to floating-point above interprets its argument as signed.
> dst = src >= 0 ? dst : xtmp2  // blend
> 
> // final result
> dst = xtmp2 - dst

@rgiulietti Shifting by 1 instead of 24 is a really good idea, it makes showing the validity a lot more simple as you mention. I've applied the suggestion in the latest commit. The updated instruction sequence is also very interesting, I'd like to take a look at it in a followup RFE. I was planning on taking a closer look at the long intrinsic after this patch, since it doesn't use the floating point trick that int does and I was very curious to see what the performance would be like with it.

@TobiHartmann I've pushed an adapted version of your test that checks for `numberOfLeadingZeros`/`numberOfTrailingZeros` correctness for int and long. Let me know what you think!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23579#issuecomment-2661896820