RFR: 8349637: Integer.numberOfLeadingZeros outputs incorrectly in certain cases [v3]

Raffaello Giulietti rgiulietti at openjdk.org
Thu Feb 20 16:10:54 UTC 2025


On Mon, 17 Feb 2025 03:41:11 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> Here's the pseudo-code for an implementation with 13 vector instructions.
>> 
>> Let `fp` denote `float` or `double`.
>> Correspondingly, let `P` = 24, 53 (precision); `L` = 5, 6; `W` = 2^`L` (lane width).
>> 
>> The code below is pseudo Java and describes `W`-bit lane operations.
>> Note that each line corresponds to one vector instruction.
>> Further, there's no need for `xtmp3`.
>> 
>> 
>> // Convert src to floating-point.
>> // First ensure that the bit to the right of the leading 1, if any, is 0.
>> dst = src >>> 1
>> dst = ~dst & src
>> // If available, prefer a conversion instruction that interprets dst as unsigned.
>> // Otherwise, a correction is needed later (see further down the code).
>> dst = fpToRawBits((fp) dst)
>> 
>> // Set xtmp1 = -1 (all one-bits) for later use
>> xtmp1 = -1
>> 
>> // Extract the biased exponent
>> xtmp2 = xtmp1 >>> P
>> dst = dst >>> (P - 1)
>> dst = xtmp2 & dst
>> 
>> // Compute the exponent
>> // Set xtmp2 = BIAS
>> xtmp2 = xtmp1 >>> (P + 1)
>> dst = dst - xtmp2
>> 
>> // Set xtmp2 = W - 1
>> xtmp2 = xtmp1 >>> (W - L)
>> 
>> // Adjust for special cases.
>> 
>> // We have: src == 0 iff dst < 0
>> // When src == 0, we force the exponent to -1
>> dst = dst >= 0 ? dst : xtmp1  // blend
>> 
>> // When src < 0, we force the exponent to W - 1.
>> // This is only needed if the conversion to floating-point above interprets its argument as signed.
>> dst = src >= 0 ? dst : xtmp2  // blend
>> 
>> // final result
>> dst = xtmp2 - dst
>
> @rgiulietti Shifting by 1 instead of 24 is a really good idea, it makes showing the validity a lot more simple as you mention. I've applied the suggestion in the latest commit. The updated instruction sequence is also very interesting, I'd like to take a look at it in a followup RFE. I was planning on taking a closer look at the long intrinsic after this patch, since it doesn't use the floating point trick that int does and I was very curious to see what the performance would be like with it.
> 
> @TobiHartmann I've pushed an adapted version of your test that checks for `numberOfLeadingZeros`/`numberOfTrailingZeros` correctness for int and long. Let me know what you think!

@jaskarth I think the special handling of max_int is useless, but you may want to integrate now and remove this handling in a followup PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23579#issuecomment-2671958988


More information about the hotspot-compiler-dev mailing list