RFR 8203279 : Faster calculation of power of two
Paul Sandoz
paul.sandoz at oracle.com
Thu May 17 15:17:30 UTC 2018
> On May 17, 2018, at 4:07 AM, Claes Redestad <claes.redestad at oracle.com> wrote:
>
> Shouldn't this be called "Faster rounding up to nearest power of two"?
>
> Patch looks OK to me, but I'd like to see numbers with the numberOfLeadingZeros intrinsic
> disabled so that we ensure we're not incurring an unreasonable penalty on platforms who don't
> have an intrinsic for this.
>
I did a quick check and i believe it is intrinsic for C2 on all hotspot cpu platform code, zero does not count! (did not check Graal but i suspect it supports it too).
Extra points for comparing the two approaches with C1 :-) I don’t think that is necessary but i am curious.
Paul.
> Running your benchmark with the intrinsic disabled[1] on my machine I see a 25-30% penalty
> with testNew relative to testOld, which is perhaps a bit toomuch for comfort..
>
> So I took a look at profiles for numberOfLeadingZeros with the intrinsic disabled and realized
> it might be possible to improve:
>
> diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
> --- a/src/java.base/share/classes/java/lang/Integer.java Mon May 14 16:21:08 2018 +0200
> +++ b/src/java.base/share/classes/java/lang/Integer.java Thu May 17 12:44:53 2018 +0200
> @@ -1621,12 +1621,12 @@
> // HD, Figure 5-6
> if (i <= 0)
> return i == 0 ? 32 : 0;
> - int n = 1;
> - if (i >>> 16 == 0) { n += 16; i <<= 16; }
> - if (i >>> 24 == 0) { n += 8; i <<= 8; }
> - if (i >>> 28 == 0) { n += 4; i <<= 4; }
> - if (i >>> 30 == 0) { n += 2; i <<= 2; }
> - n -= i >>> 31;
> + int n = 0;
> + if ( i < 1 << 16) { n += 16; i <<= 16; }
> + if (i >= 0 && i < 1 << 24) { n += 8; i <<= 8; }
> + if (i >= 0 && i < 1 << 28) { n += 4; i <<= 4; }
> + if (i >= 0 && i < 1 << 30) { n += 2; i <<= 2; }
> + if (i >= 0) n++;
> return n;
> }
>
> Adding a case that uses this version to your benchmark[2] and the new version is only about
> 10% slower than the baseline, with the added benefit that other uses of numberOfLeadingZeros
> might see a speed-upif there's no intrinsic (runs with intrinsic disabled[1]):
>
> Benchmark (arg) Mode Cnt Score Error Units
> TableFor.testNew 1 avgt 6 28.343 ± 0.534 ns/op
> TableFor.testNew 42 avgt 6 26.458 ± 0.064 ns/op
> TableFor.testNew2 1 avgt 6 23.969 ± 0.201 ns/op
> TableFor.testNew2 42 avgt 6 23.934 ± 0.107 ns/op
> TableFor.testOld 1 avgt 6 21.615 ± 0.803 ns/op
> TableFor.testOld 42 avgt 6 21.418 ± 0.106 ns/op
>
> So I think with the above patch to Integer.numberOfLeadingZeros we can get the benefit for
> our supported platforms while staying roughly performance neutral on platforms without
> an intrinsic. Not strictly necessary to fold it into this RFE, of course.
>
> Thanks!
>
> /Claes
>
> [1] -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_numberOfLeadingZeros_i
> [2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java
>
> On 2018-05-17 05:48, David Holmes wrote:
>>> Do you think it's good to go?
>>
>> I think I'd rather defer to a more performance oriented reviewer - paging Claes! ;-)
>>
>> David
>> -----
>
More information about the core-libs-dev
mailing list