RFR 8203279 : Faster calculation of power of two

Claes Redestad claes.redestad at oracle.com
Thu May 17 11:07:49 UTC 2018


Shouldn't this be called "Faster rounding up to nearest power of two"?

Patch looks OK to me, but I'd like to see numbers with the 
numberOfLeadingZeros intrinsic
disabled so that we ensure we're not incurring an unreasonable penalty 
on platforms who don't
have an intrinsic for this.

Running your benchmark with the intrinsic disabled[1] on my machine I 
see a 25-30% penalty
with testNew relative to testOld, which is perhaps a bit toomuch for 
comfort..

So I took a look at profiles for numberOfLeadingZeros with the intrinsic 
disabled and realized
it might be possible to improve:

diff -r de35abdfe5da src/java.base/share/classes/java/lang/Integer.java
--- a/src/java.base/share/classes/java/lang/Integer.java        Mon May 
14 16:21:08 2018 +0200
+++ b/src/java.base/share/classes/java/lang/Integer.java        Thu May 
17 12:44:53 2018 +0200
@@ -1621,12 +1621,12 @@
          // HD, Figure 5-6
          if (i <= 0)
              return i == 0 ? 32 : 0;
-        int n = 1;
-        if (i >>> 16 == 0) { n += 16; i <<= 16; }
-        if (i >>> 24 == 0) { n +=  8; i <<=  8; }
-        if (i >>> 28 == 0) { n +=  4; i <<=  4; }
-        if (i >>> 30 == 0) { n +=  2; i <<=  2; }
-        n -= i >>> 31;
+        int n = 0;
+        if (          i < 1 << 16) { n += 16; i <<= 16; }
+        if (i >= 0 && i < 1 << 24) { n +=  8; i <<=  8; }
+        if (i >= 0 && i < 1 << 28) { n +=  4; i <<=  4; }
+        if (i >= 0 && i < 1 << 30) { n +=  2; i <<=  2; }
+        if (i >= 0) n++;
          return n;
      }

Adding a case that uses this version to your benchmark[2] and the new 
version is only about
10% slower than the baseline, with the added benefit that other uses of 
numberOfLeadingZeros
might see a speed-upif there's no intrinsic (runs with intrinsic 
disabled[1]):

Benchmark          (arg)  Mode  Cnt   Score   Error Units
TableFor.testNew       1  avgt    6  28.343 ± 0.534  ns/op
TableFor.testNew      42  avgt    6  26.458 ± 0.064  ns/op
TableFor.testNew2      1  avgt    6  23.969 ± 0.201  ns/op
TableFor.testNew2     42  avgt    6  23.934 ± 0.107  ns/op
TableFor.testOld       1  avgt    6  21.615 ± 0.803  ns/op
TableFor.testOld      42  avgt    6  21.418 ± 0.106  ns/op

So I think with the above patch to Integer.numberOfLeadingZeros we can 
get the benefit for
our supported platforms while staying roughly performance neutral on 
platforms without
an intrinsic. Not strictly necessary to fold it into this RFE, of course.

Thanks!

/Claes

[1] -XX:+UnlockDiagnosticVMOptions 
-XX:DisableIntrinsic=_numberOfLeadingZeros_i
[2] http://cr.openjdk.java.net/~redestad/scratch/TableFor.java

On 2018-05-17 05:48, David Holmes wrote:
>> Do you think it's good to go?
>
> I think I'd rather defer to a more performance oriented reviewer - 
> paging Claes! ;-)
>
> David
> ----- 



More information about the core-libs-dev mailing list