JDK 8 RFR 4891331: BigInteger a.multiply(a) should use squaring code

Thu Oct 24 00:37:44 UTC 2013

Please review this patch at your convenience:

Issue:	https://bugs.openjdk.java.net/browse/JDK-4891331
Webrev:	http://cr.openjdk.java.net/~bpb/4891331/webrev/

This review request follows from this RFC thread:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-October/022266.html

More extensive micro-benchmark testing was performed using JMH [1] for the following cases:

1) multiply: current version of multiply()
2) multiplyNonSquare: proposed version of multiply() with parameter val such that (val != this && val.equals(this)) == true
3) multiplyPowTwo: multiply by val == this with squaring performed via pow(2)
4) multiplySquare: proposed version of multiply() with parameter val such that val == this
5) powTwo: straightforward call of pow(2)
6) square: straightforward call of square()

Test #1 represents current performance of multiplying two identical instances without squaring code.
Test #2 represents performance after the patch when squaring code is *not* used, i.e., measures effect of val == this equality check.
Test #4 represents performance after the patch when squaring code *is* used for the case of mag.length >= 8.
Tests #3, #5, and #6 are for informative purposes.

The following bit lengths were tested:

32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 384, 416, 448, 480, 512, 768, 1024, 1280, 1536.

The following JMH parameters were used across all runs:

# Forks: 5
# Warmup: 5 iterations, 1 s each
# Measurement: 8 iterations, 3 s each
# Benchmark mode: Throughput, ops/time

Two sets of runs were performed on each of two platforms, one single-threaded and one with number of threads equal to the number of hardware cores, 4 in one case (Linux) and 8 in the other (Mac OS X).

Linux system: 4 X Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz.
Mac system: 8 X Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz

Summaries of the aggregate tests results are available at [2], [3], [4], and [5] in terms of 99% confidence intervals in operations per millisecond.

The results suggest that the effect of the equality test (val == this) is insignificant. Significant performance improvement was however observed for squaring of values with int array magnitude length >= 8. The performance gain was up to 150-200% for the largest bit length tested (1536). Larger bit lengths were not tested as those would enter the current range of Karatsuba multiplication (>= 1600 bits).

While not pertinent to this review per se, it should be noted that the performance of pow(2) approaches that of square() as the bit length increases. It might be worth a subsequent investigation to determine whether there is a second threshold beyond which pow(2) overtakes square() altogether.

Results for the same set of tests on a Windows 4 X Core-i7 860 2.8Ghz should be available tomorrow. More limited testing previously performed on Windows produced in relative terms results similar to the those obtained on Linux and Mac OS X.

Lastly, I am not sure that the @implNote is necessary or even desirable here and if so whether a CCC request would be in order. Also note that if there were any doubt as to the exact threshold which ought to be used, then a slightly higher value than 8 could be used or the threshold test could be changed to a strict ">" instead of ">=".

Thanks,

Brian

[1] http://openjdk.java.net/projects/code-tools/jmh/
[2] http://cr.openjdk.java.net/~bpb/4891331/linux-threads-1.html
[3] http://cr.openjdk.java.net/~bpb/4891331/linux-threads-4.html
[4] http://cr.openjdk.java.net/~bpb/4891331/macosx-threads-1.html
[5] http://cr.openjdk.java.net/~bpb/4891331/macosx-threads-8.html