Measuring performance changes from applying 4837946 patch

Tue Jun 4 00:27:38 UTC 2013

I have spent some time trying to assess the performance improvements to be accrued by applying the patch

http://cr.openjdk.java.net/~bpb/4837946/

which I previously posted for code review. It is clear from the literature and from analysis of the algorithms involved that significant performance improvements should be obtained as the bit lengths of the factors increase. This also has been verified by other core-libs-dev members in their own testing. I have indeed observed performance improvements but at vastly different algorithm thresholds for the bit lengths involved than those which are currently defined by the thresholds in the updated BigInteger class. I am posting this message as a sort of "sanity test" of my performance evaluation attempts as I think it is more likely that any unusual results are due to my measurements as opposed to the tested code itself.

For performance testing I have thus far exclusively used my regular development platform which is a 2.8 GHz MacBookPro5,3 [1] running Mac OS 10.7.5 (Lion) and upgraded, for what it matters, to 8 GB of RAM and a 7200 RPM HDD.

For micro-benchmarking I have primarily used the Java Microbenchmark Harness (JMH) [2], although I wrote some standalone test code as well. Also, I constrained testing for the most part to the single-threaded case, although relative performance between algorithms was similar for two threads.

I am only going to quote a couple of results here pending feedback of a more general nature about performance testing.

1) Firstly, for single-threaded multiplication of a 1601-bit BigInteger by a 2000-bit BitInteger I am seeing that the Karatsuba algorithm is 60-70% slower than the standard algorithm, and 3-way Toom-Cook is slower still. The 60-70% ratio is consistent between the JMH-based tests and the standalone tests. A bit length of 1600 for both factors is the threshold at which Karatsuba multiplication should kick in.

2) Based on a number of runs with different bit lengths, I did not see the Karatsuba algorithm start to outperform the standard algorithm until bit lengths somewhere between 10400 and 12800. This is about 6.5-8 times the current threshold.

Note that these numbers are based on a modified version of BigInteger I created for testing. This modified version contains three public methods multiply_standard(), multiply_karatsuba(), and multiply_toom_cook(). The first of these is the usual standard algorithm. The second uses the Karatsuba algorithm until EITHER factor is below the threshold, then devolves to the standard algorithm. The third uses the Toom-Cook algorithm until BOTH factors are below the threshold and then uses the standard algorithm. The purpose of using these different methods is to avoid mixing the accelerated algorithms within a single test.

Being quite skeptical of my own results, the questions I have at this point are:

A) Is there some particular approach that should be used in testing these algorithms for relative performance?

B) Could there be something platform-dependent happening?

C) As currently implemented, the relative performance between algorithms should be unaffected by thread count, correct?

Any suggestions would be appreciated.

Thanks,

Brian

[1] http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-2-duo-2.8-aluminum-15-mid-2009-sd-unibody-specs.html
[2] http://openjdk.java.net/projects/code-tools/jmh/