BigInteger performance improvements
I'm planning on tackling the performance issues in the BigInteger class. In short, inefficient algorithms are used for multiplication, exponentiation, conversion to strings, etc. I intend to improve this by adding algorithms with better asymptotic behavior that will work better for large numbers, while preserving the existing algorithms for use with smaller numbers. This encompasses a lot of different bug reports: 4228681: Some BigInteger operations are slow with very large numbers http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4228681 (This was closed but never fixed.) 4837946: Implement Karatsuba multiplication algorithm in BigInteger http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946 I've already done the work on this one. My implementation is intended to be easy to read, understand, and check. It significantly improves multiplication performance for large numbers. 4646474: BigInteger.pow() algorithm slow in 1.4.0 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4646474 This will be improved in a couple ways: * Rewrite pow() to use the above Karatsuba multiplication * Implement Karatsuba squaring * Finding a good threshhold for Karatsuba squaring * Rewrite pow() to use Karatsuba squaring * Add an optimization to use left-shifting for multiples of 2 in the base. This improves speed by thousands of times for things like Mersenne numbers. 4641897: BigInteger.toString() algorithm slow for large numbers http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4641897 This algorithm uses a very inefficient algorithm for large numbers. I plan to replace it with a recursive divide-and-conquer algorithm devised by Schoenhage and Strassen. I have developed and tested this in my own software. This operates hundreds or thousands of times faster than the current version for large numbers. It will also benefit from faster multiplication and exponentiation. In the future, we should also add multiplication routines that are even more efficient for very large numbers, such as Toom-Cook multiplication, which is more efficient than Karatsuba multiplication for even larger numbers. Has anyone else worked on these? Is this the right group? I will probably submit the Karatsuba multiplication patch soon. Would it be more preferable to implement *all* of these parts first and submit one large patch? -- Alan Eliasen | "Furious activity is no substitute eliasen@mindspring.com | for understanding." http://futureboy.us/ | --H.H. Williams
Hello. Yes, this is the right group :-) As "Java Floating-Point Czar" I also look after BigInteger, although I haven't had much time for proactive maintenance there in a while. I think using faster, more mathematically sophisticated algorithms in BigInteger for large values would be a fine change to explore, as long as the performance on small values was preserved and regression tests appropriate for the new algorithms were developed too. I'd prefer to process changes in smaller chunks rather a huge one all at once; however, I may be a bit slow on reviews in the near future due to some other openjdk work I'm leading up (http://openjdk.java.net/projects/jdk6/). -Joe Darcy Alan Eliasen wrote:
I'm planning on tackling the performance issues in the BigInteger class. In short, inefficient algorithms are used for multiplication, exponentiation, conversion to strings, etc. I intend to improve this by adding algorithms with better asymptotic behavior that will work better for large numbers, while preserving the existing algorithms for use with smaller numbers.
This encompasses a lot of different bug reports:
4228681: Some BigInteger operations are slow with very large numbers http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4228681
(This was closed but never fixed.)
4837946: Implement Karatsuba multiplication algorithm in BigInteger http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946
I've already done the work on this one. My implementation is intended to be easy to read, understand, and check. It significantly improves multiplication performance for large numbers.
4646474: BigInteger.pow() algorithm slow in 1.4.0 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4646474
This will be improved in a couple ways:
* Rewrite pow() to use the above Karatsuba multiplication * Implement Karatsuba squaring * Finding a good threshhold for Karatsuba squaring * Rewrite pow() to use Karatsuba squaring * Add an optimization to use left-shifting for multiples of 2 in the base. This improves speed by thousands of times for things like Mersenne numbers.
4641897: BigInteger.toString() algorithm slow for large numbers http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4641897
This algorithm uses a very inefficient algorithm for large numbers. I plan to replace it with a recursive divide-and-conquer algorithm devised by Schoenhage and Strassen. I have developed and tested this in my own software. This operates hundreds or thousands of times faster than the current version for large numbers. It will also benefit from faster multiplication and exponentiation.
In the future, we should also add multiplication routines that are even more efficient for very large numbers, such as Toom-Cook multiplication, which is more efficient than Karatsuba multiplication for even larger numbers.
Has anyone else worked on these? Is this the right group?
I will probably submit the Karatsuba multiplication patch soon. Would it be more preferable to implement *all* of these parts first and submit one large patch?
Joseph D. Darcy wrote:
Yes, this is the right group :-) As "Java Floating-Point Czar" I also look after BigInteger, although I haven't had much time for proactive maintenance there in a while. I think using faster, more mathematically sophisticated algorithms in BigInteger for large values would be a fine change to explore, as long as the performance on small values was preserved and regression tests appropriate for the new algorithms were developed too.
My last step for this patch is improving performance of pow() for small numbers, which got slightly slower for some small arguments. (But some arguments, like those containing powers of 2 in the base, are *much* faster.) The other functions like multiply() are basically unchanged for small numbers. The new code, as you might expect, examines the size of the numbers and runs the old "grade-school" algorithm on small numbers and the Karatsuba algorithm on larger numbers (with the threshhold point being determined by benchmark and experiment.) As to the matter of regression tests, I would presume that Sun already has regression tests for BigInteger to make sure it gets correct results. Can you provide me with these, or are they in the OpenJDK distribution already? I can check these to make sure that they use numbers big enough to trigger the threshholds, and if not, extend them in the same style. Regression tests should hopefully be a no-op, assuming you have some already. I'm not adding any new functionality and no results should change--they should just run faster, of course. It should of course be impossible to write a regression test that "succeeds with the patch applied, and fails without the patch" like is requested on the OpenJDK page, unless it is time-limited. Of course, I could extend the regression tests to test *huge* numbers which may or may not be desired if you want the regression tests to run in short time. For example, a single exponentiation that takes milliseconds in my new version takes on the order of 15 minutes with JDK 1.6. How long are you willing to let regression tests take? How many combinations of arguments do you currently test? Do you think more are necessary?
I'd prefer to process changes in smaller chunks rather a huge one all at once; however, I may be a bit slow on reviews in the near future due to some other openjdk work I'm leading up
I will be submitting a patch addressing the three functions: multiply(), pow() and (the private) square(). These are intertwined and it would be more work and testing for both of us to separate out the patches and apply them in 3 phases. The patch will definitely not be huge. My patches are designed to be as readable and simple as possible for this phase. They all build on existing functions, and eschew lots of low-level bit-fiddling, as those types of changes are harder to understand and debug. I think it's best to get working algorithms with better asymptotic efficiency, as those will vastly improve performance for large numbers, and tune them by doing more low-level bit fiddling later. Even without being tuned to the nth degree, the new algorithms are vastly faster for large numbers, and identical for small numbers. -- Alan Eliasen | "Furious activity is no substitute eliasen@mindspring.com | for understanding." http://futureboy.us/ | --H.H. Williams
Alan Eliasen wrote:
Joseph D. Darcy wrote:
Yes, this is the right group :-) As "Java Floating-Point Czar" I also look after BigInteger, although I haven't had much time for proactive maintenance there in a while. I think using faster, more mathematically sophisticated algorithms in BigInteger for large values would be a fine change to explore, as long as the performance on small values was preserved and regression tests appropriate for the new algorithms were developed too.
My last step for this patch is improving performance of pow() for small numbers, which got slightly slower for some small arguments. (But some arguments, like those containing powers of 2 in the base, are *much* faster.) The other functions like multiply() are basically unchanged for small numbers. The new code, as you might expect, examines the size of the numbers and runs the old "grade-school" algorithm on small numbers and the Karatsuba algorithm on larger numbers (with the threshhold point being determined by benchmark and experiment.)
As to the matter of regression tests, I would presume that Sun already has regression tests for BigInteger to make sure it gets correct results. Can you provide me with these, or are they in the OpenJDK distribution already?
Let's see, I haven't moved the existing tests over from the closed world to open, but I can do that after the repositories are accepting changes again.
I can check these to make sure that they use numbers big enough to trigger the threshholds, and if not, extend them in the same style.
A general comment is that BigInteger and BigDecimal are rather old classes and our expectations for regression tests have increased over time, which is to say there are fewer regression tests than if the classes were developed today. For my own numerical work, a very large fraction of my engineering time is now spent developing tests rather than code.
Regression tests should hopefully be a no-op, assuming you have some already. I'm not adding any new functionality and no results should change--they should just run faster, of course.
While all the existing tests should still pass, that doesn't necessarily imply that no new tests should be written :-) Especially for numerics, the tests need to probe the algorithms where they are likely to fail, and the likely failure points can change with the algorithm. Taking one notorious example, the Pentuim FDIV instruction passed existing divide tests and ran billions of random tests successfully, but (after the fact) a small program targeted at probing interesting SRT boundaries was able to find an indication of a problem after running for only a fraction of a second: http://www.cs.berkeley.edu/~wkahan/srtest
It should of course be impossible to write a regression test that "succeeds with the patch applied, and fails without the patch" like is requested on the OpenJDK page, unless it is time-limited.
Of course, I could extend the regression tests to test *huge* numbers which may or may not be desired if you want the regression tests to run in short time. For example, a single exponentiation that takes milliseconds in my new version takes on the order of 15 minutes with JDK 1.6. How long are you willing to let regression tests take? How many combinations of arguments do you currently test? Do you think more are necessary?
I'm confident the existing tests will need to be augmented; I can work with you developing them.
I'd prefer to process changes in smaller chunks rather a huge one all at once; however, I may be a bit slow on reviews in the near future due to some other openjdk work I'm leading up
I will be submitting a patch addressing the three functions: multiply(), pow() and (the private) square(). These are intertwined and it would be more work and testing for both of us to separate out the patches and apply them in 3 phases. The patch will definitely not be huge.
Yes, that sounds like a good bundle of initial changes.
My patches are designed to be as readable and simple as possible for this phase. They all build on existing functions, and eschew lots of low-level bit-fiddling, as those types of changes are harder to understand and debug. I think it's best to get working algorithms with better asymptotic efficiency, as those will vastly improve performance for large numbers, and tune them by doing more low-level bit fiddling later. Even without being tuned to the nth degree, the new algorithms are vastly faster for large numbers, and identical for small numbers
Regards, -Joe
Attached is a patch for bug 4837946, for implementing asymptotically faster algorithms for multiplication of large numbers in the BigInteger class: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946 This patch implements Karatsuba multiplication and Karatsuba squaring for numbers above a certain size (found by experimentation). These patches are designed to be as easy to read as possible, and are implemented in terms of already-existing methods in BigInteger. Some more performance might be squeezed out of them by doing more low-level bit-fiddling, but I wanted to get a working version in and tested. This is quite a bit faster for large arguments. The crossover point between the "grade-school" algorithm and the Karatsuba algorithm for multiplication is set at 35 "ints" or about 1120 bits, which was found to give the best crossover in timing tests, so you won't see improvement below that. Above that, it's asymptotically faster. (O(n^1.585), compared to O(n^2)) for the grade-school algorithm. Double the number of digits, and the "grade school" algorithm takes about 4 times as long, Karatsuba takes about 3 times as long. It's vastly superior for very large arguments. I'd also like to create another RFE for implementing even faster multiplication algorithms for yet larger numbers, such as Toom-Cook. Previously, I had indicated that I'd submit faster algorithms for pow() at the same time, but the number of optimizations for pow() has grown rather large, and I plan on working on it a bit more and submitting it separately. Many/most combinations of operands for pow() are now vastly faster (some hundreds of thousands of times,) but I'd like to make it faster (or, at the least, the same performance for all arguments, a few of which have gotten slightly slower.) Unfortunately, these optimizations add to the size and complexity of that patch, which is why I'm submitting it separately. I have created regression tests that may or may not be what you want; they simply multiply a very large bunch of numbers together and output their results to a very large file, which you then "diff" against known correct values. (My tests produce 345 MB of output per run!) I validated the results by comparing them to the output of both JDK 1.6 and the Kaffe JVM, which was compiled to use the well-known and widely-tested GMP libraries for its BigInteger work. All tests pass. I haven't submitted these tests, but am awaiting getting a copy of the existing regression tests that Joseph Darcy discussed on this list. Please let me know if there's a problem with the patch. I had to hand-edit a few lines to remove the work I'm doing for pow(). -- Alan Eliasen | "Furious activity is no substitute eliasen@mindspring.com | for understanding." http://futureboy.us/ | --H.H. Williams diff --git a/src/share/classes/java/math/BigInteger.java b/src/share/classes/java/math/BigInteger.java --- a/src/share/classes/java/math/BigInteger.java +++ b/src/share/classes/java/math/BigInteger.java @@ -172,6 +172,22 @@ public class BigInteger extends Number i */ private final static long LONG_MASK = 0xffffffffL; + /** + * The threshhold value for using Karatsuba multiplication. If the number + * of ints in both mag arrays are greater than this number, then + * Karatsuba multiplication will be used. This value is found + * experimentally to work well. + */ + private static final int KARATSUBA_THRESHHOLD = 35; + + /** + * The threshhold value for using Karatsuba squaring. If the number + * of ints in the number are larger than this value, + * Karatsuba squaring will be used. This value is found + * experimentally to work well. + */ + private static final int KARATSUBA_SQUARE_THRESHHOLD = 100; + //Constructors /** @@ -1043,6 +1059,11 @@ public class BigInteger extends Number i private static final BigInteger TWO = valueOf(2); /** + * The BigInteger constant -1. (Not exported.) + */ + private static final BigInteger NEGATIVE_ONE = valueOf(-1); + + /** * The BigInteger constant ten. * * @since 1.5 @@ -1188,10 +1209,19 @@ public class BigInteger extends Number i if (val.signum == 0 || signum == 0) return ZERO; - int[] result = multiplyToLen(mag, mag.length, - val.mag, val.mag.length, null); - result = trustedStripLeadingZeroInts(result); - return new BigInteger(result, signum*val.signum); + int xlen = mag.length; + int ylen = val.mag.length; + + if ((xlen < KARATSUBA_THRESHHOLD) || (ylen < KARATSUBA_THRESHHOLD)) + { + + int[] result = multiplyToLen(mag, xlen, + val.mag, ylen, null); + result = trustedStripLeadingZeroInts(result); + return new BigInteger(result, signum*val.signum); + } + else + return multiplyKaratsuba(this, val); } /** @@ -1229,6 +1259,82 @@ public class BigInteger extends Number i } /** + * Multiplies two BigIntegers using the Karatsuba multiplication + * algorithm. This is a recursive divide-and-conquer algorithm which + * is more efficient for large numbers than what is commonly called the + * "grade-school" algorithm used in multiplyToLen. If the numbers to + * be multiplied have length n, the "grade-school" algorithm has + * an asymptotic complexity of O(n^2). In contrast, the Karatsuba + * algorithm has complexity of O(n^(log2(3))), or O(n^1.585). It achieves + * this increased performance by doing 3 multiplies instead of 4 when + * evaluating the product. As it has some overhead, should be used when + * both numbers are larger than a certain threshhold (found + * experimentally). + * + */ + private static BigInteger multiplyKaratsuba(BigInteger x, BigInteger y) + { + int xlen = x.mag.length; + int ylen = y.mag.length; + + // The number of ints in each half of the number. + int half = Math.max(xlen, ylen) / 2; + + BigInteger xl = x.getLower(half); + BigInteger xh = x.getUpper(half); + BigInteger yl = y.getLower(half); + BigInteger yh = y.getUpper(half); + + BigInteger p1 = xh.multiply(yh); + BigInteger p2 = xl.multiply(yl); + + // The following is (xh+xl)*(yh+yl) + BigInteger p3 = xh.add(xl).multiply(yh.add(yl)); + + // p1 * 2^(32*2*half) + (p3 - p1 - p2) * 2^(32*half) + p2 + BigInteger result = p1.shiftLeft(32*half).add(p3.subtract(p1).subtract(p2)).shiftLeft(32*half).add(p2); + + if (x.signum * y.signum == -1) + return result.negate(); + else + return result; + } + + /** + * Returns a new BigInteger representing n lower ints of the number. + * This is used by Karatsuba multiplication and squaring. + */ + private BigInteger getLower(int n) { + int len = mag.length; + + if (len <= n) + return this; + + int lowerInts[] = new int[n]; + System.arraycopy(mag, len-n, lowerInts, 0, n); + + return new BigInteger(lowerInts, 1); + } + + /** + * Returns a new BigInteger representing mag.length-n upper + * ints of the number. This is used by Karatsuba multiplication and + * squaring. + */ + private BigInteger getUpper(int n) { + int len = mag.length; + + if (len <= n) + return ZERO; + + int upperLen = len - n; + int lowerInts[] = new int[upperLen]; + System.arraycopy(mag, 0, lowerInts, 0, upperLen); + + return new BigInteger(lowerInts, 1); + } + + /** * Returns a BigInteger whose value is {@code (this<sup>2</sup>)}. * * @return {@code this<sup>2</sup>} @@ -1236,8 +1342,14 @@ public class BigInteger extends Number i private BigInteger square() { if (signum == 0) return ZERO; - int[] z = squareToLen(mag, mag.length, null); - return new BigInteger(trustedStripLeadingZeroInts(z), 1); + int len = mag.length; + if (len < KARATSUBA_SQUARE_THRESHHOLD) + { + int[] z = squareToLen(mag, len, null); + return new BigInteger(trustedStripLeadingZeroInts(z), 1); + } + else + return squareKaratsuba(); } /** @@ -1308,10 +1420,31 @@ public class BigInteger extends Number i } /** + * Squares a BigInteger using the Karatsuba squaring algorithm. + * It should be used when both numbers are larger than a + * certain threshhold (found experimentally). + * It is a recursive divide-and-conquer algorithm that has better + * asymptotic performance than the algorithm used in squareToLen. + */ + private BigInteger squareKaratsuba() + { + int half = mag.length / 2; + + BigInteger xl = getLower(half); + BigInteger xh = getUpper(half); + + BigInteger xhs = xh.square(); + BigInteger xls = xl.square(); + + // xh^2 << 64 + (((xl+xh)^2 - (xh^2 + xl^2)) << 32) + xl^2 + return xhs.shiftLeft(half*32).add(xl.add(xh).square().subtract(xhs.add(xls))).shiftLeft(half*32).add(xls); + } + + /** * Returns a BigInteger whose value is {@code (this / val)}. * * @param val value by which this BigInteger is to be divided. * @return {@code this / val} * @throws ArithmeticException {@code val==0} */ public BigInteger divide(BigInteger val) {
Attached is a *revised* patch for bug 4837946, for implementing asymptotically faster algorithms for multiplication of large numbers in the BigInteger class: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946 It only differs from my patch posted yesterday in one respect--the new methods getLower() and getUpper() now call trustedStripLeadingZeroInts() which has two effects: * Helps preserve the invariant expected by BigInteger that zero has a signum of zero and a mag array with length of zero (due to the way that the private constructor BigInteger(int[] mag, int signum) works.) * Optimizes some cases like multiplying 2*1000000, where the bit string has a large number of zeroes in a row. Please use this patch instead of the one I posted yesterday. -- Alan Eliasen | "Furious activity is no substitute eliasen@mindspring.com | for understanding." http://futureboy.us/ | --H.H. Williams diff --git a/src/share/classes/java/math/BigInteger.java b/src/share/classes/java/math/BigInteger.java --- a/src/share/classes/java/math/BigInteger.java +++ b/src/share/classes/java/math/BigInteger.java @@ -172,6 +172,22 @@ public class BigInteger extends Number i */ private final static long LONG_MASK = 0xffffffffL; + /** + * The threshhold value for using Karatsuba multiplication. If the number + * of ints in both mag arrays are greater than this number, then + * Karatsuba multiplication will be used. This value is found + * experimentally to work well. + */ + private static final int KARATSUBA_THRESHHOLD = 35; + + /** + * The threshhold value for using Karatsuba squaring. If the number + * of ints in the number are larger than this value, + * Karatsuba squaring will be used. This value is found + * experimentally to work well. + */ + private static final int KARATSUBA_SQUARE_THRESHHOLD = 100; + //Constructors /** @@ -530,7 +546,8 @@ public class BigInteger extends Number i if (bitLength < 2) throw new ArithmeticException("bitLength < 2"); // The cutoff of 95 was chosen empirically for best performance - prime = (bitLength < 95 ? smallPrime(bitLength, certainty, rnd) + prime = (bitLength < SMALL_PRIME_THRESHOLD + ? smallPrime(bitLength, certainty, rnd) : largePrime(bitLength, certainty, rnd)); signum = 1; mag = prime.mag; @@ -1043,6 +1060,11 @@ public class BigInteger extends Number i private static final BigInteger TWO = valueOf(2); /** + * The BigInteger constant -1. (Not exported.) + */ + private static final BigInteger NEGATIVE_ONE = valueOf(-1); + + /** * The BigInteger constant ten. * * @since 1.5 @@ -1188,10 +1210,19 @@ public class BigInteger extends Number i if (val.signum == 0 || signum == 0) return ZERO; - int[] result = multiplyToLen(mag, mag.length, - val.mag, val.mag.length, null); - result = trustedStripLeadingZeroInts(result); - return new BigInteger(result, signum*val.signum); + int xlen = mag.length; + int ylen = val.mag.length; + + if ((xlen < KARATSUBA_THRESHHOLD) || (ylen < KARATSUBA_THRESHHOLD)) + { + + int[] result = multiplyToLen(mag, xlen, + val.mag, ylen, null); + result = trustedStripLeadingZeroInts(result); + return new BigInteger(result, signum*val.signum); + } + else + return multiplyKaratsuba(this, val); } /** @@ -1229,6 +1260,82 @@ public class BigInteger extends Number i } /** + * Multiplies two BigIntegers using the Karatsuba multiplication + * algorithm. This is a recursive divide-and-conquer algorithm which + * is more efficient for large numbers than what is commonly called the + * "grade-school" algorithm used in multiplyToLen. If the numbers to + * be multiplied have length n, the "grade-school" algorithm has + * an asymptotic complexity of O(n^2). In contrast, the Karatsuba + * algorithm has complexity of O(n^(log2(3))), or O(n^1.585). It achieves + * this increased performance by doing 3 multiplies instead of 4 when + * evaluating the product. As it has some overhead, should be used when + * both numbers are larger than a certain threshhold (found + * experimentally). + * + */ + private static BigInteger multiplyKaratsuba(BigInteger x, BigInteger y) + { + int xlen = x.mag.length; + int ylen = y.mag.length; + + // The number of ints in each half of the number. + int half = Math.max(xlen, ylen) / 2; + + BigInteger xl = x.getLower(half); + BigInteger xh = x.getUpper(half); + BigInteger yl = y.getLower(half); + BigInteger yh = y.getUpper(half); + + BigInteger p1 = xh.multiply(yh); + BigInteger p2 = xl.multiply(yl); + + // The following is (xh+xl)*(yh+yl) + BigInteger p3 = xh.add(xl).multiply(yh.add(yl)); + + // p1 * 2^(32*2*half) + (p3 - p1 - p2) * 2^(32*half) + p2 + BigInteger result = p1.shiftLeft(32*half).add(p3.subtract(p1).subtract(p2)).shiftLeft(32*half).add(p2); + + if (x.signum * y.signum == -1) + return result.negate(); + else + return result; + } + + /** + * Returns a new BigInteger representing n lower ints of the number. + * This is used by Karatsuba multiplication and squaring. + */ + private BigInteger getLower(int n) { + int len = mag.length; + + if (len <= n) + return this; + + int lowerInts[] = new int[n]; + System.arraycopy(mag, len-n, lowerInts, 0, n); + + return new BigInteger(trustedStripLeadingZeroInts(lowerInts), 1); + } + + /** + * Returns a new BigInteger representing mag.length-n upper + * ints of the number. This is used by Karatsuba multiplication and + * squaring. + */ + private BigInteger getUpper(int n) { + int len = mag.length; + + if (len <= n) + return ZERO; + + int upperLen = len - n; + int upperInts[] = new int[upperLen]; + System.arraycopy(mag, 0, upperInts, 0, upperLen); + + return new BigInteger(trustedStripLeadingZeroInts(upperInts), 1); + } + + /** * Returns a BigInteger whose value is {@code (this<sup>2</sup>)}. * * @return {@code this<sup>2</sup>} @@ -1236,8 +1343,14 @@ public class BigInteger extends Number i private BigInteger square() { if (signum == 0) return ZERO; - int[] z = squareToLen(mag, mag.length, null); - return new BigInteger(trustedStripLeadingZeroInts(z), 1); + int len = mag.length; + if (len < KARATSUBA_SQUARE_THRESHHOLD) + { + int[] z = squareToLen(mag, len, null); + return new BigInteger(trustedStripLeadingZeroInts(z), 1); + } + else + return squareKaratsuba(); } /** @@ -1308,10 +1421,32 @@ public class BigInteger extends Number i } /** + * Squares a BigInteger using the Karatsuba squaring algorithm. + * It should be used when both numbers are larger than a + * certain threshhold (found experimentally). + * It is a recursive divide-and-conquer algorithm that has better + * asymptotic performance than the algorithm used in squareToLen. + */ + private BigInteger squareKaratsuba() + { + int half = mag.length / 2; + + BigInteger xl = getLower(half); + BigInteger xh = getUpper(half); + + BigInteger xhs = xh.square(); + BigInteger xls = xl.square(); + + // xh^2 << 64 + (((xl+xh)^2 - (xh^2 + xl^2)) << 32) + xl^2 + return xhs.shiftLeft(half*32).add(xl.add(xh).square().subtract(xhs.add(xls))).shiftLeft(half*32).add(xls); + } + + /** * Returns a BigInteger whose value is {@code (this / val)}. * * @param val value by which this BigInteger is to be divided. * @return {@code this / val} + * @throws ArithmeticException {@code val==0} */ public BigInteger divide(BigInteger val) {
Attached is the code for regression test for multiply() and pow() in BigInteger. It produces a lot of text (approximately 345 MB) to standard output. You will need to save this output and diff it against known good results. I have provided a file of known good results tested against a variety of platforms, including the Kaffe JVM using the GMP libraries for BigInteger. It tests a lot of different size arguments, including arguments around many powers of 2, which are the places that one would expect problems to occur. The known good results are available at: http://futureboy.us/temp/BigIntegerGood.txt.bz2 This is a 56 MB download compressed with bzip2. I'd appreciate it if others could test this on other platforms. I don't anticipate any problems, though. I haven't had much time to finish the changes to the pow() function, but there are some more optimizations I'd like to put in. -- Alan Eliasen | "Furious activity is no substitute eliasen@mindspring.com | for understanding." http://futureboy.us/ | --H.H. Williams import java.math.BigInteger; public class BigIntegerRegressionTest { public static void main(String args[]) { BigInteger exp,b,c,r,r1,r2,bb2; for (int ee=0; ee<=1000; ee++) { for (int bb=-10; bb <= 10; bb++) { b = BigInteger.valueOf(bb); r = b.pow(ee); System.out.println("(" + bb + ")" + "^" + ee + "=" + r); } } for (int bb=-10; bb <= 10; bb++) { b = BigInteger.valueOf(bb); for (int ee=0; ee<=100; ee++) { r1 = b.pow(ee); for (int b2=-10; b2 <= 10; b2++) { bb2 = BigInteger.valueOf(b2); for (int e2=0; e2<100; e2++) { r2 = bb2.pow(e2); r = r1.multiply(r2); System.out.println("(" + bb + ")" + "^" + ee + " * (" + b2 + ")" + "^" + e2 + "=" + r); } } } } } }
* Alan Eliasen:
Attached is the code for regression test for multiply() and pow() in BigInteger. It produces a lot of text (approximately 345 MB) to standard output. You will need to save this output and diff it against known good results. I have provided a file of known good results tested against a variety of platforms, including the Kaffe JVM using the GMP libraries for BigInteger.
Can't you put MD5 hashes of the known-good results in the test? This way, you only need the 345 MB if something goes wrong.
Attached is an UPDATED patch for bug 4837946, for implementing asymptotically faster algorithms for multiplication of large numbers in the BigInteger class: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946 The difference between this patch and the one posted earlier are: * This patch now also implements 3-way Toom-Cook multiplication and 3-way Toom-Cook squaring in addition to the Karatsuba squaring. 3-way Toom-Cook multiplication has an asymptotic efficiency of O(n^1.465), compared to O(n^1.585) for Karatsuba and O(n^2) for the "grade-school" algorithm, making it significantly faster for very large numbers. * Thresholds for various multiplications were slightly modified for better performance. The lower limit for Karatsuba multiplication is now 45 ints (1440 bits), and the lower limit for 3-way Toom-Cook multiplication is now 85 ints (2720 bits). * Threshholds for squaring were changed. Karatsuba squaring is used at 100 ints (3200 bits) and 3-way Toom-Cook squaring is at 190 ints (6080 bits.) This has been extensively tested. My regression tests are probably significant overkill for Sun's purposes. They take about 30 hours to run and produce about 232 gigabytes of output. (Multiply each of those numbers by 2 because you have to run it twice to compare the outputs of different VMs, and then compare output.) Again, my question is: how much time and space is acceptable for regression tests? I posed this before, but didn't get an answer. This patch contains only multiplication- and squaring-related patches. I will be submitting another patch of my changes to make the pow() function very much faster, but that will be a separate patch. For those who'd rather just replace their BigInteger with my much faster version, that also contains my patches for pow(), it's available at: http://futureboy.us/temp/BigInteger.java Note that my version of pow() will probably change somewhat before submission, but this current version works perfectly and is faster for most arguments. It's gotten slightly slower for some mid-size arguments, which is what I will be fixing. Previous patch report follows:
This patch implements Karatsuba multiplication and Karatsuba squaring for numbers above a certain size (found by experimentation). These patches are designed to be as easy to read as possible, and are implemented in terms of already-existing methods in BigInteger. Some more performance might be squeezed out of them by doing more low-level bit-fiddling, but I wanted to get a working version in and tested.
This is quite a bit faster for large arguments. The crossover point between the "grade-school" algorithm and the Karatsuba algorithm for multiplication is set at 35 "ints" or about 1120 bits, which was found to give the best crossover in timing tests, so you won't see improvement below that. Above that, it's asymptotically faster. (O(n^1.585), compared to O(n^2)) for the grade-school algorithm. Double the number of digits, and the "grade school" algorithm takes about 4 times as long, Karatsuba takes about 3 times as long. It's vastly superior for very large arguments.
I'd also like to create another RFE for implementing even faster multiplication algorithms for yet larger numbers, such as Toom-Cook.
Previously, I had indicated that I'd submit faster algorithms for pow() at the same time, but the number of optimizations for pow() has grown rather large, and I plan on working on it a bit more and submitting it separately. Many/most combinations of operands for pow() are now vastly faster (some hundreds of thousands of times,) but I'd like to make it faster (or, at the least, the same performance for all arguments, a few of which have gotten slightly slower.) Unfortunately, these optimizations add to the size and complexity of that patch, which is why I'm submitting it separately.
I have created regression tests that may or may not be what you want; they simply multiply a very large bunch of numbers together and output their results to a very large file, which you then "diff" against known correct values. (My tests produce 345 MB of output per run!) I validated the results by comparing them to the output of both JDK 1.6 and the Kaffe JVM, which was compiled to use the well-known and widely-tested GMP libraries for its BigInteger work. All tests pass. I haven't submitted these tests, but am awaiting getting a copy of the existing regression tests that Joseph Darcy discussed on this list.
Please let me know if there's a problem with the patch. I had to hand-edit a few lines to remove the work I'm doing for pow().
-- Alan Eliasen | "Furious activity is no substitute eliasen@mindspring.com | for understanding." http://futureboy.us/ | --H.H. Williams diff -r 37a05a11f281 src/share/classes/java/math/BigInteger.java --- a/src/share/classes/java/math/BigInteger.java Sat Dec 01 00:00:00 2007 +0000 +++ b/src/share/classes/java/math/BigInteger.java Thu Mar 27 22:29:34 2008 -0600 @@ -172,6 +172,38 @@ public class BigInteger extends Number i */ private final static long LONG_MASK = 0xffffffffL; + /** + * The threshold value for using Karatsuba multiplication. If the number + * of ints in both mag arrays are greater than this number, then + * Karatsuba multiplication will be used. This value is found + * experimentally to work well. + */ + public static int KARATSUBA_THRESHOLD = 45; + + /** + * The threshold value for using 3-way Toom-Cook multiplication. + * If the number of ints in both mag arrays are greater than this number, + * then Toom-Cook multiplication will be used. This value is found + * experimentally to work well. + */ + public static int TOOM_COOK_THRESHOLD = 85; + + /** + * The threshold value for using Karatsuba squaring. If the number + * of ints in the number are larger than this value, + * Karatsuba squaring will be used. This value is found + * experimentally to work well. + */ + public static int KARATSUBA_SQUARE_THRESHOLD = 100; + + /** + * The threshold value for using Toom-Cook squaring. If the number + * of ints in the number are larger than this value, + * Karatsuba squaring will be used. This value is found + * experimentally to work well. + */ + public static int TOOM_COOK_SQUARE_THRESHOLD = 190; + //Constructors /** @@ -530,7 +562,8 @@ public class BigInteger extends Number i if (bitLength < 2) throw new ArithmeticException("bitLength < 2"); // The cutoff of 95 was chosen empirically for best performance - prime = (bitLength < 95 ? smallPrime(bitLength, certainty, rnd) + prime = (bitLength < SMALL_PRIME_THRESHOLD + ? smallPrime(bitLength, certainty, rnd) : largePrime(bitLength, certainty, rnd)); signum = 1; mag = prime.mag; @@ -1043,6 +1076,16 @@ public class BigInteger extends Number i private static final BigInteger TWO = valueOf(2); /** + * The BigInteger constant -1. (Not exported.) + */ + private static final BigInteger NEGATIVE_ONE = valueOf(-1); + + /** + * The BigInteger constant 3. (Not exported.) + */ + private static final BigInteger THREE = valueOf(3); + + /** * The BigInteger constant ten. * * @since 1.5 @@ -1188,10 +1234,21 @@ public class BigInteger extends Number i if (val.signum == 0 || signum == 0) return ZERO; - int[] result = multiplyToLen(mag, mag.length, - val.mag, val.mag.length, null); - result = trustedStripLeadingZeroInts(result); - return new BigInteger(result, signum*val.signum); + int xlen = mag.length; + int ylen = val.mag.length; + + if ((xlen < KARATSUBA_THRESHOLD) || (ylen < KARATSUBA_THRESHOLD)) + { + int[] result = multiplyToLen(mag, xlen, + val.mag, ylen, null); + result = trustedStripLeadingZeroInts(result); + return new BigInteger(result, signum*val.signum); + } + else + if ((xlen < TOOM_COOK_THRESHOLD) && (ylen < TOOM_COOK_THRESHOLD)) + return multiplyKaratsuba(this, val); + else + return multiplyToomCook3(this, val); } /** @@ -1229,6 +1286,242 @@ public class BigInteger extends Number i } /** + * Multiplies two BigIntegers using the Karatsuba multiplication + * algorithm. This is a recursive divide-and-conquer algorithm which is + * more efficient for large numbers than what is commonly called the + * "grade-school" algorithm used in multiplyToLen. If the numbers to be + * multiplied have length n, the "grade-school" algorithm has an + * asymptotic complexity of O(n^2). In contrast, the Karatsuba algorithm + * has complexity of O(n^(log2(3))), or O(n^1.585). It achieves this + * increased performance by doing 3 multiplies instead of 4 when + * evaluating the product. As it has some overhead, should be used when + * both numbers are larger than a certain threshold (found + * experimentally). + * + */ + private static BigInteger multiplyKaratsuba(BigInteger x, BigInteger y) + { + int xlen = x.mag.length; + int ylen = y.mag.length; + + // The number of ints in each half of the number. + int half = (Math.max(xlen, ylen)+1) / 2; + + // xl and yl are the lower halves of x and y respectively, + // xh and yh are the upper halves. + BigInteger xl = x.getLower(half); + BigInteger xh = x.getUpper(half); + BigInteger yl = y.getLower(half); + BigInteger yh = y.getUpper(half); + + BigInteger p1 = xh.multiply(yh); // p1 = xh*yh + BigInteger p2 = xl.multiply(yl); // p2 = xl*yl + + // The following is p3=(xh+xl)*(yh+yl) + BigInteger p3 = xh.add(xl).multiply(yh.add(yl)); + + // p1 * 2^(32*2*half) + (p3 - p1 - p2) * 2^(32*half) + p2 + BigInteger result = p1.shiftLeft(32*half).add(p3.subtract(p1).subtract(p2)).shiftLeft(32*half).add(p2); + + if (x.signum * y.signum == -1) + return result.negate(); + else + return result; + } + + /** + * Multiplies two BigIntegers using a 3-way Toom-Cook multiplication + * algorithm. This is a recursive divide-and-conquer algorithm which is + * more efficient for large numbers than what is commonly called the + * "grade-school" algorithm used in multiplyToLen. If the numbers to be + * multiplied have length n, the "grade-school" algorithm has an + * asymptotic complexity of O(n^2). In contrast, 3-way Toom-Cook has a + * complexity of about O(n^1.465). It achieves this increased asymptotic + * performance by breaking each number into three parts and by doing 5 + * multiplies instead of 9 when evaluating the product. Due to overhead + * (additions, shifts, and one division) in the Toom-Cook algorithm, it + * should only be used when both numbers are larger than a certain + * threshold (found experimentally). This threshold is generally larger + * than that for Karatsuba multiplication, so this algorithm is generally + * only used when numbers become significantly larger. + * + * The algorithm used is the "optimal" 3-way Toom-Cook algorithm outlined + * by Marco Bodrato. + * + * See: http://bodrato.it/papers/#WAIFI2007 + * "Towards Optimal Toom-Cook Multiplication for Univariate and + * Multivariate Polynomials in Characteristic 2 and 0." by Marco BODRATO; + * In C.Carlet and B.Sunar, Eds., "WAIFI'07 proceedings", p. 116-133, + * LNCS #4547. Springer, Madrid, Spain, June 21-22, 2007. + * + */ + private static BigInteger multiplyToomCook3(BigInteger a, BigInteger b) + { + int alen = a.mag.length; + int blen = b.mag.length; + + int largest = Math.max(alen, blen); + + // k is the size (in ints) of the lower-order slices. + int k = (largest+2)/3; // Equal to ceil(largest/3) + + // r is the size (in ints) of the highest-order slice. + int r = largest - 2*k; + + // Obtain slices of the numbers. a2 and b2 are the most significant + // bits of the number, and a0 and b0 the least significant. + BigInteger a0, a1, a2, b0, b1, b2; + a2 = a.getToomSlice(k, r, 0, largest); + a1 = a.getToomSlice(k, r, 1, largest); + a0 = a.getToomSlice(k, r, 2, largest); + b2 = b.getToomSlice(k, r, 0, largest); + b1 = b.getToomSlice(k, r, 1, largest); + b0 = b.getToomSlice(k, r, 2, largest); + + BigInteger v0, v1, v2, vm1, vinf, t1, t2, tm1, da1, db1; + + v0 = a0.multiply(b0); + da1 = a2.add(a0); + db1 = b2.add(b0); + v1 = da1.add(a1).multiply(db1.add(b1)); + v2 = a2.shiftLeft(2).add(a1.shiftLeft(1)).add(a0).multiply( + b2.shiftLeft(2).add(b1.shiftLeft(1)).add(b0)); + vm1 = da1.subtract(a1).multiply(db1.subtract(b1)); + vinf = a2.multiply(b2); + + /* The algorithm requires two divisions by 2 and one by 3. + All divisions are known to be exact, that is, they do not produce + remainders, and all results are positive. The divisions by 2 are + implemented as right shifts which are relatively efficient, leaving + only a division by 3. + In the future, the division by 3 could be replaced with an + optimized algorithm for this case. */ + t2 = v2.subtract(vm1).divide(THREE); + tm1 = v1.subtract(vm1).shiftRight(1); + t1 = v1.subtract(v0); + t2 = t2.subtract(t1).shiftRight(1); + t1 = t1.subtract(tm1).subtract(vinf); + t2 = t2.subtract(vinf.shiftLeft(1)); + tm1 = tm1.subtract(t2); + + // Number of bits to shift left. + int ss = k*32; + + BigInteger result = vinf.shiftLeft(ss).add(t2).shiftLeft(ss).add(t1).shiftLeft(ss).add(tm1).shiftLeft(ss).add(v0); + + if (a.signum * b.signum == -1) + return result.negate(); + else + return result; + } + + + /** Returns a slice of a BigInteger for use in Toom-Cook multiplication. + @param lowerSize The size of the lower-order bit slices. + @param upperSize The size of the higher-order bit slices. + @param slice The index of which slice is requested, which must be a + number from 0 to size-1. Slice 0 is the highest-order bits, + and slice size-1 are the lowest-order bits. + Slice 0 may be of different size than the other slices. */ + private BigInteger getToomSlice(int lowerSize, int upperSize, int slice, int fullsize) + { + int start, end, sliceSize, len, offset; + + len = mag.length; + offset = fullsize - len; + + if (slice == 0) + { + start = 0 - offset; + end = upperSize - 1 - offset; + } + else + { + start = upperSize + (slice-1)*lowerSize - offset; + end = start + lowerSize - 1; + } + + if (start < 0) + start = 0; + if (end < 0) + return ZERO; + + sliceSize = (end-start) + 1; + + if (sliceSize <= 0) + return ZERO; + if (start==0 && sliceSize >= len) + return this; + + int intSlice[] = new int[sliceSize]; + System.arraycopy(mag, start, intSlice, 0, sliceSize); + + return new BigInteger(trustedStripLeadingZeroInts(intSlice), 1); + } + + + /** Returns a slice of a BigInteger for use in Karatsuba multiplication. + @param size The approximate size of each slice, in number of ints. + @param numSlices The number of pieces to slice the number into. + @param slice The index of which slice is requested, which must be a + number from 0 to size-1. Slice 0 is the highest-order bits, + and slice size-1 are the lowest-order bits. + Slice 0 may be of different size than the other slices. */ + public BigInteger getSlice(int size, int numSlices, int slice) + { + int len = mag.length; + int end = (len-1) - (numSlices - slice - 1) * size; + int start = end - size + 1; + if (slice == 0 || start < 0) + start = 0; // Expand slice 0 to contain remaining bits + + int slicelen = (end-start) + 1; + if (slicelen <= 0) + return ZERO; + if (start==0 && slicelen >= len) + return this; + + int intSlice[] = new int[slicelen]; + System.arraycopy(mag, start, intSlice, 0, slicelen); + + return new BigInteger(trustedStripLeadingZeroInts(intSlice), 1); + } + + /** + * Returns a new BigInteger representing n lower ints of the number. + * This is used by Karatsuba multiplication and squaring. + */ + private BigInteger getLower(int n) { + int len = mag.length; + + if (len <= n) + return this; + + int lowerInts[] = new int[n]; + System.arraycopy(mag, len-n, lowerInts, 0, n); + + return new BigInteger(trustedStripLeadingZeroInts(lowerInts), 1); + } + + /** + * Returns a new BigInteger representing mag.length-n upper + * ints of the number. This is used by Karatsuba multiplication and + * squaring. + */ + private BigInteger getUpper(int n) { + int len = mag.length; + + if (len <= n) + return ZERO; + + int upperLen = len - n; + int upperInts[] = new int[upperLen]; + System.arraycopy(mag, 0, upperInts, 0, upperLen); + + return new BigInteger(trustedStripLeadingZeroInts(upperInts), 1); + } + + /** * Returns a BigInteger whose value is {@code (this<sup>2</sup>)}. * * @return {@code this<sup>2</sup>} @@ -1236,8 +1529,18 @@ public class BigInteger extends Number i private BigInteger square() { if (signum == 0) return ZERO; - int[] z = squareToLen(mag, mag.length, null); - return new BigInteger(trustedStripLeadingZeroInts(z), 1); + int len = mag.length; + + if (len < KARATSUBA_SQUARE_THRESHOLD) + { + int[] z = squareToLen(mag, len, null); + return new BigInteger(trustedStripLeadingZeroInts(z), 1); + } + else + if (len < TOOM_COOK_SQUARE_THRESHOLD) + return squareKaratsuba(); + else + return squareToomCook3(); } /** @@ -1308,6 +1611,80 @@ public class BigInteger extends Number i } /** + * Squares a BigInteger using the Karatsuba squaring algorithm. It should + * be used when both numbers are larger than a certain threshold (found + * experimentally). It is a recursive divide-and-conquer algorithm that + * has better asymptotic performance than the algorithm used in + * squareToLen. + */ + private BigInteger squareKaratsuba() + { + int half = (mag.length+1) / 2; + + BigInteger xl = getLower(half); + BigInteger xh = getUpper(half); + + BigInteger xhs = xh.square(); // xhs = xh^2 + BigInteger xls = xl.square(); // xls = xl^2 + + // xh^2 << 64 + (((xl+xh)^2 - (xh^2 + xl^2)) << 32) + xl^2 + return xhs.shiftLeft(half*32).add(xl.add(xh).square().subtract(xhs.add(xls))).shiftLeft(half*32).add(xls); + } + + /** + * Squares a BigInteger using the 3-way Toom-Cook squaring algorithm. It + * should be used when both numbers are larger than a certain threshold + * (found experimentally). It is a recursive divide-and-conquer algorithm + * that has better asymptotic performance than the algorithm used in + * squareToLen or squareKaratsuba. + */ + private BigInteger squareToomCook3() + { + int len = mag.length; + + // k is the size (in ints) of the lower-order slices. + int k = (len+2)/3; // Equal to ceil(largest/3) + + // r is the size (in ints) of the highest-order slice. + int r = len - 2*k; + + // Obtain slices of the numbers. a2 is the most significant + // bits of the number, and a0 the least significant. + BigInteger a0, a1, a2; + a2 = getToomSlice(k, r, 0, len); + a1 = getToomSlice(k, r, 1, len); + a0 = getToomSlice(k, r, 2, len); + BigInteger v0, v1, v2, vm1, vinf, t1, t2, tm1, da1; + + v0 = a0.square(); + da1 = a2.add(a0); + v1 = da1.add(a1).square(); + v2 = a2.shiftLeft(2).add(a1.shiftLeft(1)).add(a0).square(); + vm1 = da1.subtract(a1).square(); + vinf = a2.square(); + + /* The algorithm requires two divisions by 2 and one by 3. + All divisions are known to be exact, that is, they do not produce + remainders, and all results are positive. The divisions by 2 are + implemented as right shifts which are relatively efficient, leaving + only a division by 3. + In the future, the division by 3 could be replaced with an + optimized algorithm for this case. */ + t2 = v2.subtract(vm1).divide(THREE); + tm1 = v1.subtract(vm1).shiftRight(1); + t1 = v1.subtract(v0); + t2 = t2.subtract(t1).shiftRight(1); + t1 = t1.subtract(tm1).subtract(vinf); + t2 = t2.subtract(vinf.shiftLeft(1)); + tm1 = tm1.subtract(t2); + + // Number of bits to shift left. + int ss = k*32; + + return vinf.shiftLeft(ss).add(t2).shiftLeft(ss).add(t1).shiftLeft(ss).add(tm1).shiftLeft(ss).add(v0); + } + + /** * Returns a BigInteger whose value is {@code (this / val)}. * * @param val value by which this BigInteger is to be divided.
Attached is an UPDATED patch for bug 4837946, for implementing asymptotically faster algorithms for multiplication of large numbers in the BigInteger class: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4837946 This patch supersedes all previous patches, and contains all of their changes. The difference between this patch and the one posted previously are small, and consist of the following: * Changed a line in getToomSlice() to fix a problem that will affect the future work I'm doing on the pow() function (which I have not yet released.) * Changed threshhold constants from public to private final. These had been made public for tuning of threshholds, and they were accidentally left as public in the last patch. For those who'd rather just replace their BigInteger with my much faster version, that also contains my patches for pow(), it's available at: http://futureboy.us/temp/BigInteger.java -- Alan Eliasen | "Furious activity is no substitute eliasen@mindspring.com | for understanding." http://futureboy.us/ | --H.H. Williams diff --git a/src/share/classes/java/math/BigInteger.java b/src/share/classes/java/math/BigInteger.java --- a/src/share/classes/java/math/BigInteger.java +++ b/src/share/classes/java/math/BigInteger.java @@ -172,6 +172,38 @@ public class BigInteger extends Number i */ private final static long LONG_MASK = 0xffffffffL; + /** + * The threshold value for using Karatsuba multiplication. If the number + * of ints in both mag arrays are greater than this number, then + * Karatsuba multiplication will be used. This value is found + * experimentally to work well. + */ + private static final int KARATSUBA_THRESHOLD = 45; + + /** + * The threshold value for using 3-way Toom-Cook multiplication. + * If the number of ints in both mag arrays are greater than this number, + * then Toom-Cook multiplication will be used. This value is found + * experimentally to work well. + */ + private static final int TOOM_COOK_THRESHOLD = 85; + + /** + * The threshold value for using Karatsuba squaring. If the number + * of ints in the number are larger than this value, + * Karatsuba squaring will be used. This value is found + * experimentally to work well. + */ + private static final int KARATSUBA_SQUARE_THRESHOLD = 100; + + /** + * The threshold value for using Toom-Cook squaring. If the number + * of ints in the number are larger than this value, + * Karatsuba squaring will be used. This value is found + * experimentally to work well. + */ + private static final int TOOM_COOK_SQUARE_THRESHOLD = 190; + //Constructors /** @@ -530,7 +562,8 @@ public class BigInteger extends Number i if (bitLength < 2) throw new ArithmeticException("bitLength < 2"); // The cutoff of 95 was chosen empirically for best performance - prime = (bitLength < 95 ? smallPrime(bitLength, certainty, rnd) + prime = (bitLength < SMALL_PRIME_THRESHOLD + ? smallPrime(bitLength, certainty, rnd) : largePrime(bitLength, certainty, rnd)); signum = 1; mag = prime.mag; @@ -1043,6 +1076,16 @@ public class BigInteger extends Number i private static final BigInteger TWO = valueOf(2); /** + * The BigInteger constant -1. (Not exported.) + */ + private static final BigInteger NEGATIVE_ONE = valueOf(-1); + + /** + * The BigInteger constant 3. (Not exported.) + */ + private static final BigInteger THREE = valueOf(3); + + /** * The BigInteger constant ten. * * @since 1.5 @@ -1188,10 +1234,21 @@ public class BigInteger extends Number i if (val.signum == 0 || signum == 0) return ZERO; - int[] result = multiplyToLen(mag, mag.length, - val.mag, val.mag.length, null); - result = trustedStripLeadingZeroInts(result); - return new BigInteger(result, signum*val.signum); + int xlen = mag.length; + int ylen = val.mag.length; + + if ((xlen < KARATSUBA_THRESHOLD) || (ylen < KARATSUBA_THRESHOLD)) + { + int[] result = multiplyToLen(mag, xlen, + val.mag, ylen, null); + result = trustedStripLeadingZeroInts(result); + return new BigInteger(result, signum*val.signum); + } + else + if ((xlen < TOOM_COOK_THRESHOLD) && (ylen < TOOM_COOK_THRESHOLD)) + return multiplyKaratsuba(this, val); + else + return multiplyToomCook3(this, val); } /** @@ -1229,6 +1286,242 @@ public class BigInteger extends Number i } /** + * Multiplies two BigIntegers using the Karatsuba multiplication + * algorithm. This is a recursive divide-and-conquer algorithm which is + * more efficient for large numbers than what is commonly called the + * "grade-school" algorithm used in multiplyToLen. If the numbers to be + * multiplied have length n, the "grade-school" algorithm has an + * asymptotic complexity of O(n^2). In contrast, the Karatsuba algorithm + * has complexity of O(n^(log2(3))), or O(n^1.585). It achieves this + * increased performance by doing 3 multiplies instead of 4 when + * evaluating the product. As it has some overhead, should be used when + * both numbers are larger than a certain threshold (found + * experimentally). + * + */ + private static BigInteger multiplyKaratsuba(BigInteger x, BigInteger y) + { + int xlen = x.mag.length; + int ylen = y.mag.length; + + // The number of ints in each half of the number. + int half = (Math.max(xlen, ylen)+1) / 2; + + // xl and yl are the lower halves of x and y respectively, + // xh and yh are the upper halves. + BigInteger xl = x.getLower(half); + BigInteger xh = x.getUpper(half); + BigInteger yl = y.getLower(half); + BigInteger yh = y.getUpper(half); + + BigInteger p1 = xh.multiply(yh); // p1 = xh*yh + BigInteger p2 = xl.multiply(yl); // p2 = xl*yl + + // The following is p3=(xh+xl)*(yh+yl) + BigInteger p3 = xh.add(xl).multiply(yh.add(yl)); + + // p1 * 2^(32*2*half) + (p3 - p1 - p2) * 2^(32*half) + p2 + BigInteger result = p1.shiftLeft(32*half).add(p3.subtract(p1).subtract(p2)).shiftLeft(32*half).add(p2); + + if (x.signum * y.signum == -1) + return result.negate(); + else + return result; + } + + /** + * Multiplies two BigIntegers using a 3-way Toom-Cook multiplication + * algorithm. This is a recursive divide-and-conquer algorithm which is + * more efficient for large numbers than what is commonly called the + * "grade-school" algorithm used in multiplyToLen. If the numbers to be + * multiplied have length n, the "grade-school" algorithm has an + * asymptotic complexity of O(n^2). In contrast, 3-way Toom-Cook has a + * complexity of about O(n^1.465). It achieves this increased asymptotic + * performance by breaking each number into three parts and by doing 5 + * multiplies instead of 9 when evaluating the product. Due to overhead + * (additions, shifts, and one division) in the Toom-Cook algorithm, it + * should only be used when both numbers are larger than a certain + * threshold (found experimentally). This threshold is generally larger + * than that for Karatsuba multiplication, so this algorithm is generally + * only used when numbers become significantly larger. + * + * The algorithm used is the "optimal" 3-way Toom-Cook algorithm outlined + * by Marco Bodrato. + * + * See: http://bodrato.it/papers/#WAIFI2007 + * "Towards Optimal Toom-Cook Multiplication for Univariate and + * Multivariate Polynomials in Characteristic 2 and 0." by Marco BODRATO; + * In C.Carlet and B.Sunar, Eds., "WAIFI'07 proceedings", p. 116-133, + * LNCS #4547. Springer, Madrid, Spain, June 21-22, 2007. + * + */ + private static BigInteger multiplyToomCook3(BigInteger a, BigInteger b) + { + int alen = a.mag.length; + int blen = b.mag.length; + + int largest = Math.max(alen, blen); + + // k is the size (in ints) of the lower-order slices. + int k = (largest+2)/3; // Equal to ceil(largest/3) + + // r is the size (in ints) of the highest-order slice. + int r = largest - 2*k; + + // Obtain slices of the numbers. a2 and b2 are the most significant + // bits of the number, and a0 and b0 the least significant. + BigInteger a0, a1, a2, b0, b1, b2; + a2 = a.getToomSlice(k, r, 0, largest); + a1 = a.getToomSlice(k, r, 1, largest); + a0 = a.getToomSlice(k, r, 2, largest); + b2 = b.getToomSlice(k, r, 0, largest); + b1 = b.getToomSlice(k, r, 1, largest); + b0 = b.getToomSlice(k, r, 2, largest); + + BigInteger v0, v1, v2, vm1, vinf, t1, t2, tm1, da1, db1; + + v0 = a0.multiply(b0); + da1 = a2.add(a0); + db1 = b2.add(b0); + v1 = da1.add(a1).multiply(db1.add(b1)); + v2 = a2.shiftLeft(2).add(a1.shiftLeft(1)).add(a0).multiply( + b2.shiftLeft(2).add(b1.shiftLeft(1)).add(b0)); + vm1 = da1.subtract(a1).multiply(db1.subtract(b1)); + vinf = a2.multiply(b2); + + /* The algorithm requires two divisions by 2 and one by 3. + All divisions are known to be exact, that is, they do not produce + remainders, and all results are positive. The divisions by 2 are + implemented as right shifts which are relatively efficient, leaving + only a division by 3. + In the future, the division by 3 could be replaced with an + optimized algorithm for this case. */ + t2 = v2.subtract(vm1).divide(THREE); + tm1 = v1.subtract(vm1).shiftRight(1); + t1 = v1.subtract(v0); + t2 = t2.subtract(t1).shiftRight(1); + t1 = t1.subtract(tm1).subtract(vinf); + t2 = t2.subtract(vinf.shiftLeft(1)); + tm1 = tm1.subtract(t2); + + // Number of bits to shift left. + int ss = k*32; + + BigInteger result = vinf.shiftLeft(ss).add(t2).shiftLeft(ss).add(t1).shiftLeft(ss).add(tm1).shiftLeft(ss).add(v0); + + if (a.signum * b.signum == -1) + return result.negate(); + else + return result; + } + + + /** Returns a slice of a BigInteger for use in Toom-Cook multiplication. + @param lowerSize The size of the lower-order bit slices. + @param upperSize The size of the higher-order bit slices. + @param slice The index of which slice is requested, which must be a + number from 0 to size-1. Slice 0 is the highest-order bits, + and slice size-1 are the lowest-order bits. + Slice 0 may be of different size than the other slices. */ + private BigInteger getToomSlice(int lowerSize, int upperSize, int slice, int fullsize) + { + int start, end, sliceSize, len, offset; + + len = mag.length; + offset = fullsize - len; + + if (slice == 0) + { + start = 0 - offset; + end = upperSize - 1 - offset; + } + else + { + start = upperSize + (slice-1)*lowerSize - offset; + end = start + lowerSize - 1; + } + + if (start < 0) + start = 0; + if (end < 0) + return ZERO; + + sliceSize = (end-start) + 1; + + if (sliceSize <= 0) + return ZERO; + if (start==0 && sliceSize >= len) + return this.abs(); + + int intSlice[] = new int[sliceSize]; + System.arraycopy(mag, start, intSlice, 0, sliceSize); + + return new BigInteger(trustedStripLeadingZeroInts(intSlice), 1); + } + + + /** Returns a slice of a BigInteger for use in Karatsuba multiplication. + @param size The approximate size of each slice, in number of ints. + @param numSlices The number of pieces to slice the number into. + @param slice The index of which slice is requested, which must be a + number from 0 to size-1. Slice 0 is the highest-order bits, + and slice size-1 are the lowest-order bits. + Slice 0 may be of different size than the other slices. */ + public BigInteger getSlice(int size, int numSlices, int slice) + { + int len = mag.length; + int end = (len-1) - (numSlices - slice - 1) * size; + int start = end - size + 1; + if (slice == 0 || start < 0) + start = 0; // Expand slice 0 to contain remaining bits + + int slicelen = (end-start) + 1; + if (slicelen <= 0) + return ZERO; + if (start==0 && slicelen >= len) + return this; + + int intSlice[] = new int[slicelen]; + System.arraycopy(mag, start, intSlice, 0, slicelen); + + return new BigInteger(trustedStripLeadingZeroInts(intSlice), 1); + } + + /** + * Returns a new BigInteger representing n lower ints of the number. + * This is used by Karatsuba multiplication and squaring. + */ + private BigInteger getLower(int n) { + int len = mag.length; + + if (len <= n) + return this; + + int lowerInts[] = new int[n]; + System.arraycopy(mag, len-n, lowerInts, 0, n); + + return new BigInteger(trustedStripLeadingZeroInts(lowerInts), 1); + } + + /** + * Returns a new BigInteger representing mag.length-n upper + * ints of the number. This is used by Karatsuba multiplication and + * squaring. + */ + private BigInteger getUpper(int n) { + int len = mag.length; + + if (len <= n) + return ZERO; + + int upperLen = len - n; + int upperInts[] = new int[upperLen]; + System.arraycopy(mag, 0, upperInts, 0, upperLen); + + return new BigInteger(trustedStripLeadingZeroInts(upperInts), 1); + } + + /** * Returns a BigInteger whose value is {@code (this<sup>2</sup>)}. * * @return {@code this<sup>2</sup>} @@ -1236,8 +1529,18 @@ public class BigInteger extends Number i private BigInteger square() { if (signum == 0) return ZERO; - int[] z = squareToLen(mag, mag.length, null); - return new BigInteger(trustedStripLeadingZeroInts(z), 1); + int len = mag.length; + + if (len < KARATSUBA_SQUARE_THRESHOLD) + { + int[] z = squareToLen(mag, len, null); + return new BigInteger(trustedStripLeadingZeroInts(z), 1); + } + else + if (len < TOOM_COOK_SQUARE_THRESHOLD) + return squareKaratsuba(); + else + return squareToomCook3(); } /** @@ -1308,6 +1611,80 @@ public class BigInteger extends Number i } /** + * Squares a BigInteger using the Karatsuba squaring algorithm. It should + * be used when both numbers are larger than a certain threshold (found + * experimentally). It is a recursive divide-and-conquer algorithm that + * has better asymptotic performance than the algorithm used in + * squareToLen. + */ + private BigInteger squareKaratsuba() + { + int half = (mag.length+1) / 2; + + BigInteger xl = getLower(half); + BigInteger xh = getUpper(half); + + BigInteger xhs = xh.square(); // xhs = xh^2 + BigInteger xls = xl.square(); // xls = xl^2 + + // xh^2 << 64 + (((xl+xh)^2 - (xh^2 + xl^2)) << 32) + xl^2 + return xhs.shiftLeft(half*32).add(xl.add(xh).square().subtract(xhs.add(xls))).shiftLeft(half*32).add(xls); + } + + /** + * Squares a BigInteger using the 3-way Toom-Cook squaring algorithm. It + * should be used when both numbers are larger than a certain threshold + * (found experimentally). It is a recursive divide-and-conquer algorithm + * that has better asymptotic performance than the algorithm used in + * squareToLen or squareKaratsuba. + */ + private BigInteger squareToomCook3() + { + int len = mag.length; + + // k is the size (in ints) of the lower-order slices. + int k = (len+2)/3; // Equal to ceil(largest/3) + + // r is the size (in ints) of the highest-order slice. + int r = len - 2*k; + + // Obtain slices of the numbers. a2 is the most significant + // bits of the number, and a0 the least significant. + BigInteger a0, a1, a2; + a2 = getToomSlice(k, r, 0, len); + a1 = getToomSlice(k, r, 1, len); + a0 = getToomSlice(k, r, 2, len); + BigInteger v0, v1, v2, vm1, vinf, t1, t2, tm1, da1; + + v0 = a0.square(); + da1 = a2.add(a0); + v1 = da1.add(a1).square(); + v2 = a2.shiftLeft(2).add(a1.shiftLeft(1)).add(a0).square(); + vm1 = da1.subtract(a1).square(); + vinf = a2.square(); + + /* The algorithm requires two divisions by 2 and one by 3. + All divisions are known to be exact, that is, they do not produce + remainders, and all results are positive. The divisions by 2 are + implemented as right shifts which are relatively efficient, leaving + only a division by 3. + In the future, the division by 3 could be replaced with an + optimized algorithm for this case. */ + t2 = v2.subtract(vm1).divide(THREE); + tm1 = v1.subtract(vm1).shiftRight(1); + t1 = v1.subtract(v0); + t2 = t2.subtract(t1).shiftRight(1); + t1 = t1.subtract(tm1).subtract(vinf); + t2 = t2.subtract(vinf.shiftLeft(1)); + tm1 = tm1.subtract(t2); + + // Number of bits to shift left. + int ss = k*32; + + return vinf.shiftLeft(ss).add(t2).shiftLeft(ss).add(t1).shiftLeft(ss).add(tm1).shiftLeft(ss).add(v0); + } + + /** * Returns a BigInteger whose value is {@code (this / val)}. * * @param val value by which this BigInteger is to be divided.
participants (3)
-
Alan Eliasen
-
Florian Weimer
-
Joseph D. Darcy