RFR: JDK-8277175: Add a parallel multiply method to BigInteger
BigInteger currently uses three different algorithms for multiply: the simple quadratic algorithm, then the slightly better Karatsuba once we exceed a bit count, and then Toom-Cook 3 once we go into the several thousands of bits. Since Toom-Cook 3 is a recursive algorithm, it is trivial to parallelize. I have demonstrated this several times in conference talks. To be consistent with other classes such as Arrays and Collection, I have added a parallelMultiply() method. Internally, we have added a parameter to the private multiply method to indicate whether the calculation should be done in parallel.

The performance improvements are as expected. Fibonacci of 100 million (using a single-threaded Dijkstra's sum-of-squares version) completes in 9.2 seconds with parallelMultiply() vs 25.3 seconds with the sequential multiply() method. This is on my 1-8-2 laptop. The final multiplications involve very large numbers, which then benefit from the parallelization of Toom-Cook 3. Fibonacci of 100 million is a 347084-bit number.

We have also parallelized the private square() method. Internally, square() defaults to sequential execution.
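For context, the Fibonacci scheme mentioned above can be sketched with the fast-doubling identities F(2k) = F(k) * (2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2, one common formulation of Dijkstra's sum-of-squares approach: only O(log n) big multiplications are needed, and the last few, on the largest operands, dominate the running time. The sketch below is hypothetical (the class and method names are made up, and this is not the benchmark code from the PR); on a JDK containing this change, the two multiplies in the doubling step are the ones worth switching to parallelMultiply().

```java
import java.math.BigInteger;

// Hypothetical sketch of fast-doubling Fibonacci. Not the PR's benchmark code.
public class FastFib {
    // Returns F(n) using the identities
    //   F(2k)   = F(k) * (2*F(k+1) - F(k))
    //   F(2k+1) = F(k)^2 + F(k+1)^2
    public static BigInteger fib(int n) {
        BigInteger a = BigInteger.ZERO; // F(0)
        BigInteger b = BigInteger.ONE;  // F(1)
        // Walk the bits of n from most to least significant.
        for (int bit = 31 - Integer.numberOfLeadingZeros(Math.max(n, 1)); bit >= 0; bit--) {
            // Doubling step: (F(k), F(k+1)) -> (F(2k), F(2k+1)).
            // With this change, these could be parallelMultiply() calls
            // once the operands grow into the Toom-Cook 3 range.
            BigInteger d = a.multiply(b.shiftLeft(1).subtract(a)); // F(2k)
            BigInteger e = a.multiply(a).add(b.multiply(b));       // F(2k+1)
            a = d;
            b = e;
            if (((n >>> bit) & 1) != 0) { // advance by one if this bit of n is set
                BigInteger t = a.add(b);
                a = b;
                b = t;
            }
        }
        return a;
    }
}
```

With this scheme, FastFib.fib(10) yields 55, and the single-threaded driver stays sequential while the individual multiplications can run in parallel internally.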
Benchmark                                    (n)        Mode  Cnt      Score      Error  Units
BigIntegerParallelMultiply.multiply          1000000    ss      4     68,043 ±   25,317  ms/op
BigIntegerParallelMultiply.multiply          10000000   ss      4   1073,095 ±  125,296  ms/op
BigIntegerParallelMultiply.multiply          100000000  ss      4  25317,535 ± 5806,205  ms/op
BigIntegerParallelMultiply.parallelMultiply  1000000    ss      4     56,552 ±   22,368  ms/op
BigIntegerParallelMultiply.parallelMultiply  10000000   ss      4    536,193 ±   37,393  ms/op
BigIntegerParallelMultiply.parallelMultiply  100000000  ss      4   9274,657 ±  826,197  ms/op

-------------

Commit messages:
 - Update comments
 - Added parallelMultiply() method to BigInteger to allow large multiplications to run in parallel
 - Merge remote-tracking branch 'upstream/master'
 - Merge remote-tracking branch 'upstream/master'
 - Merge remote-tracking branch 'origin/master'
 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control points in bounding box
 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control points in bounding box

Changes: https://git.openjdk.java.net/jdk/pull/6391/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6391&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277175
  Stats: 731 lines in 8 files changed: 565 ins; 136 del; 30 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6391.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6391/head:pull/6391

PR: https://git.openjdk.java.net/jdk/pull/6391
On Mon, 15 Nov 2021 15:50:39 GMT, kabutz <duke@openjdk.java.net> wrote:
Some more benchmark results, run on my 1-6-2 server:

Benchmark                                    (n)        Mode  Cnt      Score      Error  Units
BigIntegerParallelMultiply.multiply          1000000    ss      4     51.707 ±   11.194  ms/op
BigIntegerParallelMultiply.multiply          10000000   ss      4    988.302 ±  235.977  ms/op
BigIntegerParallelMultiply.multiply          100000000  ss      4  24662.063 ± 1123.329  ms/op
BigIntegerParallelMultiply.parallelMultiply  1000000    ss      4     49.337 ±   26.611  ms/op
BigIntegerParallelMultiply.parallelMultiply  10000000   ss      4    527.560 ±  268.903  ms/op
BigIntegerParallelMultiply.parallelMultiply  100000000  ss      4   9076.551 ± 1899.444  ms/op

We can see that for larger calculations (fib 100m) the execution is 2.7x faster in parallel, for medium sizes (fib 10m) it is 1.873x faster, and for small (fib 1m) it is roughly the same. Considering that the Fibonacci algorithm we used was itself sequential, and that the last three calculations would dominate, 2.7x faster should probably be considered quite good on a 1-6-2 machine.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6391
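For readers wondering why a recursive multiply is "trivial to parallelize": the subproducts at each recursion level are independent, so they can be forked as fork/join tasks. The sketch below is hypothetical; it uses Karatsuba rather than Toom-Cook 3 for brevity, uses a standalone RecursiveTask where the actual PR threads a boolean through the private multiply path, and assumes non-negative operands. The cutoff value is made up.

```java
import java.math.BigInteger;
import java.util.concurrent.RecursiveTask;

// Hypothetical fork/join sketch of a parallel recursive multiply.
// Karatsuba shown for brevity; the PR parallelizes Toom-Cook 3 internally
// instead of using a separate task class. Assumes x, y >= 0.
public class ParallelKaratsuba extends RecursiveTask<BigInteger> {
    private static final int SEQUENTIAL_BITS = 4096; // made-up cutoff
    private final BigInteger x, y;

    public ParallelKaratsuba(BigInteger x, BigInteger y) {
        this.x = x;
        this.y = y;
    }

    @Override
    protected BigInteger compute() {
        int n = Math.max(x.bitLength(), y.bitLength());
        if (n <= SEQUENTIAL_BITS) {
            return x.multiply(y); // small enough: plain sequential multiply
        }
        int half = n / 2;
        // Split each operand: x = xHi * 2^half + xLo, likewise for y.
        BigInteger xHi = x.shiftRight(half), xLo = x.subtract(xHi.shiftLeft(half));
        BigInteger yHi = y.shiftRight(half), yLo = y.subtract(yHi.shiftLeft(half));
        // The three subproducts are independent: fork two, compute one here.
        ParallelKaratsuba hiTask = new ParallelKaratsuba(xHi, yHi);
        ParallelKaratsuba loTask = new ParallelKaratsuba(xLo, yLo);
        hiTask.fork();
        loTask.fork();
        BigInteger mid = new ParallelKaratsuba(xHi.add(xLo), yHi.add(yLo)).compute();
        BigInteger hi = hiTask.join();
        BigInteger lo = loTask.join();
        mid = mid.subtract(hi).subtract(lo); // = xHi*yLo + xLo*yHi
        return hi.shiftLeft(2 * half).add(mid.shiftLeft(half)).add(lo);
    }
}
```

Invoked as `ForkJoinPool.commonPool().invoke(new ParallelKaratsuba(a, b))`, the result must equal `a.multiply(b)`; the speedup profile mirrors the benchmarks above, with small inputs bounded by the sequential cutoff and large inputs scaling with available cores.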
On Mon, 15 Nov 2021 15:50:39 GMT, kabutz <duke@openjdk.java.net> wrote:
This PR mistakenly includes commits from another in-progress PR. You need to remove these stray commits or else close this PR, create a new branch without them, and resubmit. ------------- Changes requested by kcr (Author). PR: https://git.openjdk.java.net/jdk/pull/6391
kabutz has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:

 - Merge branch 'openjdk:master' into parallel-bigInteger
 - Update comments
 - Added parallelMultiply() method to BigInteger to allow large multiplications to run in parallel
 - Merge remote-tracking branch 'upstream/master'
 - Merge remote-tracking branch 'upstream/master'
 - Merge remote-tracking branch 'origin/master'
 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control points in bounding box

   Addressing some of Laurent's code review recommendations/comments:
   1. use the convention t for the parametric variable x(t), y(t)
   2. solve the quadratic equation using QuadCurve2D.solveQuadratic() or like Helpers.quadraticRoots()
   3. always use braces for x = (a < b) ? ...
   4. always use double-precision constants in math or logical operations: (2 * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0)

   (There are two additional recommendations not in this commit that I'll ask about shortly.)

   See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954
 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control points in bounding box

   The bug writeup indicated they wanted Path2D#getBounds2D() to be more accurate/concise. They didn't explicitly say they wanted CubicCurve2D and QuadCurve2D to become more accurate too. But a preexisting unit test failed when Path2D#getBounds2D() was updated and those other classes weren't. At this point I considered either:

   A. updating CubicCurve2D and QuadCurve2D to use the new, more accurate getBounds2D(), or
   B. updating the unit test to forgive the discrepancy.

   I chose A, which might technically be seen as scope creep, but it feels like a more holistic/better approach.

   This also includes a new unit test (in Path2D/UnitTest.java) that fails without the changes in this commit.

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/6391/files
 - new: https://git.openjdk.java.net/jdk/pull/6391/files/eac09f15..03be5518

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6391&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6391&range=00-01

Stats: 24162 lines in 218 files changed: 17701 ins; 2440 del; 4021 mod
Patch: https://git.openjdk.java.net/jdk/pull/6391.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6391/head:pull/6391

PR: https://git.openjdk.java.net/jdk/pull/6391
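As background for the stray 8176501 commits: a tighter getBounds2D() is possible because a cubic Bezier's axis extrema occur either at the endpoints or where its derivative, a quadratic in the parameter t, is zero, so control points need never inflate the box. The following one-axis helper is a hedged sketch, not the actual java.desktop patch, using QuadCurve2D.solveQuadratic() as the review suggested; the class and method names are invented.

```java
import java.awt.geom.QuadCurve2D;

// Hypothetical helper: tight [min, max] of one coordinate of a cubic Bezier
// with control values p0..p3. Extrema can only occur at t=0, t=1, or where
// B'(t) = 0, which is a quadratic in t.
public class CubicExtrema {
    public static double[] bounds(double p0, double p1, double p2, double p3) {
        double min = Math.min(p0, p3); // endpoint values, t=0 and t=1
        double max = Math.max(p0, p3);
        // B'(t)/3 = c + b*t + a*t^2 with:
        double a = -p0 + 3.0 * p1 - 3.0 * p2 + p3;
        double b = 2.0 * (p0 - 2.0 * p1 + p2);
        double c = p1 - p0;
        double[] roots = new double[2];
        int n = QuadCurve2D.solveQuadratic(new double[] {c, b, a}, roots);
        for (int i = 0; i < n; i++) {
            double t = roots[i];
            if (t > 0.0 && t < 1.0) { // only interior roots are extrema
                double u = 1.0 - t;   // evaluate the Bezier at t
                double v = u * u * u * p0 + 3.0 * u * u * t * p1
                         + 3.0 * u * t * t * p2 + t * t * t * p3;
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new double[] {min, max};
    }
}
```

For the symmetric curve with values 0, 2, 2, 0, the naive control-point box would report a maximum of 2.0, while the tight bound is 1.5 (the curve's actual peak at t = 0.5), which is exactly the discrepancy the 8176501 bug describes.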
On Mon, 15 Nov 2021 15:50:39 GMT, kabutz <duke@openjdk.java.net> wrote:
Additionally, you should probably discuss new functionality such as this on the appropriate mailing list, `core-libs-dev at openjdk.java.net` in this case. Finally, this will need a CSR if it is to proceed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6391
kabutz has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision:

 - 8277119: Add asserts in GenericTaskQueueSet methods
   Reviewed-by: tschatzl
 - 8274757: Cleanup unnecessary calls to Throwable.initCause() in java.management module
   Reviewed-by: dfuchs, sspitsyn
 - 8274662: Replace 'while' cycles with iterator with enhanced-for in jdk.hotspot.agent
   Reviewed-by: amenkov, cjplummer, sspitsyn
 - 8274163: Use String.equals instead of String.compareTo in jdk.jcmd
   Reviewed-by: cjplummer, amenkov, sspitsyn
 - 8274190: Use String.equals instead of String.compareTo in jdk.internal.jvmstat
   Reviewed-by: cjplummer, sspitsyn
 - 8274168: Avoid String.compareTo == 0 to check String equality in java.management
   Reviewed-by: sspitsyn, dfuchs, cjplummer, dholmes
 - 8274261: Use enhanced-for instead of plain 'for' in jdk.jcmd
   Reviewed-by: sspitsyn, cjplummer
 - Update comments
 - Added parallelMultiply() method to BigInteger to allow large multiplications to run in parallel
 - Merge branch 'openjdk:master' into master
 - ... and 5 more: https://git.openjdk.java.net/jdk/compare/744e9e8e...0075f04d

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/6391/files
 - new: https://git.openjdk.java.net/jdk/pull/6391/files/03be5518..0075f04d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6391&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6391&range=01-02

Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/6391.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6391/head:pull/6391

PR: https://git.openjdk.java.net/jdk/pull/6391
kabutz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - Update comments
 - Added parallelMultiply() method to BigInteger to allow large multiplications to run in parallel
 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control points in bounding box

   Addressing some of Laurent's code review recommendations/comments:
   1. use the convention t for the parametric variable x(t), y(t)
   2. solve the quadratic equation using QuadCurve2D.solveQuadratic() or like Helpers.quadraticRoots()
   3. always use braces for x = (a < b) ? ...
   4. always use double-precision constants in math or logical operations: (2 * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0)

   (There are two additional recommendations not in this commit that I'll ask about shortly.)

   See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954
 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control points in bounding box

   The bug writeup indicated they wanted Path2D#getBounds2D() to be more accurate/concise. They didn't explicitly say they wanted CubicCurve2D and QuadCurve2D to become more accurate too. But a preexisting unit test failed when Path2D#getBounds2D() was updated and those other classes weren't. At this point I considered either:

   A. updating CubicCurve2D and QuadCurve2D to use the new, more accurate getBounds2D(), or
   B. updating the unit test to forgive the discrepancy.

   I chose A, which might technically be seen as scope creep, but it feels like a more holistic/better approach.

   This also includes a new unit test (in Path2D/UnitTest.java) that fails without the changes in this commit.

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6391/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6391&range=03
  Stats: 731 lines in 8 files changed: 565 ins; 136 del; 30 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6391.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6391/head:pull/6391

PR: https://git.openjdk.java.net/jdk/pull/6391
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz <duke@openjdk.java.net> wrote:
You still have the extra unwanted commits in your branch. Normally you should not force push to your branch once a review is started, but in this case the only other choice would be to abandon this PR and create a new one. Regardless of whether you close this PR and create a new one or continue with this PR, you should rebase your branch on top of the latest jdk master such that there is only a single commit with the changes you are proposing. It should _not_ touch any files from `java.desktop`. ------------- PR: https://git.openjdk.java.net/jdk/pull/6391
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz <duke@openjdk.java.net> wrote:
Thanks Kevin, let me start again! ------------- PR: https://git.openjdk.java.net/jdk/pull/6391
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz <duke@openjdk.java.net> wrote:
Resubmitted as PR https://github.com/openjdk/jdk/pull/6409/ ------------- PR: https://git.openjdk.java.net/jdk/pull/6391
On Mon, 15 Nov 2021 15:50:39 GMT, kabutz <duke@openjdk.java.net> wrote:
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6391
participants (2):
 - kabutz
 - Kevin Rushforth