RFR: 8302204: Optimize BigDecimal.divide

Xiaowei Lu duke at openjdk.org
Tue Feb 14 04:48:43 UTC 2023


On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu <duke at openjdk.org> wrote:

> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the poor performance of BigDecimal.divide under certain circumstances.
> 
> We encountered similar situations when benchmarking Spark 3 on the TPC-DS test kit. According to the flame graph below, most of the time in BigDecimal.divide is spent in StripZeros. Hence we propose this patch to optimize stripping zeros.
> ![Image 1](https://user-images.githubusercontent.com/39413832/218062061-53cd0220-776e-4b72-8b9a-6b0f11707986.png)
> 
> Currently, createAndStripZerosToMatchScale() works linearly. That is, the target value is scanned from back to front, stripping out a single ‘0’ at a time. To optimize this, we can adopt a binary-search-like approach. That is, each time we try to strip out ${scale/2} ‘0’s at once.
> 
> The performance looks good. Theoretically, the time complexity of our method is O(log n), while the current one is O(n). In practice, benchmarks on Spark 3 show that one third less time (102s -> 68s) is spent on TPC-DS query4. We also ran jtreg and JCK to check correctness, and everything seems fine.
> 
> More about the environment: 
> We ran Spark 3.3.0 on OpenJDK 11, but the JDK version does not seem to have much impact on BigDecimal. The Spark cluster consists of one main node and two core nodes, each with 4 cores, 16 GB of memory, and 4x500 GB of storage.
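As a rough sketch of the binary-search idea described in the quoted text (this is a simplified model, not the actual createAndStripZerosToMatchScale patch, which works on BigDecimal internals; the class and method names here are hypothetical), the count of trailing decimal zeros can be located with O(log n) divisions instead of stripping one zero per iteration:

```java
import java.math.BigInteger;

public class StripZerosSketch {
    private static final BigInteger TEN = BigInteger.TEN;

    // Binary-search for the number of trailing decimal zeros, then remove
    // them with a single division. Hypothetical helper for illustration only.
    static BigInteger stripTrailingZeros(BigInteger value) {
        if (value.signum() == 0) {
            return value;
        }
        int lo = 0;
        // A d-digit number has at most d - 1 trailing zeros.
        int hi = value.abs().toString().length() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (value.remainder(TEN.pow(mid)).signum() == 0) {
                lo = mid;       // at least mid trailing zeros
            } else {
                hi = mid - 1;   // fewer than mid trailing zeros
            }
        }
        return value.divide(TEN.pow(lo));
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingZeros(new BigInteger("123000000"))); // prints 123
    }
}
```

Each probe costs one remainder operation, so the loop runs O(log n) times for an n-digit value, matching the complexity claim above.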

> _Mailing list message from [Louis Wasserman](mailto:lowasser at google.com) on [core-libs-dev](mailto:core-libs-dev at mail.openjdk.org):_
> 
> Could you do that benchmark with e.g. JMH rather than taking the difference of System.currentTimeMillis? That would probably make it easier to read and trust the results.
> 
> On Sun, Feb 12, 2023, 7:56 PM Sergey Kuksenko <skuksenko at openjdk.org> wrote:
> 

Sure. I have written a microbenchmark using JMH, and it shows the following results:

before optimization 
DecimalBenchmark.divide_by_2         thrpt    5   334549.891 ±  3028.946  ops/s
DecimalBenchmark.divide_by_2_to_100  thrpt    5    15874.537 ±    38.942  ops/s
DecimalBenchmark.divide_by_3         thrpt    5  6190583.950 ± 91116.750  ops/s

after optimization 
DecimalBenchmark.divide_by_2         thrpt    5  1479106.480 ±  5950.440  ops/s
DecimalBenchmark.divide_by_2_to_100  thrpt    5    32730.634 ±    81.891  ops/s
DecimalBenchmark.divide_by_3         thrpt    5  6233700.252 ± 21043.571  ops/s


The following is the benchmark code:

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 10)
@Measurement(iterations = 5, time = 10)
@Fork(value = 1)
public class DecimalBenchmark {
    private static final MathContext mc = new MathContext(38, RoundingMode.HALF_UP);

    @Benchmark
    public void divide_by_2_to_100(Blackhole bh) {
        // Consume each quotient so the JIT cannot eliminate the divisions.
        for (long i = 2; i <= 100; i++) {
            bh.consume(BigDecimal.ONE.divide(BigDecimal.valueOf(i), mc));
        }
    }

    @Benchmark
    public BigDecimal divide_by_2() {
        // Returning the result prevents dead-code elimination.
        return BigDecimal.ONE.divide(BigDecimal.valueOf(2L), mc);
    }

    @Benchmark
    public BigDecimal divide_by_3() {
        return BigDecimal.ONE.divide(BigDecimal.valueOf(3L), mc);
    }
}

-------------

PR: https://git.openjdk.org/jdk/pull/12509

