Integrated: 8285040: PPC64 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long

Martin Doerr mdoerr at openjdk.java.net
Thu Apr 21 16:03:30 UTC 2022


On Tue, 19 Apr 2022 19:29:34 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

> Add match rules for UDivI, UModI, UDivL, UModL as on x86 (JDK-8282221). PPC64 doesn't have DivMod instructions that deliver both results at once.
> Note: The x86 tests cannot currently be extended to this platform because https://bugs.openjdk.java.net/browse/JDK-8280120 is not yet implemented.
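Per the subject line, the UDivI/UModI and UDivL/UModL match rules back `Integer.divideUnsigned`, `Integer.remainderUnsigned`, `Long.divideUnsigned` and `Long.remainderUnsigned` (all in `java.lang` since Java 8). The short program below (the class name is illustrative only) shows what these methods compute: both operands are read as unsigned, so `-1` stands for 2^32 - 1 (int) or 2^64 - 1 (long).

```java
// Illustrative demo of the Java methods the new PPC64 match rules accelerate.
class UnsignedDivDemo {
    public static void main(String[] args) {
        // -1 as an int has bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned.
        System.out.println(-1 / 3);                             // signed division: 0
        System.out.println(Integer.divideUnsigned(-1, 3));      // 1431655765
        System.out.println(Integer.remainderUnsigned(-1, 3));   // 0

        // -1L read as unsigned is 2^64 - 1 = 18446744073709551615.
        System.out.println(Long.divideUnsigned(-1L, 10L));      // 1844674407370955161
        System.out.println(Long.remainderUnsigned(-1L, 10L));   // 5
    }
}
```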
> 
> (Removed UDivI, UModI again in the second commit because performance was worse. C2 can optimize better without intrinsification as long as we don't have the UseDivMod optimization. Added them back later in the 4th commit.)
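One way to see why the int case can do well without intrinsics: 32-bit unsigned division is equivalent to signed 64-bit division on zero-extended operands, because both values then lie in [0, 2^32) and are non-negative as longs. Assuming the non-intrinsic path reduces to such long arithmetic (an assumption, not taken from the post), C2 sees ordinary signed DivL/ModL nodes, to which its existing signed optimizations already apply. The helper names below are hypothetical.

```java
// Hypothetical helpers: 32-bit unsigned div/rem expressed as signed 64-bit div/rem.
class WidenedUnsignedDiv {
    static int divideViaLong(int dividend, int divisor) {
        // toUnsignedLong zero-extends, so both operands are non-negative longs
        // and signed 64-bit division yields the exact unsigned 32-bit quotient.
        // A zero divisor throws ArithmeticException, as Integer.divideUnsigned does.
        return (int) (Integer.toUnsignedLong(dividend) / Integer.toUnsignedLong(divisor));
    }

    static int remainderViaLong(int dividend, int divisor) {
        return (int) (Integer.toUnsignedLong(dividend) % Integer.toUnsignedLong(divisor));
    }

    public static void main(String[] args) {
        int[][] cases = {{-1, 3}, {7, -2}, {-8, -3}, {100, 7}};
        for (int[] c : cases) {
            boolean same = divideViaLong(c[0], c[1]) == Integer.divideUnsigned(c[0], c[1])
                    && remainderViaLong(c[0], c[1]) == Integer.remainderUnsigned(c[0], c[1]);
            System.out.println(c[0] + " " + c[1] + " matches JDK: " + same);
        }
    }
}
```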
> 
> IntegerDivMod without UDivI, UModI on Power9:
> 
> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score   Error  Units
> IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   25  2386.064 ± 2.746  ns/op
> IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   25  2385.697 ± 2.831  ns/op
> IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   25  2386.021 ± 2.756  ns/op
> IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   25  1788.233 ± 5.612  ns/op
> IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   25  1785.991 ± 7.001  ns/op
> IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   25  1789.000 ± 6.258  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   25  2084.063 ± 2.618  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   25  2080.573 ± 5.779  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   25  2083.192 ± 2.111  ns/op
> 
> 
> LongDivMod without UDivL, UModL on Power9:
> 
> Benchmark                               (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
> LongDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   25  5482.364 ±  18.448  ns/op
> LongDivMod.testDivideRemainderUnsigned           1024       positive  avgt   25  4722.370 ±   2.314  ns/op
> LongDivMod.testDivideRemainderUnsigned           1024       negative  avgt   25  2024.052 ±   0.604  ns/op
> LongDivMod.testDivideUnsigned                    1024          mixed  avgt   25  4772.528 ±  63.147  ns/op
> LongDivMod.testDivideUnsigned                    1024       positive  avgt   25  3711.178 ±   1.178  ns/op
> LongDivMod.testDivideUnsigned                    1024       negative  avgt   25  1195.149 ±   0.822  ns/op
> LongDivMod.testRemainderUnsigned                 1024          mixed  avgt   25  4753.722 ± 115.171  ns/op
> LongDivMod.testRemainderUnsigned                 1024       positive  avgt   25  3749.799 ±   5.935  ns/op
> LongDivMod.testRemainderUnsigned                 1024       negative  avgt   25  1488.802 ±   0.628  ns/op
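In the table above, the `negative` rows are much faster than `positive` and `mixed`. A plausible reason: a divisor with its sign bit set is at least 2^63 as an unsigned value, so the quotient can only be 0 or 1 and a software path can settle it with a single unsigned comparison. The sketch below shows such a software unsigned division built from signed operations (the non-negative-divisor branch uses the halve-then-correct technique described in Hacker's Delight, 2nd ed., section 9.3). It is a hypothetical illustration, not the JDK's own `Long.divideUnsigned` source.

```java
// Hypothetical illustration of software unsigned 64-bit division using signed ops only.
class UnsignedLongDivSketch {

    static long divideUnsignedSketch(long dividend, long divisor) {
        if (divisor < 0) {
            // Sign bit set: divisor >= 2^63 as unsigned, so the quotient is 0 or 1.
            return Long.compareUnsigned(dividend, divisor) < 0 ? 0L : 1L;
        }
        // Here 0 <= divisor < 2^63; divisor == 0 throws ArithmeticException below.
        // Halving the dividend makes it a non-negative long, so signed division is safe.
        // Doubling that result gives the true quotient or one less.
        long q = ((dividend >>> 1) / divisor) << 1;
        // Candidate remainder, exact as an unsigned value in [0, 2 * divisor).
        long r = dividend - q * divisor;
        if (Long.compareUnsigned(r, divisor) >= 0) {
            q++; // correct the quotient by one
        }
        return q;
    }

    public static void main(String[] args) {
        long[] values = {1L, 3L, 10L, Long.MAX_VALUE, Long.MIN_VALUE, -2L, -1L};
        int mismatches = 0;
        for (long a : values) {
            for (long b : values) {
                if (divideUnsignedSketch(a, b) != Long.divideUnsigned(a, b)) {
                    mismatches++;
                }
            }
        }
        System.out.println("mismatches: " + mismatches);
    }
}
```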
> 
> 
> LongDivMod with UDivL, UModL:
> 
> Benchmark                               (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score   Error  Units
> LongDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   25  3253.162 ± 1.019  ns/op
> LongDivMod.testDivideRemainderUnsigned           1024       positive  avgt   25  3252.280 ± 1.608  ns/op
> LongDivMod.testDivideRemainderUnsigned           1024       negative  avgt   25  3252.933 ± 1.850  ns/op
> LongDivMod.testDivideUnsigned                    1024          mixed  avgt   25  1648.233 ± 1.830  ns/op
> LongDivMod.testDivideUnsigned                    1024       positive  avgt   25  1648.639 ± 0.816  ns/op
> LongDivMod.testDivideUnsigned                    1024       negative  avgt   25  1646.247 ± 3.835  ns/op
> LongDivMod.testRemainderUnsigned                 1024          mixed  avgt   25  1766.701 ± 1.897  ns/op
> LongDivMod.testRemainderUnsigned                 1024       positive  avgt   25  1767.413 ± 1.450  ns/op
> LongDivMod.testRemainderUnsigned                 1024       negative  avgt   25  1767.216 ± 1.800  ns/op
> 
> 
> It turns out that the "UseDivMod" optimization is key for this benchmark. Implemented it in the 3rd commit.
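Because PPC64 has no instruction that returns quotient and remainder together, the natural way to exploit UseDivMod on this platform is to reuse the quotient: from a = q*b + r it follows that r = a - q*b, which also holds for unsigned values in 64-bit wraparound arithmetic since 0 <= r < b. That replaces the second division with a multiply and a subtract. C2 applies such rewrites on its IR; the Java helper below (name hypothetical) only illustrates the identity and is not the commit's code.

```java
// Hypothetical helper: unsigned quotient and remainder from a single division.
class DivRemSketch {
    static long[] divRemUnsigned(long a, long b) {
        long q = Long.divideUnsigned(a, b);
        // a = q*b + r holds modulo 2^64 and 0 <= r < b (unsigned),
        // so a - q*b yields r exactly without a second division.
        long r = a - q * b;
        return new long[] {q, r};
    }

    public static void main(String[] args) {
        long[] qr = divRemUnsigned(-1L, 10L);
        // Quotient printed as unsigned, remainder is small and non-negative here.
        System.out.println(Long.toUnsignedString(qr[0]) + " " + qr[1]); // 1844674407370955161 5
    }
}
```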
> LongDivMod with UDivL, UModL and the UseDivMod optimization:
> 
> Benchmark                               (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score   Error  Units
> LongDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   25  1848.883 ± 3.550  ns/op
> LongDivMod.testDivideRemainderUnsigned           1024       positive  avgt   25  1849.743 ± 1.309  ns/op
> LongDivMod.testDivideRemainderUnsigned           1024       negative  avgt   25  1848.598 ± 2.436  ns/op
> LongDivMod.testDivideUnsigned                    1024          mixed  avgt   25  1646.810 ± 4.024  ns/op
> LongDivMod.testDivideUnsigned                    1024       positive  avgt   25  1648.605 ± 1.157  ns/op
> LongDivMod.testDivideUnsigned                    1024       negative  avgt   25  1648.319 ± 1.285  ns/op
> LongDivMod.testRemainderUnsigned                 1024          mixed  avgt   25  1766.375 ± 1.559  ns/op
> LongDivMod.testRemainderUnsigned                 1024       positive  avgt   25  1765.909 ± 1.815  ns/op
> LongDivMod.testRemainderUnsigned                 1024       negative  avgt   25  1766.459 ± 1.255  ns/op
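A rough reading of the `mixed` rows of the two LongDivMod tables with UDivL, UModL (loop overhead is paid once per benchmark, so the sums are only approximate): without UseDivMod, the combined benchmark costs about as much as the divide-only and remainder-only benchmarks added together, consistent with two hardware divisions per element; with UseDivMod, it costs only about 202 ns/op (per 1024 elements) more than the quotient alone.

```latex
\begin{aligned}
\text{without UseDivMod:}\quad & t_{\mathrm{div}} + t_{\mathrm{rem}} = 1648.2 + 1766.7 = 3414.9~\text{ns/op}
  \;\approx\; t_{\mathrm{div+rem}} = 3253.2~\text{ns/op} \\
\text{with UseDivMod:}\quad & t_{\mathrm{div+rem}} - t_{\mathrm{div}} = 1848.9 - 1646.8 = 202.1~\text{ns/op}
\end{aligned}
```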
> 
> 
> The Integer version now shows basically the same performance:
> IntegerDivMod with UDivI, UModI and UseDivMod optimization:
> 
> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score   Error  Units
> IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   25  1855.158 ± 2.161  ns/op
> IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   25  1857.348 ± 1.569  ns/op
> IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   25  1856.095 ± 2.129  ns/op
> IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   25  1648.743 ± 0.819  ns/op
> IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   25  1647.971 ± 1.731  ns/op
> IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   25  1648.994 ± 0.861  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   25  1777.920 ± 3.967  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   25  1776.796 ± 5.479  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   25  1778.992 ± 3.611  ns/op

This pull request has now been integrated.

Changeset: e955cacb
Author:    Martin Doerr <mdoerr at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/e955cacb91420704de3c72861b3d559696dfd07b
Stats:     59 lines in 4 files changed: 59 ins; 0 del; 0 mod

8285040: PPC64 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long

Reviewed-by: kvn, lucy

-------------

PR: https://git.openjdk.java.net/jdk/pull/8304


More information about the hotspot-compiler-dev mailing list