RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic [v2]

Andrew Haley aph at openjdk.java.net
Tue Oct 27 16:48:23 UTC 2020


On Tue, 27 Oct 2020 06:32:29 GMT, Dong Bo <dongbo at openjdk.org> wrote:

>> BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not intrinsified on aarch64, although they already are on x86_64.
>> We can implement them with the NEON USHL (register) instruction, which processes up to four integers at a time, whereas the C2-generated code processes just one integer at a time.
>> USHL is documented at: https://developer.arm.com/documentation/dui0801/g/A64-SIMD-Vector-Instructions/USHL--vector-?lang=en
>> 
>> The patch passed jtreg tier1-3 tests on our aarch64 server.
>> The tests in test/jdk/java/math/BigInteger/* were also run specifically to check the correctness of the implementation, and they passed.
>> 
>> We ran test/micro/org/openjdk/bench/java/math/BigIntegers.java to measure the performance gain on Kunpeng916 and Kunpeng920.
>> The following performance improvements were seen with this implementation:
>> - Intrinsification of BigInteger.shiftLeft: 25.52% (Kunpeng916), 37.56% (Kunpeng920)
>> - Intrinsification of BigInteger.shiftRight: 46.45% (Kunpeng916), 43.32% (Kunpeng920)
>> 
>> The BigIntegers.java JMH micro-benchmark results:
>> Benchmark                      Mode  Cnt     Score    Error  Units
>> 
>> # Kunpeng 916, default
>> BigIntegers.testAdd            avgt   25    33.554 ±  0.224  ns/op
>> BigIntegers.testHugeToString   avgt   25   575.554 ± 40.656  ns/op
>> BigIntegers.testLargeToString  avgt   25   190.098 ±  0.825  ns/op
>> **BigIntegers.testLeftShift      avgt   25  1495.779 ± 12.365  ns/op**
>> BigIntegers.testMultiply       avgt   25  7551.707 ± 39.309  ns/op
>> **BigIntegers.testRightShift     avgt   25   605.302 ±  6.710  ns/op**
>> BigIntegers.testSmallToString  avgt   25   179.034 ±  0.873  ns/op
>> 
>> # Kunpeng 916, intrinsic
>> BigIntegers.testAdd            avgt   25    33.531 ±  0.222  ns/op
>> BigIntegers.testHugeToString   avgt   25   578.038 ± 40.675  ns/op
>> BigIntegers.testLargeToString  avgt   25   188.566 ±  0.855  ns/op
>> **BigIntegers.testLeftShift      avgt   25  1191.651 ± 20.136  ns/op**
>> BigIntegers.testMultiply       avgt   25  7492.711 ±  3.702  ns/op
>> **BigIntegers.testRightShift     avgt   25   326.891 ±  6.033  ns/op**
>> BigIntegers.testSmallToString  avgt   25   178.267 ±  1.501  ns/op
>> 
>> # Kunpeng 920, default
>> BigIntegers.testAdd            avgt   25    22.790 ±  0.167  ns/op
>> BigIntegers.testHugeToString   avgt   25   432.428 ± 10.736  ns/op
>> BigIntegers.testLargeToString  avgt   25   121.899 ±  3.356  ns/op
>> **BigIntegers.testLeftShift      avgt   25   883.530 ± 53.714  ns/op**
>> BigIntegers.testMultiply       avgt   25  5918.845 ± 94.937  ns/op
>> **BigIntegers.testRightShift     avgt   25   329.762 ± 15.850  ns/op**
>> BigIntegers.testSmallToString  avgt   25   117.460 ±  3.040  ns/op
>> 
>> # Kunpeng 920, intrinsic
>> BigIntegers.testAdd            avgt   25    21.791 ±  0.085  ns/op
>> BigIntegers.testHugeToString   avgt   25   415.209 ± 32.170  ns/op
>> BigIntegers.testLargeToString  avgt   25   124.635 ±  2.157  ns/op
>> **BigIntegers.testLeftShift      avgt   25   551.710 ±  7.836  ns/op**
>> BigIntegers.testMultiply       avgt   25  5869.401 ± 54.803  ns/op
>> **BigIntegers.testRightShift     avgt   25   186.896 ±  6.378  ns/op**
>> BigIntegers.testSmallToString  avgt   25   117.543 ±  3.036  ns/op
>
> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   minor improvements for small BigIntegers
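
For readers unfamiliar with the workers being intrinsified, the following is a minimal scalar sketch of the word-by-word shift they perform, assuming a big-endian `int[]` magnitude and a shift count strictly between 0 and 32. The class and method names (`ShiftWorkerSketch`, `shiftLeftWords`, `shiftRightWords`, `toBig`) are hypothetical and are not the JDK's own code. It handles one word per iteration, whereas the patch, per the description above, uses USHL to handle up to four at a time.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class ShiftWorkerSketch {
    // Shift a non-empty big-endian magnitude left by n bits, 0 < n < 32.
    // The result has one extra leading word to hold the bits shifted out.
    static int[] shiftLeftWords(int[] mag, int n) {
        int[] out = new int[mag.length + 1];
        int carryShift = 32 - n;
        out[0] = mag[0] >>> carryShift;
        for (int i = 0; i < mag.length - 1; i++) {
            // High part comes from this word, low part from the next one.
            out[i + 1] = (mag[i] << n) | (mag[i + 1] >>> carryShift);
        }
        out[mag.length] = mag[mag.length - 1] << n;
        return out;
    }

    // Shift a non-empty big-endian magnitude right by n bits, 0 < n < 32.
    static int[] shiftRightWords(int[] mag, int n) {
        int[] out = new int[mag.length];
        int carryShift = 32 - n;
        out[0] = mag[0] >>> n;
        for (int i = 1; i < mag.length; i++) {
            // High part comes from the previous word, low part from this one.
            out[i] = (mag[i - 1] << carryShift) | (mag[i] >>> n);
        }
        return out;
    }

    // Interpret big-endian words as a non-negative BigInteger (test helper).
    static BigInteger toBig(int[] words) {
        ByteBuffer buf = ByteBuffer.allocate(4 * words.length);
        for (int w : words) {
            buf.putInt(w);
        }
        return new BigInteger(1, buf.array());
    }

    public static void main(String[] args) {
        int[] mag = {0x89ABCDEF, 0x01234567, 0xFEDCBA98, 0x76543210, 0x0F0F0F0F};
        for (int n = 1; n < 32; n++) {
            BigInteger ref = toBig(mag);
            if (!toBig(shiftLeftWords(mag, n)).equals(ref.shiftLeft(n))
                    || !toBig(shiftRightWords(mag, n)).equals(ref.shiftRight(n))) {
                throw new AssertionError("mismatch at shift " + n);
            }
        }
        System.out.println("all shifts match BigInteger");
    }
}
```

The check against `BigInteger.shiftLeft` and `BigInteger.shiftRight` only covers sub-word shifts of a non-negative value; whole-word shifts and sign handling are outside this sketch.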

Marked as reviewed by aph (Reviewer).

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4167:

> 4165:     __ strw(r12,  __ post(newArr, 4));
> 4166:     __ sub(numIter, numIter, 1);
> 4167:     __ cbz(numIter, Exit);

This is odd code. Why not `cbnz(numIter, ShiftOneLoop)`?
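
To make the suggestion concrete, here is a small Java model of the two loop tails. It assumes, since the line after 4167 is not quoted, that the `cbz` is followed by an unconditional branch back to `ShiftOneLoop`, and that `numIter` is at least 1 on entry. The class and method names are hypothetical. Each method returns the number of loop-body executions and the number of branch instructions executed.

```java
public class LoopTailSketch {
    // Quoted form: cbz(numIter, Exit) followed by an ASSUMED b(ShiftOneLoop).
    // Returns {bodyExecutions, branchInstructionsExecuted}.
    static int[] cbzThenBranch(int numIter) {
        int body = 0;
        int branches = 0;
        while (true) {
            body++;          // stands in for strw(r12, post(newArr, 4))
            numIter--;       // sub(numIter, numIter, 1)
            branches++;      // cbz(numIter, Exit) is executed
            if (numIter == 0) {
                break;       // branch taken: leave the loop
            }
            branches++;      // assumed unconditional b(ShiftOneLoop)
        }
        return new int[] {body, branches};
    }

    // Suggested form: a single cbnz(numIter, ShiftOneLoop) closes the loop.
    static int[] cbnzOnly(int numIter) {
        int body = 0;
        int branches = 0;
        do {
            body++;          // same loop body
            numIter--;       // sub(numIter, numIter, 1)
            branches++;      // cbnz(numIter, ShiftOneLoop) is executed
        } while (numIter != 0);
        return new int[] {body, branches};
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            int[] a = cbzThenBranch(n);
            int[] b = cbnzOnly(n);
            System.out.println("numIter=" + n
                    + " cbz+b: body=" + a[0] + " branches=" + a[1]
                    + " | cbnz: body=" + b[0] + " branches=" + b[1]);
        }
    }
}
```

Under those assumptions both forms run the body the same number of times, and the `cbnz` form executes one branch per iteration instead of two on every iteration but the last.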

-------------

PR: https://git.openjdk.java.net/jdk/pull/861

