RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4]

Andrew Haley aph at openjdk.java.net
Tue Nov 3 10:22:02 UTC 2020


On Tue, 3 Nov 2020 07:01:12 GMT, Dong Bo <dongbo at openjdk.org> wrote:

>> A Base64.encodeBlock stub is already implemented for x86_64.
>> We can do the same for aarch64 using the SIMD LD3/ST4/TBL instructions.
>> The basic idea is described here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>> 
>> Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build.
>> The tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and they passed.
>> 
>> A JMH micro, Base64Encode.java, is added for performance testing.
>> With different input lengths (upper-bounded by the parameter `maxNumBytes` in the JMH micro),
>> we see a ~4x improvement with long inputs and no regression with short inputs on Kunpeng916 and Kunpeng920.
>> 
>> The Base64Encode.java JMH micro-benchmark results:
>> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
>> # kunpeng 916, intrinsic
>> Base64Encode.testBase64Encode              1  avgt   10    31.564 ± 0.034  ns/op
>> Base64Encode.testBase64Encode              2  avgt   10    33.921 ± 0.362  ns/op
>> Base64Encode.testBase64Encode              3  avgt   10    38.015 ± 0.220  ns/op
>> Base64Encode.testBase64Encode              6  avgt   10    41.115 ± 0.281  ns/op
>> Base64Encode.testBase64Encode              7  avgt   10    42.161 ± 0.630  ns/op
>> Base64Encode.testBase64Encode              9  avgt   10    44.797 ± 0.849  ns/op
>> Base64Encode.testBase64Encode             10  avgt   10    46.013 ± 0.917  ns/op
>> Base64Encode.testBase64Encode             48  avgt   10    67.984 ± 0.777  ns/op
>> Base64Encode.testBase64Encode            512  avgt   10   174.494 ± 1.614  ns/op
>> Base64Encode.testBase64Encode           1000  avgt   10   277.103 ± 0.306  ns/op
>> Base64Encode.testBase64Encode          20000  avgt   10  4261.018 ± 1.883  ns/op
>> 
>> # kunpeng 916, default
>> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
>> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
>> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
>> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
>> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
>> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
>> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
>> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
>> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
>> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
>> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
>> 
>> # kunpeng 920, intrinsic
>> Base64Encode.testBase64Encode              1  avgt   10    17.494 ± 0.012  ns/op
>> Base64Encode.testBase64Encode              2  avgt   10    21.023 ± 0.169  ns/op
>> Base64Encode.testBase64Encode              3  avgt   10    25.772 ± 0.138  ns/op
>> Base64Encode.testBase64Encode              6  avgt   10    30.121 ± 0.347  ns/op
>> Base64Encode.testBase64Encode              7  avgt   10    31.591 ± 0.238  ns/op
>> Base64Encode.testBase64Encode              9  avgt   10    32.728 ± 0.395  ns/op
>> Base64Encode.testBase64Encode             10  avgt   10    35.110 ± 0.215  ns/op
>> Base64Encode.testBase64Encode             48  avgt   10    48.621 ± 0.314  ns/op
>> Base64Encode.testBase64Encode            512  avgt   10   113.391 ± 0.554  ns/op
>> Base64Encode.testBase64Encode           1000  avgt   10   180.749 ± 0.193  ns/op
>> Base64Encode.testBase64Encode          20000  avgt   10  3273.961 ± 5.706  ns/op
>> 
>> # kunpeng 920, default
>> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
>> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
>> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
>> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
>> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
>> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
>> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
>> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
>> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
>> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
>> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op
>
> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'master' into aarch64-base64-encoding
>  - reconstructed the macro as function generate_base64_encode_simdround
>  - change register name sp and unpack the macro
>  - Merge branch 'master' into aarch64-base64-encoding
>  - 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5483:

> 5481:     Register codec  = rscratch1;
> 5482:     Register length = rscratch2;
> 5483: 

Alias names for scratch registers have proved to be risky because assembler macros use scratch registers freely. A maintenance programmer might not see that this code uses rscratch1 and rscratch2. Given that c_rarg6 and c_rarg7 are free, please use them.
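
To illustrate the hazard, here is a toy model in C++. It is only a sketch: `ToyAssembler`, `add_large_imm` and `codec_after_macro` are made-up names, not the HotSpot MacroAssembler API, and the register set is cut down to the four registers mentioned above. The point is that a macro which quietly uses rscratch1 as a temporary destroys any value the stub keeps there under an alias, while the same sequence is safe when the alias is bound to c_rarg6.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model only: hypothetical names, not the HotSpot MacroAssembler API.
enum Reg : std::size_t { c_rarg6, c_rarg7, rscratch1, rscratch2, kNumRegs };

struct ToyAssembler {
    std::array<std::int64_t, kNumRegs> regs{};

    void mov(Reg dst, std::int64_t imm) { regs[dst] = imm; }

    // Stand-in for an assembler macro: it uses rscratch1 as a temporary,
    // and nothing at the call site reveals that.
    void add_large_imm(Reg dst, std::int64_t imm) {
        regs[rscratch1] = imm;           // silently clobbers rscratch1
        regs[dst] += regs[rscratch1];
    }
};

// Runs the same instruction sequence with the alias `codec` bound to the
// given register and returns what `codec` holds afterwards.
std::int64_t codec_after_macro(Reg codec) {
    ToyAssembler masm;
    masm.mov(codec, 0x1000);               // pretend: address of the codec table
    masm.mov(c_rarg7, 8);                  // pretend: input length
    masm.add_large_imm(c_rarg7, 100000);   // innocent-looking macro call
    return masm.regs[codec];
}
```

With `codec` aliased to rscratch1 the table address is overwritten by the macro's temporary; with `codec` bound to c_rarg6 it survives the call unchanged.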

-------------

PR: https://git.openjdk.java.net/jdk/pull/992

