RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v5]

Andrew Haley aph at openjdk.java.net
Tue Nov 10 08:41:00 UTC 2020


On Tue, 3 Nov 2020 11:57:16 GMT, Dong Bo <dongbo at openjdk.org> wrote:

>> A Base64.encodeBlock stub is already implemented for x86_64.
>> We can do the same for aarch64 with SIMD LD3/ST4/TBL.
>> The basic idea is described here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>> 
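For readers unfamiliar with the approach, here is a minimal scalar sketch of what one encode step computes. It is not the stub from the patch. The class and method names (`Base64BlockSketch`, `encodeBlocks`) are hypothetical, and the sketch assumes the input length is a multiple of 3, so it does no padding. Roughly speaking, LD3 de-interleaves the three input byte streams (`b0`, `b1`, `b2` below), shifts and ORs form the four 6-bit indices, TBL performs the alphabet lookup, and ST4 interleaves the four output streams back into memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64BlockSketch {
    // Standard Base64 alphabet (RFC 4648), used as the lookup table.
    private static final byte[] TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
            .getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper: encodes whole 3-byte groups only, no padding.
    public static byte[] encodeBlocks(byte[] src) {
        if (src.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        byte[] dst = new byte[src.length / 3 * 4];
        for (int sp = 0, dp = 0; sp < src.length; sp += 3, dp += 4) {
            // The three input streams that LD3 would de-interleave.
            int b0 = src[sp] & 0xff;
            int b1 = src[sp + 1] & 0xff;
            int b2 = src[sp + 2] & 0xff;
            // Four 6-bit indices, each mapped through the table (the TBL step).
            dst[dp]     = TABLE[b0 >>> 2];
            dst[dp + 1] = TABLE[((b0 & 0x03) << 4) | (b1 >>> 4)];
            dst[dp + 2] = TABLE[((b1 & 0x0f) << 2) | (b2 >>> 6)];
            dst[dp + 3] = TABLE[b2 & 0x3f];
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] in = "foobar".getBytes(StandardCharsets.US_ASCII);
        byte[] sketch = encodeBlocks(in);
        byte[] reference = Base64.getEncoder().encode(in);
        System.out.println(new String(sketch, StandardCharsets.US_ASCII));
        System.out.println("matches java.util.Base64: " + Arrays.equals(sketch, reference));
    }
}
```

A vector version does the same arithmetic on many 3-byte groups at once, which is presumably why the gain only shows up for longer inputs.
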
>> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-release build.
>> The tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and they passed.
>> 
>> A JMH micro, Base64Encode.java, is added for performance testing.
>> With different input lengths (upper-bounded by the `maxNumBytes` parameter in the JMH micro),
>> we see a ~4x improvement for long inputs and no regression for short inputs on Kunpeng916 and Kunpeng920.
>> 
>> The Base64Encode.java JMH micro-benchmark results:
>> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
>> # kunpeng 916, intrinsic
>> Base64Encode.testBase64Encode              1  avgt   10    31.564 ± 0.034  ns/op
>> Base64Encode.testBase64Encode              2  avgt   10    33.921 ± 0.362  ns/op
>> Base64Encode.testBase64Encode              3  avgt   10    38.015 ± 0.220  ns/op
>> Base64Encode.testBase64Encode              6  avgt   10    41.115 ± 0.281  ns/op
>> Base64Encode.testBase64Encode              7  avgt   10    42.161 ± 0.630  ns/op
>> Base64Encode.testBase64Encode              9  avgt   10    44.797 ± 0.849  ns/op
>> Base64Encode.testBase64Encode             10  avgt   10    46.013 ± 0.917  ns/op
>> Base64Encode.testBase64Encode             48  avgt   10    67.984 ± 0.777  ns/op
>> Base64Encode.testBase64Encode            512  avgt   10   174.494 ± 1.614  ns/op
>> Base64Encode.testBase64Encode           1000  avgt   10   277.103 ± 0.306  ns/op
>> Base64Encode.testBase64Encode          20000  avgt   10  4261.018 ± 1.883  ns/op
>> 
>> # kunpeng 916, default
>> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
>> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
>> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
>> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
>> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
>> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
>> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
>> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
>> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
>> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
>> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
>> 
>> # kunpeng 920, intrinsic
>> Base64Encode.testBase64Encode              1  avgt   10    17.494 ± 0.012  ns/op
>> Base64Encode.testBase64Encode              2  avgt   10    21.023 ± 0.169  ns/op
>> Base64Encode.testBase64Encode              3  avgt   10    25.772 ± 0.138  ns/op
>> Base64Encode.testBase64Encode              6  avgt   10    30.121 ± 0.347  ns/op
>> Base64Encode.testBase64Encode              7  avgt   10    31.591 ± 0.238  ns/op
>> Base64Encode.testBase64Encode              9  avgt   10    32.728 ± 0.395  ns/op
>> Base64Encode.testBase64Encode             10  avgt   10    35.110 ± 0.215  ns/op
>> Base64Encode.testBase64Encode             48  avgt   10    48.621 ± 0.314  ns/op
>> Base64Encode.testBase64Encode            512  avgt   10   113.391 ± 0.554  ns/op
>> Base64Encode.testBase64Encode           1000  avgt   10   180.749 ± 0.193  ns/op
>> Base64Encode.testBase64Encode          20000  avgt   10  3273.961 ± 5.706  ns/op
>> 
>> # kunpeng 920, default
>> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
>> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
>> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
>> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
>> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
>> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
>> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
>> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
>> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
>> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
>> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op
>
> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use r6/r7 instead of scratch registers

Thanks. I'm sorry that this took so long.

-------------

Marked as reviewed by aph (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/992

