Integrated: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic

Dong Bo dongbo at openjdk.java.net
Wed Nov 11 01:55:58 UTC 2020


On Mon, 2 Nov 2020 03:05:48 GMT, Dong Bo <dongbo at openjdk.org> wrote:

> A Base64.encodeBlock stub is already implemented for x86_64.
> We can do the same for aarch64 using the SIMD LD3/ST4/TBL instructions.
> The basic idea is described here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
> 
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-release build.
> The tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and they passed.
> 
> A JMH micro-benchmark, Base64Encode.java, is added for performance testing.
> Across different input lengths (upper-bounded by the `maxNumBytes` parameter in the JMH micro-benchmark),
> we observe a ~4x improvement for long inputs and no regression for short inputs on Kunpeng916 and Kunpeng920.
> 
> The Base64Encode.java JMH micro-benchmark results:
> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10    31.564 ± 0.034  ns/op
> Base64Encode.testBase64Encode              2  avgt   10    33.921 ± 0.362  ns/op
> Base64Encode.testBase64Encode              3  avgt   10    38.015 ± 0.220  ns/op
> Base64Encode.testBase64Encode              6  avgt   10    41.115 ± 0.281  ns/op
> Base64Encode.testBase64Encode              7  avgt   10    42.161 ± 0.630  ns/op
> Base64Encode.testBase64Encode              9  avgt   10    44.797 ± 0.849  ns/op
> Base64Encode.testBase64Encode             10  avgt   10    46.013 ± 0.917  ns/op
> Base64Encode.testBase64Encode             48  avgt   10    67.984 ± 0.777  ns/op
> Base64Encode.testBase64Encode            512  avgt   10   174.494 ± 1.614  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10   277.103 ± 0.306  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  4261.018 ± 1.883  ns/op
> 
> # kunpeng 916, default
> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
> 
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10    17.494 ± 0.012  ns/op
> Base64Encode.testBase64Encode              2  avgt   10    21.023 ± 0.169  ns/op
> Base64Encode.testBase64Encode              3  avgt   10    25.772 ± 0.138  ns/op
> Base64Encode.testBase64Encode              6  avgt   10    30.121 ± 0.347  ns/op
> Base64Encode.testBase64Encode              7  avgt   10    31.591 ± 0.238  ns/op
> Base64Encode.testBase64Encode              9  avgt   10    32.728 ± 0.395  ns/op
> Base64Encode.testBase64Encode             10  avgt   10    35.110 ± 0.215  ns/op
> Base64Encode.testBase64Encode             48  avgt   10    48.621 ± 0.314  ns/op
> Base64Encode.testBase64Encode            512  avgt   10   113.391 ± 0.554  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10   180.749 ± 0.193  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  3273.961 ± 5.706  ns/op
> 
> # kunpeng 920, default
> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op
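
For readers unfamiliar with what the stub computes, the following is a minimal scalar sketch of the 3-byte to 4-character mapping that the quoted description proposes to vectorize. It is not the code from the patch: the class name EncodeBlockSketch and the simplified encodeBlock signature are assumptions made for illustration, and only the standard alphabet and complete 3-byte groups are handled. Roughly, in the SIMD version LD3 would de-interleave the three input bytes of each group into separate registers, TBL would perform the table lookup, and ST4 would interleave the four output characters back into memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class EncodeBlockSketch {
    // Standard Base64 alphabet (the URL-safe variant is omitted in this sketch).
    private static final byte[] TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
            .getBytes(StandardCharsets.US_ASCII);

    // Hypothetical simplified signature: encodes complete 3-byte groups from
    // src[sp, sl) into dst starting at dp. Padding is not handled here.
    static void encodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp) {
        for (; sp + 3 <= sl; sp += 3, dp += 4) {
            int b0 = src[sp] & 0xff;
            int b1 = src[sp + 1] & 0xff;
            int b2 = src[sp + 2] & 0xff;
            // Split 24 input bits into four 6-bit indices, then look each one up.
            dst[dp]     = TABLE[b0 >>> 2];
            dst[dp + 1] = TABLE[((b0 & 0x03) << 4) | (b1 >>> 4)];
            dst[dp + 2] = TABLE[((b1 & 0x0f) << 2) | (b2 >>> 6)];
            dst[dp + 3] = TABLE[b2 & 0x3f];
        }
    }

    public static void main(String[] args) {
        byte[] src = "hotspot-dev!".getBytes(StandardCharsets.US_ASCII); // 12 bytes, no padding needed
        byte[] dst = new byte[src.length / 3 * 4];
        encodeBlock(src, 0, src.length, dst, 0);
        System.out.println(new String(dst, StandardCharsets.US_ASCII));
        // Cross-check against the JDK encoder.
        System.out.println(Arrays.equals(dst, Base64.getEncoder().encode(src)));
    }
}
```

A vector implementation applies the same shifts and masks to many groups at once, which is consistent with the gain showing up mainly for long inputs in the tables above.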

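The "~4x improvement with long input" and "no regression with short input" claims can be checked directly against the quoted tables. The small sketch below does the arithmetic using the maxNumBytes = 1 and maxNumBytes = 20000 rows. The class and method names are hypothetical, and the numbers are copied from the tables, not re-measured.

```java
public class SpeedupFromTable {
    // Speedup of the intrinsic over the default path: ratio of average times per op.
    static double speedup(double defaultNsPerOp, double intrinsicNsPerOp) {
        return defaultNsPerOp / intrinsicNsPerOp;
    }

    public static void main(String[] args) {
        // Scores in ns/op, taken from the quoted JMH results.
        System.out.printf("Kunpeng 916, maxNumBytes=20000: %.2fx%n", speedup(16464.411, 4261.018));
        System.out.printf("Kunpeng 920, maxNumBytes=20000: %.2fx%n", speedup(13626.764, 3273.961));
        System.out.printf("Kunpeng 916, maxNumBytes=1:     %.2fx%n", speedup(31.710, 31.564));
        System.out.printf("Kunpeng 920, maxNumBytes=1:     %.2fx%n", speedup(17.428, 17.494));
    }
}
```

The long-input ratios come out at roughly 3.9x and 4.2x, and the one-byte ratios are within about 1% of 1.0.
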
This pull request has now been integrated.

Changeset: 8638cd9a
Author:    Dong Bo <dongbo at openjdk.org>
Committer: Fei Yang <fyang at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/8638cd9a
Stats:     226 lines in 3 files changed: 226 ins; 0 del; 0 mod

8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic

Reviewed-by: aph

-------------

PR: https://git.openjdk.java.net/jdk/pull/992

