RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v7]

Dong Bo dongbo at openjdk.java.net
Tue Nov 10 03:20:09 UTC 2020


> A Base64.encodeBlock stub is already implemented for x86_64.
> We can do the same for AArch64 using the SIMD LD3/ST4/TBL instructions.
> The basic idea is described here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
> 
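For readers unfamiliar with the LD3/TBL/ST4 approach, here is a minimal scalar sketch of the data flow as I understand it from the linked article. It is not the stub code from the patch: the class and method names are hypothetical, plain Java arrays stand in for vector registers, and padding of trailing bytes is not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64LaneSketch {
    // Standard base64 alphabet; a TBL-style lookup maps a 6-bit index to one of these bytes.
    private static final byte[] TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
            .getBytes(StandardCharsets.US_ASCII);

    /** Encodes src, whose length must be a multiple of 3 (no padding is produced). */
    static byte[] encodeBlocks(byte[] src) {
        if (src.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        int n = src.length / 3;

        // LD3-like step: de-interleave the input into three lanes (1st, 2nd, 3rd byte of each group).
        int[] a = new int[n];
        int[] b = new int[n];
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = src[3 * i] & 0xff;
            b[i] = src[3 * i + 1] & 0xff;
            c[i] = src[3 * i + 2] & 0xff;
        }

        byte[] dst = new byte[4 * n];
        for (int i = 0; i < n; i++) {
            // Shift/mask step: split 24 input bits into four 6-bit indices.
            int i0 = a[i] >> 2;
            int i1 = ((a[i] & 0x03) << 4) | (b[i] >> 4);
            int i2 = ((b[i] & 0x0f) << 2) | (c[i] >> 6);
            int i3 = c[i] & 0x3f;

            // TBL-like lookup followed by an ST4-like interleaved store of four output lanes.
            dst[4 * i] = TABLE[i0];
            dst[4 * i + 1] = TABLE[i1];
            dst[4 * i + 2] = TABLE[i2];
            dst[4 * i + 3] = TABLE[i3];
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] in = "foobar".getBytes(StandardCharsets.US_ASCII);
        byte[] out = encodeBlocks(in);
        // Cross-check against the JDK encoder on the same input.
        boolean same = Arrays.equals(out, Base64.getEncoder().encode(in));
        System.out.println(new String(out, StandardCharsets.US_ASCII) + " matchesJdk=" + same);
    }
}
```

In a SIMD version each of the three lanes would hold many bytes at once, so the shift, mask and lookup steps run on a whole vector per instruction instead of one group per loop iteration.
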
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-release build.
> The tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and they passed.
> 
> A JMH micro-benchmark, Base64Encode.java, is added for performance testing.
> With varying input lengths (upper-bounded by the parameter `maxNumBytes` in the JMH micro-benchmark),
> we see a ~4x improvement for long inputs and no regression for short inputs on Kunpeng916 and Kunpeng920.
> 
> The Base64Encode.java JMH micro-benchmark results:
> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10    31.564 ± 0.034  ns/op
> Base64Encode.testBase64Encode              2  avgt   10    33.921 ± 0.362  ns/op
> Base64Encode.testBase64Encode              3  avgt   10    38.015 ± 0.220  ns/op
> Base64Encode.testBase64Encode              6  avgt   10    41.115 ± 0.281  ns/op
> Base64Encode.testBase64Encode              7  avgt   10    42.161 ± 0.630  ns/op
> Base64Encode.testBase64Encode              9  avgt   10    44.797 ± 0.849  ns/op
> Base64Encode.testBase64Encode             10  avgt   10    46.013 ± 0.917  ns/op
> Base64Encode.testBase64Encode             48  avgt   10    67.984 ± 0.777  ns/op
> Base64Encode.testBase64Encode            512  avgt   10   174.494 ± 1.614  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10   277.103 ± 0.306  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  4261.018 ± 1.883  ns/op
> 
> # kunpeng 916, default
> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
> 
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10    17.494 ± 0.012  ns/op
> Base64Encode.testBase64Encode              2  avgt   10    21.023 ± 0.169  ns/op
> Base64Encode.testBase64Encode              3  avgt   10    25.772 ± 0.138  ns/op
> Base64Encode.testBase64Encode              6  avgt   10    30.121 ± 0.347  ns/op
> Base64Encode.testBase64Encode              7  avgt   10    31.591 ± 0.238  ns/op
> Base64Encode.testBase64Encode              9  avgt   10    32.728 ± 0.395  ns/op
> Base64Encode.testBase64Encode             10  avgt   10    35.110 ± 0.215  ns/op
> Base64Encode.testBase64Encode             48  avgt   10    48.621 ± 0.314  ns/op
> Base64Encode.testBase64Encode            512  avgt   10   113.391 ± 0.554  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10   180.749 ± 0.193  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  3273.961 ± 5.706  ns/op
> 
> # kunpeng 920, default
> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op
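
As a rough check on the ~4x figure, the ratio of default to intrinsic time can be computed directly from the scores quoted above. The sketch below uses only the numbers from the tables for the larger inputs; the class and method names are hypothetical, and the error margins are ignored.

```java
public class SpeedupSketch {
    /** Speedup of the intrinsic over the default, as a ratio of average times (ns/op). */
    static double speedup(double defaultNs, double intrinsicNs) {
        return defaultNs / intrinsicNs;
    }

    public static void main(String[] args) {
        // Scores copied from the tables above for maxNumBytes = 48, 512, 1000, 20000.
        int[] sizes = {48, 512, 1000, 20000};
        double[] k916Default = {93.517, 494.809, 898.581, 16464.411};
        double[] k916Intrinsic = {67.984, 174.494, 277.103, 4261.018};
        double[] k920Default = {62.447, 377.275, 700.628, 13626.764};
        double[] k920Intrinsic = {48.621, 113.391, 180.749, 3273.961};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("maxNumBytes=%d  kunpeng916=%.2fx  kunpeng920=%.2fx%n",
                sizes[i],
                speedup(k916Default[i], k916Intrinsic[i]),
                speedup(k920Default[i], k920Intrinsic[i]));
        }
    }
}
```

At maxNumBytes = 20000 this works out to roughly 3.9x on Kunpeng 916 and 4.2x on Kunpeng 920, and the ratio shrinks as the input gets shorter.
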

Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  fix register naming style

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/992/files
  - new: https://git.openjdk.java.net/jdk/pull/992/files/e3380c84..5f4bc36c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=05-06

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/992.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992

PR: https://git.openjdk.java.net/jdk/pull/992

