RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4]

Dong Bo dongbo at openjdk.java.net
Tue Nov 3 07:01:12 UTC 2020


> Base64.encodeBlock stub is implemented for x86_64. 
> We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL.
> A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
> 
> Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build.
> Tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and they passed.
> 
> A JMH micro, Base64Encode.java, is added for performance testing.
> With different input lengths (upper-bounded by the parameter `maxNumBytes` in the JMH micro),
> we observe ~4x improvement for long inputs and no regression for short inputs on Kunpeng916 and Kunpeng920.
> 
> The Base64Encode.java JMH micro-benchmark results:
> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10    31.564 ± 0.034  ns/op
> Base64Encode.testBase64Encode              2  avgt   10    33.921 ± 0.362  ns/op
> Base64Encode.testBase64Encode              3  avgt   10    38.015 ± 0.220  ns/op
> Base64Encode.testBase64Encode              6  avgt   10    41.115 ± 0.281  ns/op
> Base64Encode.testBase64Encode              7  avgt   10    42.161 ± 0.630  ns/op
> Base64Encode.testBase64Encode              9  avgt   10    44.797 ± 0.849  ns/op
> Base64Encode.testBase64Encode             10  avgt   10    46.013 ± 0.917  ns/op
> Base64Encode.testBase64Encode             48  avgt   10    67.984 ± 0.777  ns/op
> Base64Encode.testBase64Encode            512  avgt   10   174.494 ± 1.614  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10   277.103 ± 0.306  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  4261.018 ± 1.883  ns/op
> 
> # kunpeng 916, default
> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
> 
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10    17.494 ± 0.012  ns/op
> Base64Encode.testBase64Encode              2  avgt   10    21.023 ± 0.169  ns/op
> Base64Encode.testBase64Encode              3  avgt   10    25.772 ± 0.138  ns/op
> Base64Encode.testBase64Encode              6  avgt   10    30.121 ± 0.347  ns/op
> Base64Encode.testBase64Encode              7  avgt   10    31.591 ± 0.238  ns/op
> Base64Encode.testBase64Encode              9  avgt   10    32.728 ± 0.395  ns/op
> Base64Encode.testBase64Encode             10  avgt   10    35.110 ± 0.215  ns/op
> Base64Encode.testBase64Encode             48  avgt   10    48.621 ± 0.314  ns/op
> Base64Encode.testBase64Encode            512  avgt   10   113.391 ± 0.554  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10   180.749 ± 0.193  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  3273.961 ± 5.706  ns/op
> 
> # kunpeng 920, default
> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op
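
For readers unfamiliar with the approach quoted above, here is a minimal scalar sketch of the 3-byte to 4-character transformation that a SIMD version performs on many lanes at once. It is not the stub from this patch. As a rough mapping, LD3 de-interleaves the three source bytes of each group into separate registers, shifts and masks build the four 6-bit indices, TBL does the table lookup, and ST4 interleaves the four output bytes back to memory. The class name `Base64BlockSketch` and the method `encodeBlocks` are hypothetical names chosen for illustration, and the sketch assumes the input length is a multiple of 3 (no padding handling).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64BlockSketch {
    // Standard Base64 alphabet (RFC 4648); a SIMD version would keep this in vector registers for TBL.
    private static final byte[] TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
            .getBytes(StandardCharsets.ISO_8859_1);

    // Hypothetical helper: encodes whole 3-byte groups only, no padding.
    static byte[] encodeBlocks(byte[] src) {
        if (src.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        byte[] dst = new byte[src.length / 3 * 4];
        for (int sp = 0, dp = 0; sp < src.length; sp += 3, dp += 4) {
            // "LD3" step: pick up the three bytes of one group.
            int b0 = src[sp] & 0xff;
            int b1 = src[sp + 1] & 0xff;
            int b2 = src[sp + 2] & 0xff;
            // Shift/mask step: split 24 bits into four 6-bit indices.
            int i0 = b0 >> 2;
            int i1 = ((b0 & 0x03) << 4) | (b1 >> 4);
            int i2 = ((b1 & 0x0f) << 2) | (b2 >> 6);
            int i3 = b2 & 0x3f;
            // "TBL" + "ST4" step: look up each index and store four output bytes.
            dst[dp] = TABLE[i0];
            dst[dp + 1] = TABLE[i1];
            dst[dp + 2] = TABLE[i2];
            dst[dp + 3] = TABLE[i3];
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] in = "foobar".getBytes(StandardCharsets.ISO_8859_1);
        String sketch = new String(encodeBlocks(in), StandardCharsets.ISO_8859_1);
        String reference = Base64.getEncoder().encodeToString(in);
        // Compare the sketch against the JDK encoder on the same input.
        System.out.println(sketch + " matches JDK: " + sketch.equals(reference));
    }
}
```

Comparing against `java.util.Base64` in `main` is only a sanity check on the sketch. The jtreg tests under test/jdk/java/util/Base64/ remain the correctness check for the intrinsic itself.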

Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Merge branch 'master' into aarch64-base64-encoding
 - reconstructed the macro as function generate_base64_encode_simdround
 - change register name sp and unpack the macro
 - Merge branch 'master' into aarch64-base64-encoding
 - 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/992/files
  - new: https://git.openjdk.java.net/jdk/pull/992/files/2999ac15..e5c50ffd

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=02-03

  Stats: 4625 lines in 215 files changed: 2529 ins; 894 del; 1202 mod
  Patch: https://git.openjdk.java.net/jdk/pull/992.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992

PR: https://git.openjdk.java.net/jdk/pull/992

