RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4]
Dong Bo
dongbo at openjdk.java.net
Tue Nov 3 07:01:12 UTC 2020
> A Base64.encodeBlock stub is already implemented for x86_64.
> We can do the same for AArch64 using the SIMD LD3/ST4/TBL instructions.
> The basic idea is described here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-release build.
> The tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and they passed.
>
> A JMH micro-benchmark, Base64Encode.java, is added for performance testing.
> With varying input lengths (upper-bounded by the parameter `maxNumBytes` in the JMH micro-benchmark),
> we see a ~4x improvement for long inputs (about 3.9x and 4.2x at 20000 bytes on Kunpeng916 and Kunpeng920, respectively) and no regression for short inputs.
>
> The Base64Encode.java JMH micro-benchmark results:
> Benchmark (maxNumBytes) Mode Cnt Score Error Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode 1 avgt 10 31.564 ± 0.034 ns/op
> Base64Encode.testBase64Encode 2 avgt 10 33.921 ± 0.362 ns/op
> Base64Encode.testBase64Encode 3 avgt 10 38.015 ± 0.220 ns/op
> Base64Encode.testBase64Encode 6 avgt 10 41.115 ± 0.281 ns/op
> Base64Encode.testBase64Encode 7 avgt 10 42.161 ± 0.630 ns/op
> Base64Encode.testBase64Encode 9 avgt 10 44.797 ± 0.849 ns/op
> Base64Encode.testBase64Encode 10 avgt 10 46.013 ± 0.917 ns/op
> Base64Encode.testBase64Encode 48 avgt 10 67.984 ± 0.777 ns/op
> Base64Encode.testBase64Encode 512 avgt 10 174.494 ± 1.614 ns/op
> Base64Encode.testBase64Encode 1000 avgt 10 277.103 ± 0.306 ns/op
> Base64Encode.testBase64Encode 20000 avgt 10 4261.018 ± 1.883 ns/op
>
> # kunpeng 916, default
> Base64Encode.testBase64Encode 1 avgt 10 31.710 ± 0.234 ns/op
> Base64Encode.testBase64Encode 2 avgt 10 33.978 ± 0.305 ns/op
> Base64Encode.testBase64Encode 3 avgt 10 40.059 ± 0.444 ns/op
> Base64Encode.testBase64Encode 6 avgt 10 47.958 ± 0.328 ns/op
> Base64Encode.testBase64Encode 7 avgt 10 49.017 ± 1.305 ns/op
> Base64Encode.testBase64Encode 9 avgt 10 53.150 ± 0.769 ns/op
> Base64Encode.testBase64Encode 10 avgt 10 55.418 ± 0.316 ns/op
> Base64Encode.testBase64Encode 48 avgt 10 93.517 ± 0.391 ns/op
> Base64Encode.testBase64Encode 512 avgt 10 494.809 ± 0.413 ns/op
> Base64Encode.testBase64Encode 1000 avgt 10 898.581 ± 0.944 ns/op
> Base64Encode.testBase64Encode 20000 avgt 10 16464.411 ± 7.582 ns/op
>
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode 1 avgt 10 17.494 ± 0.012 ns/op
> Base64Encode.testBase64Encode 2 avgt 10 21.023 ± 0.169 ns/op
> Base64Encode.testBase64Encode 3 avgt 10 25.772 ± 0.138 ns/op
> Base64Encode.testBase64Encode 6 avgt 10 30.121 ± 0.347 ns/op
> Base64Encode.testBase64Encode 7 avgt 10 31.591 ± 0.238 ns/op
> Base64Encode.testBase64Encode 9 avgt 10 32.728 ± 0.395 ns/op
> Base64Encode.testBase64Encode 10 avgt 10 35.110 ± 0.215 ns/op
> Base64Encode.testBase64Encode 48 avgt 10 48.621 ± 0.314 ns/op
> Base64Encode.testBase64Encode 512 avgt 10 113.391 ± 0.554 ns/op
> Base64Encode.testBase64Encode 1000 avgt 10 180.749 ± 0.193 ns/op
> Base64Encode.testBase64Encode 20000 avgt 10 3273.961 ± 5.706 ns/op
>
> # kunpeng 920, default
> Base64Encode.testBase64Encode 1 avgt 10 17.428 ± 0.037 ns/op
> Base64Encode.testBase64Encode 2 avgt 10 20.926 ± 0.155 ns/op
> Base64Encode.testBase64Encode 3 avgt 10 25.466 ± 0.140 ns/op
> Base64Encode.testBase64Encode 6 avgt 10 32.526 ± 0.190 ns/op
> Base64Encode.testBase64Encode 7 avgt 10 34.132 ± 0.387 ns/op
> Base64Encode.testBase64Encode 9 avgt 10 36.685 ± 0.212 ns/op
> Base64Encode.testBase64Encode 10 avgt 10 38.117 ± 0.246 ns/op
> Base64Encode.testBase64Encode 48 avgt 10 62.447 ± 0.900 ns/op
> Base64Encode.testBase64Encode 512 avgt 10 377.275 ± 0.162 ns/op
> Base64Encode.testBase64Encode 1000 avgt 10 700.628 ± 0.509 ns/op
> Base64Encode.testBase64Encode 20000 avgt 10 13626.764 ± 3.448 ns/op
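
For readers unfamiliar with the approach named in the description, here is a minimal scalar sketch in Java of the per-group computation that an LD3/TBL/ST4 stub vectorizes. It is not the code from this patch. The class and method names (`Base64GroupSketch`, `encodeFullGroups`) are hypothetical, and the sketch handles only whole 3-byte groups, with no padding. In the SIMD version, LD3 de-interleaves the input so that each vector register holds the first, second, or third byte of many groups, TBL does the alphabet lookup, and ST4 interleaves the four resulting character vectors on store.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration class; not part of the JDK or of this patch.
final class Base64GroupSketch {
    // Standard Base64 alphabet (RFC 4648). A SIMD stub would do this lookup with TBL.
    private static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // Encodes whole 3-byte groups only; trailing bytes and '=' padding are ignored.
    static String encodeFullGroups(byte[] src) {
        int groups = src.length / 3;
        char[] dst = new char[groups * 4];
        for (int g = 0; g < groups; g++) {
            // Conceptually what LD3 provides: the three bytes of a group, separated.
            int b0 = src[3 * g] & 0xff;
            int b1 = src[3 * g + 1] & 0xff;
            int b2 = src[3 * g + 2] & 0xff;

            // Split the 24 input bits into four 6-bit alphabet indices.
            int i0 = b0 >> 2;
            int i1 = ((b0 & 0x03) << 4) | (b1 >> 4);
            int i2 = ((b1 & 0x0f) << 2) | (b2 >> 6);
            int i3 = b2 & 0x3f;

            // Lookup, then interleaved store (the role ST4 plays in the SIMD version).
            dst[4 * g] = ALPHABET[i0];
            dst[4 * g + 1] = ALPHABET[i1];
            dst[4 * g + 2] = ALPHABET[i2];
            dst[4 * g + 3] = ALPHABET[i3];
        }
        return new String(dst);
    }

    public static void main(String[] args) {
        // 27 bytes, so the input is an exact multiple of 3 and needs no padding.
        byte[] input = "Many hands make light work.".getBytes(StandardCharsets.US_ASCII);
        String sketch = encodeFullGroups(input);
        String reference = Base64.getEncoder().encodeToString(input);
        System.out.println(sketch);
        System.out.println("matches java.util.Base64: " + sketch.equals(reference));
    }
}
```

Because the groups are independent of one another, the same shifts and masks can be applied to many groups at once in vector registers, which is consistent with the gain growing with input length in the results above.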
Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
- Merge branch 'master' into aarch64-base64-encoding
- reconstructed the macro as function generate_base64_encode_simdround
- change register name sp and unpack the macro
- Merge branch 'master' into aarch64-base64-encoding
- 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/992/files
- new: https://git.openjdk.java.net/jdk/pull/992/files/2999ac15..e5c50ffd
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=03
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=02-03
Stats: 4625 lines in 215 files changed: 2529 ins; 894 del; 1202 mod
Patch: https://git.openjdk.java.net/jdk/pull/992.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992
PR: https://git.openjdk.java.net/jdk/pull/992