RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic
Andrew Haley
aph at openjdk.java.net
Mon Nov 2 13:30:00 UTC 2020
On Mon, 2 Nov 2020 03:05:48 GMT, Dong Bo <dongbo at openjdk.org> wrote:
> A Base64.encodeBlock stub is already implemented for x86_64.
> We can do the same for aarch64 using the SIMD LD3/ST4/TBL instructions.
> A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-release build.
> The tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and passed.
>
> A JMH micro-benchmark, Base64Encode.java, is added for performance testing.
> Across input lengths (upper-bounded by the `maxNumBytes` parameter of the JMH micro),
> we see roughly a 4x improvement for long inputs and no regression for short inputs on Kunpeng 916 and Kunpeng 920.
>
> The Base64Encode.java JMH micro-benchmark results:
> Benchmark (maxNumBytes) Mode Cnt Score Error Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode 1 avgt 10 31.564 ± 0.034 ns/op
> Base64Encode.testBase64Encode 2 avgt 10 33.921 ± 0.362 ns/op
> Base64Encode.testBase64Encode 3 avgt 10 38.015 ± 0.220 ns/op
> Base64Encode.testBase64Encode 6 avgt 10 41.115 ± 0.281 ns/op
> Base64Encode.testBase64Encode 7 avgt 10 42.161 ± 0.630 ns/op
> Base64Encode.testBase64Encode 9 avgt 10 44.797 ± 0.849 ns/op
> Base64Encode.testBase64Encode 10 avgt 10 46.013 ± 0.917 ns/op
> Base64Encode.testBase64Encode 48 avgt 10 67.984 ± 0.777 ns/op
> Base64Encode.testBase64Encode 512 avgt 10 174.494 ± 1.614 ns/op
> Base64Encode.testBase64Encode 1000 avgt 10 277.103 ± 0.306 ns/op
> Base64Encode.testBase64Encode 20000 avgt 10 4261.018 ± 1.883 ns/op
>
> # kunpeng 916, default
> Base64Encode.testBase64Encode 1 avgt 10 31.710 ± 0.234 ns/op
> Base64Encode.testBase64Encode 2 avgt 10 33.978 ± 0.305 ns/op
> Base64Encode.testBase64Encode 3 avgt 10 40.059 ± 0.444 ns/op
> Base64Encode.testBase64Encode 6 avgt 10 47.958 ± 0.328 ns/op
> Base64Encode.testBase64Encode 7 avgt 10 49.017 ± 1.305 ns/op
> Base64Encode.testBase64Encode 9 avgt 10 53.150 ± 0.769 ns/op
> Base64Encode.testBase64Encode 10 avgt 10 55.418 ± 0.316 ns/op
> Base64Encode.testBase64Encode 48 avgt 10 93.517 ± 0.391 ns/op
> Base64Encode.testBase64Encode 512 avgt 10 494.809 ± 0.413 ns/op
> Base64Encode.testBase64Encode 1000 avgt 10 898.581 ± 0.944 ns/op
> Base64Encode.testBase64Encode 20000 avgt 10 16464.411 ± 7.582 ns/op
>
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode 1 avgt 10 17.494 ± 0.012 ns/op
> Base64Encode.testBase64Encode 2 avgt 10 21.023 ± 0.169 ns/op
> Base64Encode.testBase64Encode 3 avgt 10 25.772 ± 0.138 ns/op
> Base64Encode.testBase64Encode 6 avgt 10 30.121 ± 0.347 ns/op
> Base64Encode.testBase64Encode 7 avgt 10 31.591 ± 0.238 ns/op
> Base64Encode.testBase64Encode 9 avgt 10 32.728 ± 0.395 ns/op
> Base64Encode.testBase64Encode 10 avgt 10 35.110 ± 0.215 ns/op
> Base64Encode.testBase64Encode 48 avgt 10 48.621 ± 0.314 ns/op
> Base64Encode.testBase64Encode 512 avgt 10 113.391 ± 0.554 ns/op
> Base64Encode.testBase64Encode 1000 avgt 10 180.749 ± 0.193 ns/op
> Base64Encode.testBase64Encode 20000 avgt 10 3273.961 ± 5.706 ns/op
>
> # kunpeng 920, default
> Base64Encode.testBase64Encode 1 avgt 10 17.428 ± 0.037 ns/op
> Base64Encode.testBase64Encode 2 avgt 10 20.926 ± 0.155 ns/op
> Base64Encode.testBase64Encode 3 avgt 10 25.466 ± 0.140 ns/op
> Base64Encode.testBase64Encode 6 avgt 10 32.526 ± 0.190 ns/op
> Base64Encode.testBase64Encode 7 avgt 10 34.132 ± 0.387 ns/op
> Base64Encode.testBase64Encode 9 avgt 10 36.685 ± 0.212 ns/op
> Base64Encode.testBase64Encode 10 avgt 10 38.117 ± 0.246 ns/op
> Base64Encode.testBase64Encode 48 avgt 10 62.447 ± 0.900 ns/op
> Base64Encode.testBase64Encode 512 avgt 10 377.275 ± 0.162 ns/op
> Base64Encode.testBase64Encode 1000 avgt 10 700.628 ± 0.509 ns/op
> Base64Encode.testBase64Encode 20000 avgt 10 13626.764 ± 3.448 ns/op
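For readers following the LD3/TBL/ST4 description quoted above, here is a minimal scalar sketch in Java of the per-block transform that such a SIMD sequence would vectorize. It is not the code from the patch: the class name Base64BlockSketch, the method signature, and the mapping of each step to an instruction in the comments are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the JDK or of the patch under review.
final class Base64BlockSketch {
    // Standard Base64 alphabet (RFC 4648). A SIMD version would do this
    // lookup with a table instruction such as TBL.
    private static final byte[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
            .getBytes(StandardCharsets.ISO_8859_1);

    // Encodes src[start, end) into dst beginning at dstPos.
    // The input length must be a multiple of 3, so no padding is produced.
    static void encodeBlock(byte[] src, int start, int end, byte[] dst, int dstPos) {
        if ((end - start) % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        for (int s = start, d = dstPos; s < end; s += 3, d += 4) {
            // Split each 3-byte group into its three bytes
            // (LD3 does this de-interleaving for many groups at once).
            int b0 = src[s] & 0xff;
            int b1 = src[s + 1] & 0xff;
            int b2 = src[s + 2] & 0xff;

            // Regroup 3 x 8 bits into 4 x 6-bit alphabet indices.
            int i0 = b0 >>> 2;
            int i1 = ((b0 & 0x03) << 4) | (b1 >>> 4);
            int i2 = ((b1 & 0x0f) << 2) | (b2 >>> 6);
            int i3 = b2 & 0x3f;

            // Look up the output characters and write them back interleaved
            // (the SIMD version would use TBL followed by ST4).
            dst[d] = ALPHABET[i0];
            dst[d + 1] = ALPHABET[i1];
            dst[d + 2] = ALPHABET[i2];
            dst[d + 3] = ALPHABET[i3];
        }
    }

    public static void main(String[] args) {
        byte[] src = "Man".getBytes(StandardCharsets.ISO_8859_1);
        byte[] dst = new byte[4];
        encodeBlock(src, 0, src.length, dst, 0);
        System.out.println(new String(dst, StandardCharsets.ISO_8859_1)); // prints "TWFu"
    }
}
```

Each 3-byte group is independent of the others, which is why the work maps naturally onto vector lanes; the benchmark figures above are consistent with the gain growing as the input gets long enough to fill whole vectors.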
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5442:
> 5440: Register src = c_rarg0; // source array
> 5441: Register sp = c_rarg1; // source start offset
> 5442: Register sl = c_rarg2; // source end offset
Please don't use "sp" as a register name.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5453:
> 5451:
> 5452: #define BASE64_ENCODE_SIMD_ROUND(in0, in1, in2, out0, out1, out2, out3, SZ) \
> 5453: __ ld3(in0, in1, in2, __ T##SZ##B, __ post(src, 3 * SZ)); \
As far as I can see, there's no need for this to be a macro.
-------------
PR: https://git.openjdk.java.net/jdk/pull/992