RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v2]
Dong Bo
dongbo at openjdk.java.net
Tue Mar 30 03:22:12 UTC 2021
> JDK-8248188 added the IntrinsicCandidate and API for Base64 decoding.
> Base64 decoding can be sped up on AArch64 with the ld4/tbl/tbx/st3 instructions; the basic idea is described at http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
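For readers unfamiliar with the approach, here is a minimal scalar sketch in Java of the per-quad computation that the SIMD version vectorizes. It is not the stub's code: the class name `Base64QuadSketch` and method `decodeQuads` are hypothetical, and the sketch assumes unpadded input whose length is a multiple of 4. Roughly, ld4 de-interleaves the four input characters of each quad into separate registers, tbl/tbx perform the table lookup, and st3 interleaves the three output bytes back to memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not taken from the patch.
public class Base64QuadSketch {
    // Lookup table: Base64 character -> 6-bit value, or -1 for an illegal character.
    private static final byte[] DECODE = new byte[256];
    static {
        Arrays.fill(DECODE, (byte) -1);
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            DECODE[alphabet.charAt(i)] = (byte) i;
        }
    }

    // Decodes unpadded input (length a multiple of 4) into 3 bytes per 4 characters.
    // Returns null if any character is outside the Base64 alphabet.
    static byte[] decodeQuads(byte[] src) {
        if (src.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        byte[] dst = new byte[src.length / 4 * 3];
        for (int s = 0, d = 0; s < src.length; s += 4, d += 3) {
            // Table lookup for each of the four characters (the tbl/tbx step).
            int a = DECODE[src[s] & 0xff];
            int b = DECODE[src[s + 1] & 0xff];
            int c = DECODE[src[s + 2] & 0xff];
            int e = DECODE[src[s + 3] & 0xff];
            // Any -1 makes the OR negative, which signals an illegal character.
            if ((a | b | c | e) < 0) {
                return null;
            }
            // Pack four 6-bit values into three bytes.
            dst[d]     = (byte) ((a << 2) | (b >> 4));
            dst[d + 1] = (byte) ((b << 4) | (c >> 2));
            dst[d + 2] = (byte) ((c << 6) | e);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] out = decodeQuads("SGVsbG8h".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(out, StandardCharsets.US_ASCII));
    }
}
```

In the vector version the same shifts and ORs are applied to 16 quads at a time, one lane per quad.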
>
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-fastdebug build.
> Tests in `test/jdk/java/util/Base64/` and `compiler/intrinsics/base64/TestBase64.java` were also run specifically to check the correctness of the implementation.
>
> There can be illegal characters at the start of the input if the data is MIME encoded.
> SIMD brings no benefit in this case, so the stub currently uses non-SIMD instructions for MIME-encoded data.
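The difference between the two decoders can be seen at the library level. The following small sketch uses only the public `java.util.Base64` API; the class name `MimeIllegalCharsSketch` and its helper methods are hypothetical and only illustrate that the MIME decoder skips characters outside the Base64 alphabet, while the basic decoder rejects them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; not taken from the patch.
public class MimeIllegalCharsSketch {
    // The MIME decoder ignores line separators and other non-alphabet characters.
    static String decodeMime(String encoded) {
        byte[] decoded = Base64.getMimeDecoder().decode(encoded);
        return new String(decoded, StandardCharsets.US_ASCII);
    }

    // The basic decoder throws IllegalArgumentException on non-alphabet characters.
    static boolean basicDecoderRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Leading CR/LF and a space, plus a line break in the middle of the data.
        String input = "\r\n SGVs\r\nbG8h";
        System.out.println("MIME decoder result: " + decodeMime(input));
        System.out.println("Basic decoder rejects input: " + basicDecoderRejects(input));
    }
}
```

Because skipped characters can appear anywhere, including before the first valid quad, a fixed-stride vector load cannot assume that every input byte is payload.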
>
> A JMH micro-benchmark, Base64Decode.java, is added for performance testing.
> With different input lengths (upper-bounded by parameter `maxNumBytes` in the JMH micro-benchmark),
> we see ~2.5x improvement with long inputs and no regression with short inputs for raw Base64 decoding, and a minor improvement (~10.95%) for MIME on Kunpeng916.
>
> The Base64Decode.java JMH micro-benchmark results:
>
> Benchmark (lineSize) (maxNumBytes) Mode Cnt Score Error Units
>
> # Kunpeng916, intrinsic
> Base64Decode.testBase64Decode 4 1 avgt 5 48.614 ± 0.609 ns/op
> Base64Decode.testBase64Decode 4 3 avgt 5 58.199 ± 1.650 ns/op
> Base64Decode.testBase64Decode 4 7 avgt 5 69.400 ± 0.931 ns/op
> Base64Decode.testBase64Decode 4 32 avgt 5 96.818 ± 1.687 ns/op
> Base64Decode.testBase64Decode 4 64 avgt 5 122.856 ± 9.217 ns/op
> Base64Decode.testBase64Decode 4 80 avgt 5 130.935 ± 1.667 ns/op
> Base64Decode.testBase64Decode 4 96 avgt 5 143.627 ± 1.751 ns/op
> Base64Decode.testBase64Decode 4 112 avgt 5 152.311 ± 1.178 ns/op
> Base64Decode.testBase64Decode 4 512 avgt 5 342.631 ± 0.584 ns/op
> Base64Decode.testBase64Decode 4 1000 avgt 5 573.635 ± 1.050 ns/op
> Base64Decode.testBase64Decode 4 20000 avgt 5 9534.136 ± 45.172 ns/op
> Base64Decode.testBase64Decode 4 50000 avgt 5 22718.726 ± 192.070 ns/op
> Base64Decode.testBase64MIMEDecode 4 1 avgt 10 63.558 ± 0.336 ns/op
> Base64Decode.testBase64MIMEDecode 4 3 avgt 10 82.504 ± 0.848 ns/op
> Base64Decode.testBase64MIMEDecode 4 7 avgt 10 120.591 ± 0.608 ns/op
> Base64Decode.testBase64MIMEDecode 4 32 avgt 10 324.314 ± 6.236 ns/op
> Base64Decode.testBase64MIMEDecode 4 64 avgt 10 532.678 ± 4.670 ns/op
> Base64Decode.testBase64MIMEDecode 4 80 avgt 10 678.126 ± 4.324 ns/op
> Base64Decode.testBase64MIMEDecode 4 96 avgt 10 771.603 ± 6.393 ns/op
> Base64Decode.testBase64MIMEDecode 4 112 avgt 10 889.608 ± 0.759 ns/op
> Base64Decode.testBase64MIMEDecode 4 512 avgt 10 3663.557 ± 3.422 ns/op
> Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 7017.784 ± 9.128 ns/op
> Base64Decode.testBase64MIMEDecode 4 20000 avgt 10 128670.660 ± 7951.521 ns/op
> Base64Decode.testBase64MIMEDecode 4 50000 avgt 10 317113.667 ± 161.758 ns/op
>
> # Kunpeng916, default
> Base64Decode.testBase64Decode 4 1 avgt 5 48.455 ± 0.571 ns/op
> Base64Decode.testBase64Decode 4 3 avgt 5 57.937 ± 0.505 ns/op
> Base64Decode.testBase64Decode 4 7 avgt 5 73.823 ± 1.452 ns/op
> Base64Decode.testBase64Decode 4 32 avgt 5 106.484 ± 1.243 ns/op
> Base64Decode.testBase64Decode 4 64 avgt 5 141.004 ± 1.188 ns/op
> Base64Decode.testBase64Decode 4 80 avgt 5 156.284 ± 0.572 ns/op
> Base64Decode.testBase64Decode 4 96 avgt 5 174.137 ± 0.177 ns/op
> Base64Decode.testBase64Decode 4 112 avgt 5 188.445 ± 0.572 ns/op
> Base64Decode.testBase64Decode 4 512 avgt 5 610.847 ± 1.559 ns/op
> Base64Decode.testBase64Decode 4 1000 avgt 5 1155.368 ± 0.813 ns/op
> Base64Decode.testBase64Decode 4 20000 avgt 5 19751.477 ± 24.669 ns/op
> Base64Decode.testBase64Decode 4 50000 avgt 5 50046.586 ± 523.155 ns/op
> Base64Decode.testBase64MIMEDecode 4 1 avgt 10 64.130 ± 0.238 ns/op
> Base64Decode.testBase64MIMEDecode 4 3 avgt 10 82.096 ± 0.205 ns/op
> Base64Decode.testBase64MIMEDecode 4 7 avgt 10 118.849 ± 0.610 ns/op
> Base64Decode.testBase64MIMEDecode 4 32 avgt 10 331.177 ± 4.732 ns/op
> Base64Decode.testBase64MIMEDecode 4 64 avgt 10 549.117 ± 0.177 ns/op
> Base64Decode.testBase64MIMEDecode 4 80 avgt 10 702.951 ± 4.572 ns/op
> Base64Decode.testBase64MIMEDecode 4 96 avgt 10 799.566 ± 0.301 ns/op
> Base64Decode.testBase64MIMEDecode 4 112 avgt 10 923.749 ± 0.389 ns/op
> Base64Decode.testBase64MIMEDecode 4 512 avgt 10 4000.725 ± 2.519 ns/op
> Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 7674.994 ± 9.281 ns/op
> Base64Decode.testBase64MIMEDecode 4 20000 avgt 10 142059.001 ± 157.920 ns/op
> Base64Decode.testBase64MIMEDecode 4 50000 avgt 10 355698.369 ± 216.542 ns/op
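To compare the two runs row by row, the baseline score can be divided by the intrinsic score. The sketch below does this for a few rows copied from the tables above; the class name `SpeedupFromTable` and method `speedup` are hypothetical, and the calculation uses the mean scores only, ignoring the error column.

```java
// Hypothetical illustration only; the scores are copied from the tables above.
public class SpeedupFromTable {
    // Ratio of default (baseline) time to intrinsic time; values above 1 mean faster.
    static double speedup(double defaultNsPerOp, double intrinsicNsPerOp) {
        return defaultNsPerOp / intrinsicNsPerOp;
    }

    public static void main(String[] args) {
        // Columns: maxNumBytes, default ns/op, intrinsic ns/op (testBase64Decode).
        double[][] raw = {
            {1000, 1155.368, 573.635},
            {20000, 19751.477, 9534.136},
            {50000, 50046.586, 22718.726},
        };
        // Same columns for testBase64MIMEDecode.
        double[][] mime = {
            {1000, 7674.994, 7017.784},
            {20000, 142059.001, 128670.660},
            {50000, 355698.369, 317113.667},
        };
        for (double[] row : raw) {
            System.out.printf("raw  maxNumBytes=%d speedup=%.2fx%n",
                (long) row[0], speedup(row[1], row[2]));
        }
        for (double[] row : mime) {
            System.out.printf("MIME maxNumBytes=%d speedup=%.2fx%n",
                (long) row[0], speedup(row[1], row[2]));
        }
    }
}
```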
Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
- trivial fixes
- Handling error in SIMD case with loops, combining two non-SIMD cases into one code blob, addressing other comments
- Merge branch 'master' into aarch64.base64.decode
- 8256245: AArch64: Implement Base64 decoding intrinsic
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/3228/files
- new: https://git.openjdk.java.net/jdk/pull/3228/files/8a898aec..e658ebf4
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3228&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3228&range=00-01
Stats: 9524 lines in 363 files changed: 7727 ins; 450 del; 1347 mod
Patch: https://git.openjdk.java.net/jdk/pull/3228.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3228/head:pull/3228
PR: https://git.openjdk.java.net/jdk/pull/3228