RFR: 8256245: AArch64: Implement Base64 decoding intrinsic
Dong Bo
dongbo at openjdk.java.net
Sat Mar 27 09:05:45 UTC 2021
In JDK-8248188, an IntrinsicCandidate and API were added for Base64 decoding.
Base64 decoding can be improved on AArch64 with ld4/tbl/tbx/st3; the basic idea is described at http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
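For readers unfamiliar with the approach, here is a minimal scalar sketch in Java of the per-group work that the vector instructions perform across many lanes at once: a table lookup per input character (the role of tbl/tbx), then packing four 6-bit values into three bytes (ld4 de-interleaves the four input characters, st3 interleaves the three output bytes). The class and method names are hypothetical, and this is not the stub code from the patch; it only handles unpadded input whose length is a multiple of 4.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class ScalarBase64Sketch {
    // Lookup table: byte value -> 6-bit value, or -1 for bytes outside the basic alphabet.
    private static final byte[] DECODE = new byte[256];
    static {
        Arrays.fill(DECODE, (byte) -1);
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            DECODE[alphabet.charAt(i)] = (byte) i;
        }
    }

    /**
     * Decodes unpadded input whose length is a multiple of 4.
     * Returns null if any byte is outside the basic Base64 alphabet.
     */
    static byte[] decodeBlocks(byte[] src) {
        if (src.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        byte[] dst = new byte[src.length / 4 * 3];
        for (int sp = 0, dp = 0; sp < src.length; sp += 4, dp += 3) {
            // Table lookups: one per input character.
            int a = DECODE[src[sp] & 0xff];
            int b = DECODE[src[sp + 1] & 0xff];
            int c = DECODE[src[sp + 2] & 0xff];
            int d = DECODE[src[sp + 3] & 0xff];
            if ((a | b | c | d) < 0) {
                return null; // illegal character found in this group
            }
            // Pack 4 x 6 bits into 3 x 8 bits.
            dst[dp]     = (byte) ((a << 2) | (b >> 4));
            dst[dp + 1] = (byte) ((b << 4) | (c >> 2));
            dst[dp + 2] = (byte) ((c << 6) | d);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] input = "SGVsbG8h".getBytes(StandardCharsets.ISO_8859_1);
        byte[] expected = Base64.getDecoder().decode(input);
        System.out.println(Arrays.equals(expected, decodeBlocks(input)));
    }
}
```

The sketch cross-checks itself against `java.util.Base64` rather than hard-coding an expected result.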
The patch passed jtreg tier1-3 tests with a linux-aarch64-server-fastdebug build.
Tests in `test/jdk/java/util/Base64/` and `compiler/intrinsics/base64/TestBase64.java` were also run specifically to check the correctness of the implementation.
There can be illegal characters at the start of the input if the data is MIME encoded.
There would be no benefit from using SIMD in this case, so the stub currently uses non-SIMD instructions for MIME-encoded data.
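To illustrate why MIME input is different, the following small Java example uses the standard `java.util.Base64` API: the MIME decoder skips line separators and other characters outside the Base64 alphabet, while the basic decoder rejects them. The class and helper names are hypothetical and chosen only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MimeIllegalCharsDemo {
    // Decodes with the MIME decoder, which ignores non-alphabet characters.
    static String decodeMime(String encoded) {
        byte[] decoded = Base64.getMimeDecoder().decode(encoded);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    // Returns true if the basic decoder throws on the given input.
    static boolean basicDecoderRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Leading CRLF and '$' are outside the Base64 alphabet.
        String mimeInput = "\r\n$SGVs\r\nbG8h";
        System.out.println(decodeMime(mimeInput));
        System.out.println(basicDecoderRejects(mimeInput));
    }
}
```

Because such characters can appear anywhere, including at the very start, a decoder for MIME data has to inspect the input character by character before it can assume a clean run of valid Base64.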
A JMH micro-benchmark, Base64Decode.java, is added for performance testing.
With different input lengths (upper-bounded by the parameter `maxNumBytes` in the JMH micro),
we observe ~2.5x improvements with long inputs and no regression with short inputs for raw Base64 decoding, and minor improvements (~10.95%) for MIME decoding on Kunpeng916.
The Base64Decode.java JMH micro-benchmark results (the second parameter column is `maxNumBytes`; scores are average time in ns/op, lower is better):
# Kunpeng916, intrinsic
Base64Decode.testBase64Decode 4 1 avgt 5 48.614 ± 0.609 ns/op
Base64Decode.testBase64Decode 4 3 avgt 5 58.199 ± 1.650 ns/op
Base64Decode.testBase64Decode 4 7 avgt 5 69.400 ± 0.931 ns/op
Base64Decode.testBase64Decode 4 32 avgt 5 96.818 ± 1.687 ns/op
Base64Decode.testBase64Decode 4 64 avgt 5 122.856 ± 9.217 ns/op
Base64Decode.testBase64Decode 4 80 avgt 5 130.935 ± 1.667 ns/op
Base64Decode.testBase64Decode 4 96 avgt 5 143.627 ± 1.751 ns/op
Base64Decode.testBase64Decode 4 112 avgt 5 152.311 ± 1.178 ns/op
Base64Decode.testBase64Decode 4 512 avgt 5 342.631 ± 0.584 ns/op
Base64Decode.testBase64Decode 4 1000 avgt 5 573.635 ± 1.050 ns/op
Base64Decode.testBase64Decode 4 20000 avgt 5 9534.136 ± 45.172 ns/op
Base64Decode.testBase64Decode 4 50000 avgt 5 22718.726 ± 192.070 ns/op
Base64Decode.testBase64MIMEDecode 4 1 avgt 10 63.558 ± 0.336 ns/op
Base64Decode.testBase64MIMEDecode 4 3 avgt 10 82.504 ± 0.848 ns/op
Base64Decode.testBase64MIMEDecode 4 7 avgt 10 120.591 ± 0.608 ns/op
Base64Decode.testBase64MIMEDecode 4 32 avgt 10 324.314 ± 6.236 ns/op
Base64Decode.testBase64MIMEDecode 4 64 avgt 10 532.678 ± 4.670 ns/op
Base64Decode.testBase64MIMEDecode 4 80 avgt 10 678.126 ± 4.324 ns/op
Base64Decode.testBase64MIMEDecode 4 96 avgt 10 771.603 ± 6.393 ns/op
Base64Decode.testBase64MIMEDecode 4 112 avgt 10 889.608 ± 0.759 ns/op
Base64Decode.testBase64MIMEDecode 4 512 avgt 10 3663.557 ± 3.422 ns/op
Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 7017.784 ± 9.128 ns/op
Base64Decode.testBase64MIMEDecode 4 20000 avgt 10 128670.660 ± 7951.521 ns/op
Base64Decode.testBase64MIMEDecode 4 50000 avgt 10 317113.667 ± 161.758 ns/op
# Kunpeng916, default
Base64Decode.testBase64Decode 4 1 avgt 5 48.455 ± 0.571 ns/op
Base64Decode.testBase64Decode 4 3 avgt 5 57.937 ± 0.505 ns/op
Base64Decode.testBase64Decode 4 7 avgt 5 73.823 ± 1.452 ns/op
Base64Decode.testBase64Decode 4 32 avgt 5 106.484 ± 1.243 ns/op
Base64Decode.testBase64Decode 4 64 avgt 5 141.004 ± 1.188 ns/op
Base64Decode.testBase64Decode 4 80 avgt 5 156.284 ± 0.572 ns/op
Base64Decode.testBase64Decode 4 96 avgt 5 174.137 ± 0.177 ns/op
Base64Decode.testBase64Decode 4 112 avgt 5 188.445 ± 0.572 ns/op
Base64Decode.testBase64Decode 4 512 avgt 5 610.847 ± 1.559 ns/op
Base64Decode.testBase64Decode 4 1000 avgt 5 1155.368 ± 0.813 ns/op
Base64Decode.testBase64Decode 4 20000 avgt 5 19751.477 ± 24.669 ns/op
Base64Decode.testBase64Decode 4 50000 avgt 5 50046.586 ± 523.155 ns/op
Base64Decode.testBase64MIMEDecode 4 1 avgt 10 64.130 ± 0.238 ns/op
Base64Decode.testBase64MIMEDecode 4 3 avgt 10 82.096 ± 0.205 ns/op
Base64Decode.testBase64MIMEDecode 4 7 avgt 10 118.849 ± 0.610 ns/op
Base64Decode.testBase64MIMEDecode 4 32 avgt 10 331.177 ± 4.732 ns/op
Base64Decode.testBase64MIMEDecode 4 64 avgt 10 549.117 ± 0.177 ns/op
Base64Decode.testBase64MIMEDecode 4 80 avgt 10 702.951 ± 4.572 ns/op
Base64Decode.testBase64MIMEDecode 4 96 avgt 10 799.566 ± 0.301 ns/op
Base64Decode.testBase64MIMEDecode 4 112 avgt 10 923.749 ± 0.389 ns/op
Base64Decode.testBase64MIMEDecode 4 512 avgt 10 4000.725 ± 2.519 ns/op
Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 7674.994 ± 9.281 ns/op
Base64Decode.testBase64MIMEDecode 4 20000 avgt 10 142059.001 ± 157.920 ns/op
Base64Decode.testBase64MIMEDecode 4 50000 avgt 10 355698.369 ± 216.542 ns/op
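As a quick way to read the two tables against each other, the sketch below computes the speedup and the relative time reduction from the `maxNumBytes` = 50000 rows above. The class and method names are hypothetical; the four scores are copied from the tables. For these rows the raw decode ratio works out to roughly 2.2x and the MIME decode time drops by roughly 11%.

```java
public class SpeedupFromScores {
    // Ratio of baseline time to intrinsic time (higher means faster with the intrinsic).
    static double speedup(double baselineNs, double intrinsicNs) {
        return baselineNs / intrinsicNs;
    }

    // Percentage of the baseline time saved by the intrinsic.
    static double timeReductionPercent(double baselineNs, double intrinsicNs) {
        return (baselineNs - intrinsicNs) / baselineNs * 100.0;
    }

    public static void main(String[] args) {
        // Scores (ns/op) for maxNumBytes = 50000, taken from the tables above.
        double rawDefault = 50046.586, rawIntrinsic = 22718.726;
        double mimeDefault = 355698.369, mimeIntrinsic = 317113.667;

        System.out.printf("raw decode speedup:  %.2fx%n",
                speedup(rawDefault, rawIntrinsic));
        System.out.printf("MIME time reduction: %.2f%%%n",
                timeReductionPercent(mimeDefault, mimeIntrinsic));
    }
}
```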
-------------
Commit messages:
- 8256245: AArch64: Implement Base64 decoding intrinsic
Changes: https://git.openjdk.java.net/jdk/pull/3228/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3228&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8256245
Stats: 410 lines in 3 files changed: 410 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/3228.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3228/head:pull/3228
PR: https://git.openjdk.java.net/jdk/pull/3228