RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v2]

Dong Bo dongbo at openjdk.java.net
Tue Mar 30 03:22:12 UTC 2021


> JDK-8248188 added the IntrinsicCandidate and API for Base64 decoding.
> Base64 decoding can be sped up on AArch64 with ld4/tbl/tbx/st3; the basic idea is described at http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
> 
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-fastdebug build.
> Tests in `test/jdk/java/util/Base64/` and `compiler/intrinsics/base64/TestBase64.java` were also run specifically to check the correctness of the implementation.
> 
> If the data is MIME encoded, there can be illegal characters at the start of the input.
> SIMD would bring no benefit in this case, so the stub currently uses non-SIMD instructions for MIME-encoded data.
> 
> A JMH micro-benchmark, Base64Decode.java, is added for performance testing.
> With different input lengths (upper-bounded by the parameter `maxNumBytes` in the JMH micro-benchmark),
> we see ~2.5x improvement with long inputs and no regression with short inputs for raw base64 decoding, and a minor improvement (~10.95%) for MIME, on Kunpeng916.
> 
> The Base64Decode.java JMH micro-benchmark results:
> 
> Benchmark                          (lineSize)  (maxNumBytes)  Mode  Cnt       Score       Error  Units
> 
> # Kunpeng916, intrinsic
> Base64Decode.testBase64Decode               4              1  avgt    5      48.614 ±     0.609  ns/op
> Base64Decode.testBase64Decode               4              3  avgt    5      58.199 ±     1.650  ns/op
> Base64Decode.testBase64Decode               4              7  avgt    5      69.400 ±     0.931  ns/op
> Base64Decode.testBase64Decode               4             32  avgt    5      96.818 ±     1.687  ns/op
> Base64Decode.testBase64Decode               4             64  avgt    5     122.856 ±     9.217  ns/op
> Base64Decode.testBase64Decode               4             80  avgt    5     130.935 ±     1.667  ns/op
> Base64Decode.testBase64Decode               4             96  avgt    5     143.627 ±     1.751  ns/op
> Base64Decode.testBase64Decode               4            112  avgt    5     152.311 ±     1.178  ns/op
> Base64Decode.testBase64Decode               4            512  avgt    5     342.631 ±     0.584  ns/op
> Base64Decode.testBase64Decode               4           1000  avgt    5     573.635 ±     1.050  ns/op
> Base64Decode.testBase64Decode               4          20000  avgt    5    9534.136 ±    45.172  ns/op
> Base64Decode.testBase64Decode               4          50000  avgt    5   22718.726 ±   192.070  ns/op
> Base64Decode.testBase64MIMEDecode           4              1  avgt   10      63.558 ±    0.336  ns/op
> Base64Decode.testBase64MIMEDecode           4              3  avgt   10      82.504 ±    0.848  ns/op
> Base64Decode.testBase64MIMEDecode           4              7  avgt   10     120.591 ±    0.608  ns/op
> Base64Decode.testBase64MIMEDecode           4             32  avgt   10     324.314 ±    6.236  ns/op
> Base64Decode.testBase64MIMEDecode           4             64  avgt   10     532.678 ±    4.670  ns/op
> Base64Decode.testBase64MIMEDecode           4             80  avgt   10     678.126 ±    4.324  ns/op
> Base64Decode.testBase64MIMEDecode           4             96  avgt   10     771.603 ±    6.393  ns/op
> Base64Decode.testBase64MIMEDecode           4            112  avgt   10     889.608 ±   0.759  ns/op
> Base64Decode.testBase64MIMEDecode           4            512  avgt   10    3663.557 ±    3.422  ns/op
> Base64Decode.testBase64MIMEDecode           4           1000  avgt   10    7017.784 ±    9.128  ns/op
> Base64Decode.testBase64MIMEDecode           4          20000  avgt   10  128670.660 ± 7951.521  ns/op
> Base64Decode.testBase64MIMEDecode           4          50000  avgt   10  317113.667 ±  161.758  ns/op
> 
> # Kunpeng916, default
> Base64Decode.testBase64Decode               4              1  avgt    5      48.455 ±   0.571  ns/op
> Base64Decode.testBase64Decode               4              3  avgt    5      57.937 ±   0.505  ns/op
> Base64Decode.testBase64Decode               4              7  avgt    5      73.823 ±   1.452  ns/op
> Base64Decode.testBase64Decode               4             32  avgt    5     106.484 ±   1.243  ns/op
> Base64Decode.testBase64Decode               4             64  avgt    5     141.004 ±   1.188  ns/op
> Base64Decode.testBase64Decode               4             80  avgt    5     156.284 ±   0.572  ns/op
> Base64Decode.testBase64Decode               4             96  avgt    5     174.137 ±   0.177  ns/op
> Base64Decode.testBase64Decode               4            112  avgt    5     188.445 ±   0.572  ns/op
> Base64Decode.testBase64Decode               4            512  avgt    5     610.847 ±   1.559  ns/op
> Base64Decode.testBase64Decode               4           1000  avgt    5    1155.368 ±   0.813  ns/op
> Base64Decode.testBase64Decode               4          20000  avgt    5   19751.477 ±  24.669  ns/op
> Base64Decode.testBase64Decode               4          50000  avgt    5   50046.586 ± 523.155  ns/op
> Base64Decode.testBase64MIMEDecode           4              1  avgt   10      64.130 ±   0.238  ns/op
> Base64Decode.testBase64MIMEDecode           4              3  avgt   10      82.096 ±   0.205  ns/op
> Base64Decode.testBase64MIMEDecode           4              7  avgt   10     118.849 ±   0.610  ns/op
> Base64Decode.testBase64MIMEDecode           4             32  avgt   10     331.177 ±   4.732  ns/op
> Base64Decode.testBase64MIMEDecode           4             64  avgt   10     549.117 ±   0.177  ns/op
> Base64Decode.testBase64MIMEDecode           4             80  avgt   10     702.951 ±   4.572  ns/op
> Base64Decode.testBase64MIMEDecode           4             96  avgt   10     799.566 ±   0.301  ns/op
> Base64Decode.testBase64MIMEDecode           4            112  avgt   10     923.749 ±   0.389  ns/op
> Base64Decode.testBase64MIMEDecode           4            512  avgt   10    4000.725 ±   2.519  ns/op
> Base64Decode.testBase64MIMEDecode           4           1000  avgt   10    7674.994 ±   9.281  ns/op
> Base64Decode.testBase64MIMEDecode           4          20000  avgt   10  142059.001 ± 157.920  ns/op
> Base64Decode.testBase64MIMEDecode           4          50000  avgt   10  355698.369 ± 216.542  ns/op
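
For readers unfamiliar with the technique named in the description, the following is a minimal scalar sketch of the per-group work that an ld4/tbl/tbx/st3 sequence would perform on many groups at once: four input characters are mapped through a lookup table to 6-bit values and packed into three output bytes. It is not the code from the patch; the class name `ScalarBase64Sketch`, the method `decodeQuads`, and the restriction to unpadded input whose length is a multiple of 4 are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical illustration only; not the intrinsic stub from the pull request.
final class ScalarBase64Sketch {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Maps each byte value to its 6-bit code, or -1 for an illegal character.
    private static final byte[] LOOKUP = new byte[256];
    static {
        Arrays.fill(LOOKUP, (byte) -1);
        for (int i = 0; i < ALPHABET.length(); i++) {
            LOOKUP[ALPHABET.charAt(i)] = (byte) i;
        }
    }

    // Decodes unpadded Base64 whose length is a multiple of 4 (assumption of this sketch).
    static byte[] decodeQuads(byte[] src) {
        if (src.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        byte[] dst = new byte[src.length / 4 * 3];
        for (int s = 0, d = 0; s < src.length; s += 4, d += 3) {
            // Table lookup: the step a tbl/tbx pair would do for a whole vector.
            int a = LOOKUP[src[s] & 0xff];
            int b = LOOKUP[src[s + 1] & 0xff];
            int c = LOOKUP[src[s + 2] & 0xff];
            int e = LOOKUP[src[s + 3] & 0xff];
            // Any -1 sets the sign bit of the OR, flagging an illegal character.
            if ((a | b | c | e) < 0) {
                throw new IllegalArgumentException("illegal character in group at offset " + s);
            }
            // Pack 4 x 6 bits into 3 x 8 bits.
            dst[d]     = (byte) ((a << 2) | (b >> 4));
            dst[d + 1] = (byte) ((b << 4) | (c >> 2));
            dst[d + 2] = (byte) ((c << 6) | e);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] input = "SGVsbG8h".getBytes(StandardCharsets.US_ASCII);
        byte[] expected = Base64.getDecoder().decode(input);
        // Cross-check the sketch against the JDK decoder.
        System.out.println(Arrays.equals(decodeQuads(input), expected));
        System.out.println(new String(decodeQuads(input), StandardCharsets.US_ASCII));
    }
}
```

In a vectorized version, the four streams `a`, `b`, `c`, `e` would each occupy one register after a de-interleaving load, and the three packed byte streams would be written back with an interleaving store.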

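The remark in the description about illegal characters in MIME-encoded input can be illustrated with the public `java.util.Base64` API: the MIME decoder skips line separators and other characters outside the Base64 alphabet, while the basic decoder rejects them. The sketch below only demonstrates that API behavior; the class and method names are made up for the example and say nothing about how the stub itself handles the MIME case.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; only java.util.Base64 is real JDK API here.
final class MimeVersusBasicDecode {
    // Decodes with the MIME decoder, which ignores non-alphabet characters.
    static String decodeMime(String encoded) {
        byte[] raw = Base64.getMimeDecoder().decode(encoded);
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Returns true if the basic decoder refuses the input.
    static boolean basicRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Input that starts with, and contains, characters outside the alphabet.
        String mimeInput = "\r\nSGVs\r\nbG8h";
        System.out.println(decodeMime(mimeInput));   // MIME decoder skips CR/LF
        System.out.println(basicRejects(mimeInput)); // basic decoder throws
    }
}
```

Because non-alphabet characters can appear anywhere in MIME input, including at the very start, a fixed-stride vector loop cannot assume that every loaded byte is a payload character.
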
Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - trivial fixes
 - Handling error in SIMD case with loops, combining two non-SIMD cases into one code blob, addressing other comments
 - Merge branch 'master' into aarch64.base64.decode
 - 8256245: AArch64: Implement Base64 decoding intrinsic

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3228/files
  - new: https://git.openjdk.java.net/jdk/pull/3228/files/8a898aec..e658ebf4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3228&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3228&range=00-01

  Stats: 9524 lines in 363 files changed: 7727 ins; 450 del; 1347 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3228.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3228/head:pull/3228

PR: https://git.openjdk.java.net/jdk/pull/3228

