RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

Nick Gasson ngasson at openjdk.java.net
Mon Mar 29 03:15:28 UTC 2021


On Sat, 27 Mar 2021 08:58:03 GMT, Dong Bo <dongbo at openjdk.org> wrote:

> JDK-8248188 added the IntrinsicCandidate and API for Base64 decoding.
> Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3; the basic idea is described at http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
> 
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-fastdebug build.
> Tests in `test/jdk/java/util/Base64/` and `compiler/intrinsics/base64/TestBase64.java` were also run specifically to check the correctness of the implementation.
> 
> There can be illegal characters at the start of the input if the data is MIME encoded.
> There is no benefit in using SIMD for this case, so the stub currently uses non-SIMD instructions for MIME encoded data.
> 
> A JMH micro-benchmark, Base64Decode.java, is added for performance testing.
> With different input lengths (upper-bounded by the parameter `maxNumBytes` in the JMH micro-benchmark),
> we see ~2.5x improvement with long inputs and no regression with short inputs for raw Base64 decoding, and a minor improvement (~10.95%) for MIME, on Kunpeng916.
> 
> The Base64Decode.java JMH micro-benchmark results:
> 
> Benchmark                          (lineSize)  (maxNumBytes)  Mode  Cnt       Score       Error  Units
> 
> # Kunpeng916, intrinsic
> Base64Decode.testBase64Decode               4              1  avgt    5      48.614 ±     0.609  ns/op
> Base64Decode.testBase64Decode               4              3  avgt    5      58.199 ±     1.650  ns/op
> Base64Decode.testBase64Decode               4              7  avgt    5      69.400 ±     0.931  ns/op
> Base64Decode.testBase64Decode               4             32  avgt    5      96.818 ±     1.687  ns/op
> Base64Decode.testBase64Decode               4             64  avgt    5     122.856 ±     9.217  ns/op
> Base64Decode.testBase64Decode               4             80  avgt    5     130.935 ±     1.667  ns/op
> Base64Decode.testBase64Decode               4             96  avgt    5     143.627 ±     1.751  ns/op
> Base64Decode.testBase64Decode               4            112  avgt    5     152.311 ±     1.178  ns/op
> Base64Decode.testBase64Decode               4            512  avgt    5     342.631 ±     0.584  ns/op
> Base64Decode.testBase64Decode               4           1000  avgt    5     573.635 ±     1.050  ns/op
> Base64Decode.testBase64Decode               4          20000  avgt    5    9534.136 ±    45.172  ns/op
> Base64Decode.testBase64Decode               4          50000  avgt    5   22718.726 ±   192.070  ns/op
> Base64Decode.testBase64MIMEDecode           4              1  avgt   10      63.558 ±    0.336  ns/op
> Base64Decode.testBase64MIMEDecode           4              3  avgt   10      82.504 ±    0.848  ns/op
> Base64Decode.testBase64MIMEDecode           4              7  avgt   10     120.591 ±    0.608  ns/op
> Base64Decode.testBase64MIMEDecode           4             32  avgt   10     324.314 ±    6.236  ns/op
> Base64Decode.testBase64MIMEDecode           4             64  avgt   10     532.678 ±    4.670  ns/op
> Base64Decode.testBase64MIMEDecode           4             80  avgt   10     678.126 ±    4.324  ns/op
> Base64Decode.testBase64MIMEDecode           4             96  avgt   10     771.603 ±    6.393  ns/op
> Base64Decode.testBase64MIMEDecode           4            112  avgt   10     889.608 ±   0.759  ns/op
> Base64Decode.testBase64MIMEDecode           4            512  avgt   10    3663.557 ±    3.422  ns/op
> Base64Decode.testBase64MIMEDecode           4           1000  avgt   10    7017.784 ±    9.128  ns/op
> Base64Decode.testBase64MIMEDecode           4          20000  avgt   10  128670.660 ± 7951.521  ns/op
> Base64Decode.testBase64MIMEDecode           4          50000  avgt   10  317113.667 ±  161.758  ns/op
> 
> # Kunpeng916, default
> Base64Decode.testBase64Decode               4              1  avgt    5      48.455 ±   0.571  ns/op
> Base64Decode.testBase64Decode               4              3  avgt    5      57.937 ±   0.505  ns/op
> Base64Decode.testBase64Decode               4              7  avgt    5      73.823 ±   1.452  ns/op
> Base64Decode.testBase64Decode               4             32  avgt    5     106.484 ±   1.243  ns/op
> Base64Decode.testBase64Decode               4             64  avgt    5     141.004 ±   1.188  ns/op
> Base64Decode.testBase64Decode               4             80  avgt    5     156.284 ±   0.572  ns/op
> Base64Decode.testBase64Decode               4             96  avgt    5     174.137 ±   0.177  ns/op
> Base64Decode.testBase64Decode               4            112  avgt    5     188.445 ±   0.572  ns/op
> Base64Decode.testBase64Decode               4            512  avgt    5     610.847 ±   1.559  ns/op
> Base64Decode.testBase64Decode               4           1000  avgt    5    1155.368 ±   0.813  ns/op
> Base64Decode.testBase64Decode               4          20000  avgt    5   19751.477 ±  24.669  ns/op
> Base64Decode.testBase64Decode               4          50000  avgt    5   50046.586 ± 523.155  ns/op
> Base64Decode.testBase64MIMEDecode           4              1  avgt   10      64.130 ±   0.238  ns/op
> Base64Decode.testBase64MIMEDecode           4              3  avgt   10      82.096 ±   0.205  ns/op
> Base64Decode.testBase64MIMEDecode           4              7  avgt   10     118.849 ±   0.610  ns/op
> Base64Decode.testBase64MIMEDecode           4             32  avgt   10     331.177 ±   4.732  ns/op
> Base64Decode.testBase64MIMEDecode           4             64  avgt   10     549.117 ±   0.177  ns/op
> Base64Decode.testBase64MIMEDecode           4             80  avgt   10     702.951 ±   4.572  ns/op
> Base64Decode.testBase64MIMEDecode           4             96  avgt   10     799.566 ±   0.301  ns/op
> Base64Decode.testBase64MIMEDecode           4            112  avgt   10     923.749 ±   0.389  ns/op
> Base64Decode.testBase64MIMEDecode           4            512  avgt   10    4000.725 ±   2.519  ns/op
> Base64Decode.testBase64MIMEDecode           4           1000  avgt   10    7674.994 ±   9.281  ns/op
> Base64Decode.testBase64MIMEDecode           4          20000  avgt   10  142059.001 ± 157.920  ns/op
> Base64Decode.testBase64MIMEDecode           4          50000  avgt   10  355698.369 ± 216.542  ns/op
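
For reference, the speedup can be read straight off the table by dividing the "default" score by the "intrinsic" score for the same row. The sketch below does that for the two `maxNumBytes = 50000` rows only; the class and method names are made up for illustration, and other rows give somewhat different ratios.

```java
// Speedup ratios computed from the ns/op scores quoted above (maxNumBytes = 50000).
// Class and method names are hypothetical, for illustration only.
final class Base64DecodeSpeedup {
    // Ratio of baseline time to intrinsic time; values above 1.0 mean the intrinsic is faster.
    static double speedup(double defaultNsPerOp, double intrinsicNsPerOp) {
        return defaultNsPerOp / intrinsicNsPerOp;
    }

    public static void main(String[] args) {
        double raw = speedup(50046.586, 22718.726);    // testBase64Decode rows
        double mime = speedup(355698.369, 317113.667); // testBase64MIMEDecode rows
        System.out.printf("raw decode:  %.2fx%n", raw);
        System.out.printf("MIME decode: %.2fx%n", mime);
    }
}
```

For these two rows the ratios come out at roughly 2.2x for raw decoding and roughly 1.12x for MIME.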

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5624:

> 5622:     __ ld4(in0, in1, in2, in3, arrangement, __ post(src, 4 * size));
> 5623: 
> 5624:     // we need unsigned saturationg substract, to make sure all input values

"saturating subtract"

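For anyone unfamiliar with the term: an unsigned saturating subtract clamps the result at zero instead of wrapping around. A minimal per-byte model in Java follows; the helper names are hypothetical and this models a single byte lane only, not the vector instruction or the stub's code.

```java
// Per-byte model of unsigned saturating subtract versus plain modular subtract.
// Hypothetical helpers for illustration; not taken from the patch.
final class SaturatingSubtract {
    // Unsigned saturating subtract on one byte lane: clamps at 0 instead of wrapping.
    static int uqsub8(int a, int b) {
        int diff = (a & 0xFF) - (b & 0xFF);
        return Math.max(diff, 0);
    }

    // Plain subtract for comparison: wraps around modulo 256.
    static int sub8(int a, int b) {
        return ((a & 0xFF) - (b & 0xFF)) & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(uqsub8(10, 63));  // 0: clamped
        System.out.println(sub8(10, 63));    // 203: wrapped
        System.out.println(uqsub8(100, 63)); // 37: no clamping needed
    }
}
```
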
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5649:

> 5647:     __ orr(decL3, arrangement, decL3, decH3);
> 5648: 
> 5649:     // check iilegal inputs, value larger than 63 (maximum of 6 bits)

"illegal inputs". Are there existing jtreg tests that cover these cases?

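The kind of case I have in mind can be sketched with the public `java.util.Base64` API. This is only an illustration, not an existing jtreg test: the class name and inputs are made up, and the long input is merely intended to be large enough to reach a vectorized path, which I have not verified against the stub's thresholds.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative check of illegal-character handling; names and inputs are hypothetical.
final class IllegalInputCheck {
    // Returns true if the basic decoder rejects the input with IllegalArgumentException.
    static boolean basicDecoderRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    // Decodes with the MIME decoder, which skips characters outside the Base64 alphabet.
    static String mimeDecode(String encoded) {
        return new String(Base64.getMimeDecoder().decode(encoded), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // '*' is not in the Base64 alphabet; place it in the middle of a long input.
        String longInput = "QUJD".repeat(32) + "QU*D" + "QUJD".repeat(32);
        System.out.println(basicDecoderRejects(longInput)); // true
        System.out.println(basicDecoderRejects("QUJD"));    // false
        System.out.println(mimeDecode("*QUJD"));            // ABC
    }
}
```
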
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5772:

> 5770:     // The value of index 64 is set to 0, so that we know that we already get the
> 5771:     // decoded data with the 1st lookup.
> 5772:     static const uint8_t fromBase64ForSIMD[128] = {

This table and the one below seem to be identical to the first half of the NoSIMD tables. Can't you just use one set of 256-entry tables for both SIMD and non-SIMD algorithms?
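
To illustrate the question, here is a rough sketch in Java of what sharing would require: the 128-entry table has to equal the first 128 entries of the 256-entry one. The table contents are built from the standard alphabet with 255 assumed as the "illegal" marker; the names are hypothetical and the real tables in the patch may differ. In particular, the quoted comment says index 64 is set to 0 in the SIMD table, and a difference like that would break the prefix property.

```java
import java.util.Arrays;

// Sketch of the "one table for both paths" idea; contents and names are assumptions.
final class SharedDecodeTable {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Builds a decode table of the given size; bytes outside the alphabet map to 255 (assumed marker).
    static int[] buildTable(int size) {
        int[] table = new int[size];
        Arrays.fill(table, 255);
        for (int i = 0; i < ALPHABET.length(); i++) {
            table[ALPHABET.charAt(i)] = i;
        }
        return table;
    }

    // True if the short table equals the leading entries of the long one.
    static boolean isPrefix(int[] shortTable, int[] longTable) {
        return Arrays.equals(shortTable, Arrays.copyOf(longTable, shortTable.length));
    }

    public static void main(String[] args) {
        int[] full = buildTable(256);
        int[] half = buildTable(128);
        System.out.println(isPrefix(half, full)); // true: the short table could be dropped
        half[64] = 0;                             // mimic a special value at index 64
        System.out.println(isPrefix(half, full)); // false: sharing would need extra care
    }
}
```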

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5803:

> 5801:     Register dst   = c_rarg3;  // dest array
> 5802:     Register doff  = c_rarg4;  // position for writing to dest array
> 5803:     Register isURL = c_rarg5;  // Base64 or URL chracter set

"character set"

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5830:

> 5828: 
> 5829:     // The 1st character of the input can be illegal if the data is MIME encoded.
> 5830:     // We can not benefits from SIMD for this case. The max line size of MIME

"cannot benefit"

-------------

PR: https://git.openjdk.java.net/jdk/pull/3228

