RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

Tue Apr 6 08:04:12 UTC 2021

On Fri, 2 Apr 2021 10:17:57 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> PING... Any suggestions on the updated commit?
> 
> Once you reply to the comments, sure.

>
> Are there any existing test cases for failing inputs?
>
I added one; the error character is injected at a parameterized index of the encoded data.
For small error indices there is no big difference, since most of the time seems to be taken by exception handling.
For large error indices (e.g. 20000) I see the expected ~2x performance improvement. The JMH results:
### Kunpeng 916, intrinsic, tested with `-jar benchmarks.jar testBase64WithErrorInputsDecode -p errorIndex=3,64,144,208,272,1000,20000 -p maxNumBytes=1`
Base64Decode.testBase64WithErrorInputsDecode             3           4              1  avgt   10   3696.151 ± 202.783  ns/op
Base64Decode.testBase64WithErrorInputsDecode            64           4              1  avgt   10   3899.269 ± 178.289  ns/op
Base64Decode.testBase64WithErrorInputsDecode           144           4              1  avgt   10   3902.022 ± 163.611  ns/op
Base64Decode.testBase64WithErrorInputsDecode           208           4              1  avgt   10   3982.423 ± 256.638  ns/op
Base64Decode.testBase64WithErrorInputsDecode           272           4              1  avgt   10   3984.545 ± 144.282  ns/op
Base64Decode.testBase64WithErrorInputsDecode          1000           4              1  avgt   10   4532.959 ± 310.068  ns/op
Base64Decode.testBase64WithErrorInputsDecode         20000           4              1  avgt   10  17578.148 ± 631.600  ns/op
### Kunpeng 916, default, tested with `-XX:-UseBASE64Intrinsics -jar benchmarks.jar testBase64WithErrorInputsDecode -p errorIndex=3,64,144,208,272,1000,20000 -p maxNumBytes=1`
Base64Decode.testBase64WithErrorInputsDecode             3           4              1  avgt   10   3760.330 ± 261.672  ns/op
Base64Decode.testBase64WithErrorInputsDecode            64           4              1  avgt   10   3900.326 ± 121.632  ns/op
Base64Decode.testBase64WithErrorInputsDecode           144           4              1  avgt   10   4041.428 ± 174.435  ns/op
Base64Decode.testBase64WithErrorInputsDecode           208           4              1  avgt   10   4177.670 ± 214.433  ns/op
Base64Decode.testBase64WithErrorInputsDecode           272           4              1  avgt   10   4324.020 ± 106.826  ns/op
Base64Decode.testBase64WithErrorInputsDecode          1000           4              1  avgt   10   5476.469 ± 171.647  ns/op
Base64Decode.testBase64WithErrorInputsDecode         20000           4              1  avgt   10  34163.743 ± 162.263  ns/op
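
For readers who want to reproduce the shape of the failing input outside JMH, here is a minimal sketch of the error-injection idea. It is not the benchmark code from the PR: the class and method names (`ErrorInjectionSketch`, `encodeWithError`, `decodeFails`), the filler data, and the choice of `?` as the illegal character are all assumptions made for illustration. It relies only on `java.util.Base64`, whose basic decoder throws `IllegalArgumentException` on a character outside the Base64 alphabet.

```java
import java.util.Arrays;
import java.util.Base64;

class ErrorInjectionSketch {
    // Encode `len` filler bytes, then overwrite one encoded character with an
    // illegal one at `errorIndex` (clamped to the encoded length).
    // Hypothetical helper, not the PR's benchmark setup.
    static byte[] encodeWithError(int len, int errorIndex) {
        byte[] raw = new byte[len];
        Arrays.fill(raw, (byte) 'a');
        byte[] encoded = Base64.getEncoder().encode(raw);
        int idx = Math.min(errorIndex, encoded.length - 1);
        encoded[idx] = (byte) '?';   // '?' is not in the Base64 alphabet
        return encoded;
    }

    // Returns true if decoding is rejected with IllegalArgumentException.
    static boolean decodeFails(byte[] encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Same error indices as in the JMH runs above.
        for (int errorIndex : new int[] {3, 64, 144, 208, 272, 1000, 20000}) {
            byte[] encoded = encodeWithError(30000, errorIndex);
            System.out.println(errorIndex + " -> fails: " + decodeFails(encoded));
        }
    }
}
```

Every run ends in a thrown exception regardless of where the error sits, which is consistent with the observation that exception handling appears to dominate the cost when the error index is small.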

>
> Your test results suggest that it isn't useful for that, surely?
>
The results suggest that the non-SIMD code provides a ~11.9% improvement for MIME decoding.
Furthermore, according to local tests, removing the non-SIMD code may cause a ~30% performance regression for MIME decoding.

In the worst case, a MIME line has only 4 base64-encoded characters followed by a line separator made up of illegal input characters, e.g. `\r\n`.
When the intrinsic encounters an illegal character (`\r`), it has to exit.
The Java code then passes the next illegal source byte (`\n`) to the intrinsic.
With only SIMD code, the intrinsic executes too many wasted instructions before it can detect the error.
With non-SIMD code, the intrinsic executes only one non-SIMD round for this error input.
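
The worst case above can be illustrated with a small Java model. This is only a sketch under simplifying assumptions, not the JDK decode loop or the intrinsic itself: `countBlockCalls` is a hypothetical helper in which each "block call" consumes whole groups of 4 valid characters and returns at the first illegal byte, after which the caller handles one byte itself and calls again. It counts how many calls are made and how many of them decode nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class MimeWorstCaseSketch {
    static boolean isBase64Char(byte b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9') || b == '+' || b == '/';
    }

    // Simplified, hypothetical model of the caller/intrinsic interplay.
    // Returns {total block calls, block calls that decoded nothing}.
    static int[] countBlockCalls(byte[] src) {
        int calls = 0, emptyCalls = 0, sp = 0;
        while (sp < src.length) {
            calls++;
            int start = sp;
            // "Intrinsic": consume groups of 4 valid characters.
            while (sp + 4 <= src.length
                    && isBase64Char(src[sp]) && isBase64Char(src[sp + 1])
                    && isBase64Char(src[sp + 2]) && isBase64Char(src[sp + 3])) {
                sp += 4;
            }
            if (sp == start) emptyCalls++;   // exited without decoding anything
            if (sp < src.length) sp++;       // caller handles one byte itself
        }
        return new int[] {calls, emptyCalls};
    }

    public static void main(String[] args) {
        // MIME output with 4 encoded characters per line, separated by \r\n.
        byte[] sep = {'\r', '\n'};
        byte[] raw = "ABCDEFGHIJKL".getBytes(StandardCharsets.US_ASCII);
        byte[] mime = Base64.getMimeEncoder(4, sep).encode(raw);
        int[] counts = countBlockCalls(mime);
        System.out.println("calls=" + counts[0] + ", empty=" + counts[1]);
    }
}
```

In this model, each empty call corresponds to the intrinsic being entered on `\n`. That is the point where a SIMD-only implementation would do its setup and wide loads before noticing the illegal byte, while a non-SIMD round can reject it after looking at a single 4-character group.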

>
> Four loads and four post increments rather than one load and a few BFMs? Why?
>
Nice suggestion. Done, thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3228