RFR: 8256245: AArch64: Implement Base64 decoding intrinsic
Dong Bo
dongbo at openjdk.java.net
Tue Apr 6 08:04:12 UTC 2021
On Fri, 2 Apr 2021 10:17:57 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> PING... Any suggestions on the updated commit?
>
> Once you reply to the comments, sure.
>
> Are there any existing test cases for failing inputs?
>
I added one: an illegal character is injected into the encoded data at a parameterized index (`errorIndex`).
For small error indices there is no significant difference; most of the time seems to be spent in exception handling.
For large error indices we see the expected ~2x improvement (e.g. ~17.6 µs vs. ~34.2 µs at `errorIndex=20000`). The JMH results:
### Kunpeng 916, intrinsic, tested with `-jar benchmarks.jar testBase64WithErrorInputsDecode -p errorIndex=3,64,144,208,272,1000,20000 -p maxNumBytes=1`
```
Base64Decode.testBase64WithErrorInputsDecode 3 4 1 avgt 10 3696.151 ± 202.783 ns/op
Base64Decode.testBase64WithErrorInputsDecode 64 4 1 avgt 10 3899.269 ± 178.289 ns/op
Base64Decode.testBase64WithErrorInputsDecode 144 4 1 avgt 10 3902.022 ± 163.611 ns/op
Base64Decode.testBase64WithErrorInputsDecode 208 4 1 avgt 10 3982.423 ± 256.638 ns/op
Base64Decode.testBase64WithErrorInputsDecode 272 4 1 avgt 10 3984.545 ± 144.282 ns/op
Base64Decode.testBase64WithErrorInputsDecode 1000 4 1 avgt 10 4532.959 ± 310.068 ns/op
Base64Decode.testBase64WithErrorInputsDecode 20000 4 1 avgt 10 17578.148 ± 631.600 ns/op
```
### Kunpeng 916, default, tested with `-XX:-UseBASE64Intrinsics -jar benchmarks.jar testBase64WithErrorInputsDecode -p errorIndex=3,64,144,208,272,1000,20000 -p maxNumBytes=1`
```
Base64Decode.testBase64WithErrorInputsDecode 3 4 1 avgt 10 3760.330 ± 261.672 ns/op
Base64Decode.testBase64WithErrorInputsDecode 64 4 1 avgt 10 3900.326 ± 121.632 ns/op
Base64Decode.testBase64WithErrorInputsDecode 144 4 1 avgt 10 4041.428 ± 174.435 ns/op
Base64Decode.testBase64WithErrorInputsDecode 208 4 1 avgt 10 4177.670 ± 214.433 ns/op
Base64Decode.testBase64WithErrorInputsDecode 272 4 1 avgt 10 4324.020 ± 106.826 ns/op
Base64Decode.testBase64WithErrorInputsDecode 1000 4 1 avgt 10 5476.469 ± 171.647 ns/op
Base64Decode.testBase64WithErrorInputsDecode 20000 4 1 avgt 10 34163.743 ± 162.263 ns/op
```
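The kind of input this benchmark exercises can be sketched in plain Java. This is a minimal sketch, not the JMH benchmark from the PR: the payload size (`NUM_BYTES = 30_000`), the injected character (`'?'`) and the helper names are assumptions. The decoder still has to process every character before `errorIndex`, while the cost of throwing `IllegalArgumentException` is roughly fixed, which is consistent with the intrinsic only pulling ahead at large error indices.

```java
import java.util.Base64;

public class ErrorInputDecode {
    // Hypothetical payload size; its 40000-character encoding covers every errorIndex used above.
    static final int NUM_BYTES = 30_000;

    // Encodes NUM_BYTES deterministic bytes, then overwrites the character at
    // errorIndex with '?', which is not in the basic Base64 alphabet.
    static byte[] encodedWithError(int errorIndex) {
        byte[] src = new byte[NUM_BYTES];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] encoded = Base64.getEncoder().encode(src);
        encoded[errorIndex] = (byte) '?';
        return encoded;
    }

    // Returns true if the basic decoder rejects the input.
    static boolean decodeFails(byte[] encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] errorIndices = {3, 64, 144, 208, 272, 1000, 20000};
        for (int idx : errorIndices) {
            System.out.println("errorIndex=" + idx + " rejected=" + decodeFails(encodedWithError(idx)));
        }
    }
}
```

Running `main` prints one line per error index, each reporting `rejected=true`.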
>
> Your test results suggest that it isn't useful for that, surely?
>
The results suggest that the non-SIMD code provides a ~11.9% improvement for MIME decoding.
Furthermore, according to local tests, removing the non-SIMD code could cause a regression of about 30% in MIME decoding.
In the worst case, a MIME line contains only 4 Base64-encoded characters followed by a line separator made of illegal characters, e.g. `\r\n`.
When the intrinsic encounters an illegal character (`\r`), it has to exit.
The Java code then passes the next illegal source byte (`\n`) to the intrinsic.
With only the SIMD code, the intrinsic would execute many wasted instructions before detecting the error,
whereas with the non-SIMD code it executes only one non-SIMD round for this illegal input.
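The worst case described above can be modeled in Java. This is a simplification under stated assumptions: `countFastPathExits` is a hypothetical stand-in for the intrinsic/Java hand-off, in which a fast path consumes whole 4-character groups and bails out at the first group containing an illegal byte, after which the caller handles one byte and calls the fast path again. It does not model SIMD block sizes or the real stub's return protocol. `Base64.getMimeEncoder(4, ...)` produces exactly the 4-characters-per-line input with `\r\n` separators.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class MimeWorstCase {
    // True for characters in the basic Base64 alphabet (padding '=' excluded).
    static boolean isAlphabet(byte b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9') || b == '+' || b == '/';
    }

    // Hypothetical model: the "fast path" consumes whole 4-character groups of
    // legal characters and stops at the first group containing anything else;
    // the caller then handles exactly one byte. Returns the number of early exits.
    static int countFastPathExits(byte[] src) {
        int exits = 0;
        int sp = 0;
        while (sp < src.length) {
            while (sp + 4 <= src.length && isAlphabet(src[sp]) && isAlphabet(src[sp + 1])
                    && isAlphabet(src[sp + 2]) && isAlphabet(src[sp + 3])) {
                sp += 4;
            }
            if (sp >= src.length) {
                break;
            }
            exits++; // fast path returned early
            sp++;    // caller consumes one byte (here '\r' or '\n')
        }
        return exits;
    }

    // Worst-case MIME input: 4 encoded characters per line, separated by "\r\n".
    static byte[] worstCaseMime(byte[] data) {
        return Base64.getMimeEncoder(4, "\r\n".getBytes(StandardCharsets.US_ASCII)).encode(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[30];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] mime = worstCaseMime(data);
        System.out.println("fast-path exits: " + countFastPathExits(mime));
        System.out.println("round trip ok: " + Arrays.equals(data, Base64.getMimeDecoder().decode(mime)));
    }
}
```

For 30 input bytes the encoder emits 10 lines joined by 9 separators, so the model bails out 18 times, twice per separator. Each of those exits pays whatever work the fast path does before it notices the illegal byte, which is why a cheap scalar round matters for this input.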
>
> Four loads and four post increments rather than one load and a few BFMs? Why?
>
Nice suggestion. Done, thanks.
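The suggestion concerns the AArch64 stub, but the idea can be illustrated with a Java analogy: load four source characters with a single 32-bit access and extract each one with shift-and-mask operations (which correspond to bitfield-extract instructions such as UBFX on AArch64), instead of issuing four separate byte loads. This is a sketch of one scalar decoding round, not the code in the PR; `decodeRound` and the `DECODE` table are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ScalarRound {
    // Views a byte[] as little-endian 32-bit ints at arbitrary byte offsets.
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Maps an ASCII code to its 6-bit Base64 value, or -1 if it is illegal.
    private static final int[] DECODE = new int[256];
    static {
        Arrays.fill(DECODE, -1);
        String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            DECODE[alphabet.charAt(i)] = i;
        }
    }

    // Decodes the 4 characters at src[sp..sp+3] into dst[dp..dp+2].
    // Returns false, writing nothing, if any character is illegal.
    static boolean decodeRound(byte[] src, int sp, byte[] dst, int dp) {
        int w = (int) INT_LE.get(src, sp); // one 32-bit load instead of four byte loads
        int c0 = DECODE[w & 0xff];         // byte 0 (lowest address)
        int c1 = DECODE[(w >>> 8) & 0xff];
        int c2 = DECODE[(w >>> 16) & 0xff];
        int c3 = DECODE[(w >>> 24) & 0xff];
        if ((c0 | c1 | c2 | c3) < 0) {
            return false;                  // at least one lookup returned -1
        }
        int bits = (c0 << 18) | (c1 << 12) | (c2 << 6) | c3; // 24 decoded bits
        dst[dp] = (byte) (bits >>> 16);
        dst[dp + 1] = (byte) (bits >>> 8);
        dst[dp + 2] = (byte) bits;
        return true;
    }

    public static void main(String[] args) {
        byte[] src = "AQID".getBytes(StandardCharsets.US_ASCII);
        byte[] dst = new byte[3];
        System.out.println(decodeRound(src, 0, dst, 0) + " " + Arrays.toString(dst));
    }
}
```

Returning `false` without writing anything mirrors the bail-out behavior discussed above: the caller can then handle the illegal byte itself.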
-------------
PR: https://git.openjdk.java.net/jdk/pull/3228
More information about the hotspot-compiler-dev mailing list