RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v8]

Mon Feb 6 06:39:51 UTC 2023

On Wed, 1 Feb 2023 19:07:17 GMT, Scott Gibbons <duke at openjdk.org> wrote:

>> Added code for Base64 acceleration (encode and decode) which will accelerate ~4x for AVX2 platforms.
>> 
>> Encode performance:
>> **Old:**
>> 
>> Benchmark                      (maxNumBytes)   Mode  Cnt     Score   Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  4309.439 ± 2.632  ops/ms
>> 
>> 
>> **New:**
>> 
>> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
>> 
>> 
>> Decode performance:
>> **Old:**
>> 
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt     Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  3961.768 ± 93.409  ops/ms
>> 
>> **New:**
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Change break-even buffer size for AVX512

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2693:

> 2691:     __ vpshufb(xmm0, xmm0, xmm13, Assembler::AVX_256bit);
> 2692:     __ vpermd(xmm0, xmm12, xmm0, Assembler::AVX_256bit);
> 2693:     __ subl(length, 0x20);

The subtraction already sets EFLAGS, so we can save one redundant compare per iteration at [#L2697](https://github.com/openjdk/jdk/pull/12126/files#diff-b938ab8a7bd9f57eb02271e2dd24a305bca30f06e9f8b028e18a139c4908ec92R2697)
by subtracting 0x2c (44) from length before the loop and adding the same amount back after the loop.

The same applies to the encode main loop.
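
To illustrate the suggestion, here is a minimal C++ sketch of the two loop shapes. It is a model only, not the stub code: the function names, the `LoopResult` struct, and the assumption that the loop is entered only when at least 0x2c bytes remain are all hypothetical. It assumes the compare at L2697 checks the remaining length against 0x2c.

```cpp
#include <cstdint>

// Assumed from the review comment: each iteration consumes 0x20 bytes and
// the loop continues while at least 0x2c bytes remain.
constexpr int32_t kStep = 0x20;
constexpr int32_t kMinRemaining = 0x2c;

// Hypothetical result type: iterations executed and bytes left over.
struct LoopResult {
    int32_t iterations;
    int32_t remaining;
    bool operator==(const LoopResult& other) const {
        return iterations == other.iterations && remaining == other.remaining;
    }
};

// Models the current shape: subl followed by an explicit cmpl each iteration.
LoopResult compare_each_iteration(int32_t length) {
    int32_t iterations = 0;
    if (length >= kMinRemaining) {
        do {
            ++iterations;                      // stands in for the vector work
            length -= kStep;                   // subl(length, 0x20)
        } while (length >= kMinRemaining);     // separate compare + branch
    }
    return {iterations, length};
}

// Models the suggested shape: bias length once, then branch on the sign of
// the subtraction result, which in assembly comes from the flags set by subl.
LoopResult biased_counter(int32_t length) {
    int32_t iterations = 0;
    if (length >= kMinRemaining) {
        length -= kMinRemaining;               // pre-loop bias by 0x2c
        do {
            ++iterations;
            length -= kStep;                   // flags from this subtraction...
        } while (length >= 0);                 // ...drive the branch directly
        length += kMinRemaining;               // undo the bias after the loop
    }
    return {iterations, length};
}
```

Both functions run the same number of iterations and leave the same remaining length, so any tail handling after the loop sees an unchanged `length`.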

-------------

PR: https://git.openjdk.org/jdk/pull/12126