RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

Sandhya Viswanathan sviswanathan at openjdk.org
Tue Feb 14 02:00:58 UTC 2023


On Thu, 9 Feb 2023 18:08:15 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> Added code for Base64 acceleration (encode and decode) which will accelerate ~4x for AVX2 platforms.
>> 
>> Encode performance:
>> **Old:**
>> 
>> Benchmark                      (maxNumBytes)   Mode  Cnt     Score   Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  4309.439 ± 2.632  ops/ms
>> 
>> 
>> **New:**
>> 
>> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
>> 
>> 
>> Decode performance:
>> **Old:**
>> 
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt     Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  3961.768 ± 93.409  ops/ms
>> 
>> **New:**
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add URL to microbenchmark

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2399:

> 2397:      VM_Version::supports_avx512bw()) {
> 2398:     __ cmpl(length, 31);     // 32-bytes is break-even for AVX-512
> 2399:     __ jcc(Assembler::lessEqual, L_bruteForce);

The AVX2 code needs the length to be at least 0x2c (44) bytes, so we could jump directly to the non-AVX code instead of L_bruteForce here. That would save one subtract/branch.
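
To illustrate the reasoning, here is a minimal sketch, not the stub code itself. It assumes only the two thresholds quoted above (31 for the AVX-512 guard, 0x2c for the AVX2 minimum). All function and constant names are hypothetical. It checks at compile time that any length taking the lessEqual branch is also too short for AVX2:

```cpp
// Thresholds quoted in the review; every name below is hypothetical.
constexpr int kAvx512MaxRejected = 31;   // cmpl(length, 31); jcc(lessEqual, ...)
constexpr int kAvx2MinLength     = 0x2c; // 44 bytes, per the review comment

// True when the AVX-512 guard takes its lessEqual branch.
constexpr bool rejectedByAvx512Guard(int length) {
    return length <= kAvx512MaxRejected;
}

// True when the buffer is too short for the AVX2 code.
constexpr bool tooShortForAvx2(int length) {
    return length < kAvx2MinLength;
}

// Checks every length in 0..upTo: being rejected by the AVX-512 guard
// implies being too short for AVX2 as well.
constexpr bool rejectionImpliesTooShort(int upTo) {
    return upTo < 0
        ? true
        : (!rejectedByAvx512Guard(upTo) || tooShortForAvx2(upTo))
              && rejectionImpliesTooShort(upTo - 1);
}

static_assert(rejectionImpliesTooShort(128),
              "a length rejected by the AVX-512 guard can never reach the AVX2 minimum");
```

Since the implication holds for every length, a later AVX2 length check on this path would always fail, which is why branching straight to the non-AVX code loses nothing.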

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2658:

> 2656:     // Check for buffer too small (for algorithm)
> 2657:     __ subl(length, 0x2c);
> 2658:     __ jcc(Assembler::lessEqual, L_tailProc);

This could be Assembler::less instead of Assembler::lessEqual.
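
The difference only shows up at the boundary. A small model of the two conditions follows, assuming (as noted in the earlier comment) that the AVX2 code can handle exactly 0x2c bytes. The helper names are hypothetical, and the model treats the condition codes as plain signed comparisons of the subtraction result, ignoring overflow:

```cpp
#include <cstdint>

constexpr std::int32_t kAvx2MinLength = 0x2c; // 44 bytes

// Models: subl(length, 0x2c); jcc(Assembler::lessEqual, L_tailProc);
// The branch is taken when the signed result is <= 0.
constexpr bool goesToTailWithLessEqual(std::int32_t length) {
    return length - kAvx2MinLength <= 0;
}

// Models: subl(length, 0x2c); jcc(Assembler::less, L_tailProc);
// The branch is taken only when the signed result is < 0.
constexpr bool goesToTailWithLess(std::int32_t length) {
    return length - kAvx2MinLength < 0;
}

// Below the minimum, both conditions send the buffer to the tail code.
static_assert(goesToTailWithLessEqual(43) && goesToTailWithLess(43),
              "43 bytes: tail path under both conditions");

// At exactly 44 bytes only lessEqual diverts to the tail code.
static_assert(goesToTailWithLessEqual(44) && !goesToTailWithLess(44),
              "44 bytes: lessEqual skips the vector path, less does not");

// Above the minimum, neither condition diverts.
static_assert(!goesToTailWithLessEqual(45) && !goesToTailWithLess(45),
              "45 bytes: vector path under both conditions");
```

With lessEqual, a buffer of exactly 44 bytes is sent to the tail code even though it meets the stated minimum. With less, it stays on the vector path.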

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2699:

> 2697:     __ addptr(dest, 0x18);
> 2698:     __ subl(length, 0x20);
> 2699:     __ jcc(Assembler::lessEqual, L_tailProc);

This could be Assembler::less instead of Assembler::lessEqual.

-------------

PR: https://git.openjdk.org/jdk/pull/12126


More information about the hotspot-compiler-dev mailing list