RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v8]
Jatin Bhateja
jbhateja at openjdk.org
Mon Feb 6 06:39:51 UTC 2023
On Wed, 1 Feb 2023 19:07:17 GMT, Scott Gibbons <duke at openjdk.org> wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark (maxNumBytes) Mode Cnt Score Error Units
>> Base64Encode.testBase64Encode 1024 thrpt 3 4309.439 ± 2.632 ops/ms
>>
>>
>> **New:**
>>
>> Benchmark (maxNumBytes) Mode Cnt Score Error Units
>> Base64Encode.testBase64Encode 1024 thrpt 3 24211.397 ± 102.026 ops/ms
>>
>>
>> Decode performance:
>> **Old:**
>>
>> Benchmark (errorIndex) (lineSize) (maxNumBytes) Mode Cnt Score Error Units
>> Base64Decode.testBase64Decode 144 4 1024 thrpt 3 3961.768 ± 93.409 ops/ms
>>
>> **New:**
>> Benchmark (errorIndex) (lineSize) (maxNumBytes) Mode Cnt Score Error Units
>> Base64Decode.testBase64Decode 144 4 1024 thrpt 3 14738.051 ± 24.383 ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Change break-even buffer size for AVX512
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2693:
> 2691: __ vpshufb(xmm0, xmm0, xmm13, Assembler::AVX_256bit);
> 2692: __ vpermd(xmm0, xmm12, xmm0, Assembler::AVX_256bit);
> 2693: __ subl(length, 0x20);
Subtraction affects EFLAGS, so we can save one redundant compare per iteration at [#L2697](https://github.com/openjdk/jdk/pull/12126/files#diff-b938ab8a7bd9f57eb02271e2dd24a305bca30f06e9f8b028e18a139c4908ec92R2697)
by subtracting 0x2c (44) from length before the loop and adding the same amount back after the loop.
The same applies to the encode main loop.
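
A minimal C++ sketch of the suggested loop shape, not the stub generator code itself. It assumes the loop consumes 0x20 bytes per iteration and continues only while at least 0x2c bytes remain; the names `kStep`, `kMinRemaining`, `loop_with_compare` and `loop_with_bias` are hypothetical and exist only for illustration. In the biased form, the loop condition `length >= 0` corresponds to branching directly on the flags set by the `subl`, with no separate compare.

```cpp
#include <cstdint>

// Assumed values taken from the review comment, for illustration only.
constexpr int32_t kStep = 0x20;          // bytes consumed per iteration
constexpr int32_t kMinRemaining = 0x2c;  // loop runs only while this many bytes remain

struct LoopResult {
    int iterations;
    int32_t remaining;
};

// Shape with an explicit compare against 0x2c on every iteration
// (roughly: subl; cmpl length, 0x2c; jcc).
LoopResult loop_with_compare(int32_t length) {
    int iterations = 0;
    while (length >= kMinRemaining) {
        ++iterations;     // stands in for the vector loop body
        length -= kStep;
    }
    return {iterations, length};
}

// Suggested shape: bias length by 0x2c once before the loop, so the
// subtraction inside the loop already yields the condition to branch on
// (roughly: subl; jcc), then undo the bias after the loop.
LoopResult loop_with_bias(int32_t length) {
    int iterations = 0;
    length -= kMinRemaining;  // pre-loop bias
    while (length >= 0) {
        ++iterations;         // stands in for the vector loop body
        length -= kStep;
    }
    length += kMinRemaining;  // post-loop correction
    return {iterations, length};
}
```

Both functions run the same number of iterations and leave the same remaining length for any input, because `length >= 0x2c` and `length - 0x2c >= 0` are the same condition.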
-------------
PR: https://git.openjdk.org/jdk/pull/12126