RFR: 8269404: Base64 Encoding optimization enhancements for x86 using AVX-512 [v4]
Jatin Bhateja
jbhateja at openjdk.java.net
Sun Jul 11 09:06:02 UTC 2021
On Sat, 10 Jul 2021 18:02:16 GMT, Scott Gibbons <github.com+6704669+asgibbons at openjdk.org> wrote:
>> Enhance the Base64 Encode intrinsic for x86 using AVX-512 to get better performance. Also allow for performance improvement on non-AVX-512 enabled platforms.
>>
>> Added AVX-512 code for encoding Base64 blocks, including slight improvements for non-AVX-512 x86 platforms.
>>
>> Running the Base64Encode benchmark, this change increases encode performance by an average of 2.6x with a maximum 9.7x for buffers > ~20k. The numbers are given in the table below.
>>
>> **Base Score** is without intrinsic support, **Optimized Score** is using this intrinsic, and **Gain** is **Base** / **Optimized**.
>>
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Encode size 1 | 8.10 | 8.04 | 1.01
>> testBase64Encode size 2 | 8.51 | 8.43 | 1.01
>> testBase64Encode size 3 | 11.08 | 10.72 | 1.03
>> testBase64Encode size 6 | 13.98 | 13.12 | 1.07
>> testBase64Encode size 7 | 14.44 | 13.38 | 1.08
>> testBase64Encode size 9 | 15.44 | 14.37 | 1.07
>> testBase64Encode size 10 | 16.13 | 14.97 | 1.08
>> testBase64Encode size 48 | 27.14 | 23.23 | 1.17
>> testBase64Encode size 512 | 123.86 | 30.75 | 4.03
>> testBase64Encode size 1000 | 224.42 | 37.71 | 5.95
>> testBase64Encode size 20000 | 4202.11 | 430.16 | 9.77
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Addressing review comments.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5432:
> 5430: __ addptr(encode_table, isURL);
> 5431: __ shrl(isURL, 6); // restore isURL
> 5432:
Just a suggestion to avoid the redundant addptr and shift instructions when isURL is zero:
lea(encode_table, ExternalAddress(...));
cmp(isURL, 0);
cmovne(encode_table, Address(encode_table, 64));
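The intent of the suggestion, picking the table base with a compare and a conditional move instead of adjusting the pointer through isURL, can be sketched in plain C++. This is only an illustration, not HotSpot code: `kEncodeTables` and `select_table` are hypothetical names, and the layout (URL-safe alphabet 64 bytes after the standard one) is assumed from the `64` offset used above.

```cpp
#include <cstddef>

// Hypothetical stand-in for the stub's encode tables: the standard alphabet
// followed by the URL-safe alphabet 64 bytes later (assumed layout).
static const char kEncodeTables[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Select the table base without modifying is_url, so no shift is needed
// afterwards to restore it. Compilers often lower this ternary to a
// compare followed by a conditional move on x86-64.
inline const char* select_table(int is_url) {
    const char* base = kEncodeTables;
    return (is_url != 0) ? base + 64 : base;
}
```

Note that an x86 cmov with a memory operand loads the value at that address rather than the address itself, so the stub would probably need to form `encode_table + 64` in a temporary register (for example with lea) before the conditional move.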
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5463:
> 5461: if (VM_Version::supports_avx2()
> 5462: && VM_Version::supports_avx512vlbw()) {
> 5463: /*
It's not clear why you are checking for the existence of AVX512VLBW if this block is emitting AVX2 code.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5480:
> 5478: __ movl(rax, 0x0fc0fc00);
> 5479: __ vmovdqu(xmm1, ExternalAddress(StubRoutines::x86::base64_avx2_input_mask_addr()), rax);
> 5480: __ evpbroadcastd(xmm8, rax, Assembler::AVX_256bit);
The evpbroadcastd instruction asserts on VM_Version::supports_evex(), even though xmm1 is being used and the vector length is 256 bits; this is a perfect case for EVEX->VEX demotion.
vpbroadcastd could be used in its place.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5487:
> 5485:
> 5486: __ subl(length, 24);
> 5487: __ evpbroadcastd(xmm7, rax, Assembler::AVX_256bit);
Same as above; all the other occurrences in this block can be replaced by vpbroadcastd.
-------------
PR: https://git.openjdk.java.net/jdk/pull/4601