RFR: 8269404: Base64 Encoding optimization enhancements for x86 using AVX-512 [v4]

Jatin Bhateja jbhateja at openjdk.java.net
Sun Jul 11 09:06:02 UTC 2021


On Sat, 10 Jul 2021 18:02:16 GMT, Scott Gibbons <github.com+6704669+asgibbons at openjdk.org> wrote:

>> Enhance the Base64 Encode intrinsic for x86 using AVX-512 to get better performance. Also allow for performance improvement on non-AVX-512 enabled platforms.
>> 
>> Added AVX-512 code for encoding Base64 blocks, including slight improvements for non-AVX-512 x86 platforms.
>> 
>> Running the Base64Decode benchmark, this change increases decode performance by an average of 2.6x with a maximum 9.7x for buffers > ~20k.  The numbers are given in the table below.
>> 
>> **Base Score** is without intrinsic support, **Optimized Score** is using this intrinsic, and **Gain** is **Base** / **Optimized**.
>> 
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Encode size 1 | 8.10 | 8.04 | 1.01
>> testBase64Encode size 2 | 8.51 | 8.43 | 1.01
>> testBase64Encode size 3 | 11.08 | 10.72 | 1.03
>> testBase64Encode size 6 | 13.98 | 13.12 | 1.07
>> testBase64Encode size 7 | 14.44 | 13.38 | 1.08
>> testBase64Encode size 9 | 15.44 | 14.37 | 1.07
>> testBase64Encode size 10 | 16.13 | 14.97 | 1.08
>> testBase64Encode size 48 | 27.14 | 23.23 | 1.17
>> testBase64Encode size 512 | 123.86 | 30.75 | 4.03
>> testBase64Encode size 1000 | 224.42 | 37.71 | 5.95
>> testBase64Encode size 20000 | 4202.11 | 430.16 | 9.77
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Addressing review comments.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5432:

> 5430:       __ addptr(encode_table, isURL);
> 5431:       __ shrl(isURL, 6); // restore isURL
> 5432: 

Just a suggestion to save the redundant addptr and shrl when isURL is zero:

lea(encode_table, ExternalAddress(...));
cmp(isURL, 0);
cmovne(encode_table, Address(encode_table, 64));
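
To illustrate the intent of the suggestion above, here is a minimal C++ sketch (not HotSpot code) comparing the shift/add/shift form with a conditional select. It assumes the two 64-byte alphabets sit back to back, as the 64-byte offset implies, and that the flag is 0 or 1. The names kEncodeTables, select_table_shift and select_table_cmov are hypothetical. Note that in the sketch the selected value is the pointer itself, not the data it points to.

```cpp
#include <cstdint>

// Hypothetical stand-in for the stub's tables: standard alphabet in
// bytes 0..63, URL-safe alphabet in bytes 64..127 (plus a trailing NUL).
static const char kEncodeTables[129] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Shift/add/shift form, mirroring the addptr/shrl sequence quoted above.
// Assumes is_url is 0 or 1.
const char* select_table_shift(std::uint32_t is_url) {
    is_url <<= 6;                                // 0 or 64
    const char* table = kEncodeTables + is_url;  // add the offset
    is_url >>= 6;                                // restore the flag, as the stub does
    (void)is_url;                                // unused afterwards in this sketch
    return table;
}

// Conditional-select form: compute both candidate addresses, then pick one.
// A compiler is free to lower the ternary to a cmov, but that is not guaranteed.
const char* select_table_cmov(std::uint32_t is_url) {
    const char* base = kEncodeTables;
    const char* url  = base + 64;
    return (is_url != 0) ? url : base;
}
```

Both functions return the same pointer for a flag of 0 or 1; the second form leaves the flag untouched, so no restoring shift is needed.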

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5463:

> 5461:     if (VM_Version::supports_avx2()
> 5462:         && VM_Version::supports_avx512vlbw()) {
> 5463:       /*

It's not clear why you are checking for AVX512VLBW support if this block emits AVX2 code.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5480:

> 5478:       __ movl(rax, 0x0fc0fc00);
> 5479:       __ vmovdqu(xmm1, ExternalAddress(StubRoutines::x86::base64_avx2_input_mask_addr()), rax);
> 5480:       __ evpbroadcastd(xmm8, rax, Assembler::AVX_256bit);

The evpbroadcastd instruction asserts on VM_Version::supports_evex() even though xmm1 is being used and the vector length is 256 bits; this is a perfect case for EVEX->VEX demotion.
vpbroadcastd could be used in its place.
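
As a rough illustration of when EVEX->VEX demotion is possible, the following C++ sketch models the usual conditions as a simple predicate. It is a simplified model under stated assumptions, not the logic HotSpot's assembler uses; the struct EncodingRequest and the function can_use_vex are hypothetical names introduced for this example.

```cpp
// Hypothetical, simplified description of what one instruction needs
// from its encoding. Not a HotSpot type.
struct EncodingRequest {
    int  highest_xmm_index;  // largest XMM/YMM register number used
    int  vector_bits;        // 128, 256, or 512
    bool uses_opmask;        // true if a k-register mask is applied
    bool has_vex_form;       // true if an equivalent VEX encoding exists
};

// A VEX encoding can only address registers 0..15, has no opmask
// support, and tops out at 256-bit vectors. If any of those limits is
// exceeded, or no VEX form of the instruction exists, EVEX is required.
bool can_use_vex(const EncodingRequest& r) {
    return r.has_vex_form
        && !r.uses_opmask
        && r.vector_bits <= 256
        && r.highest_xmm_index < 16;
}
```

Under this model, a 256-bit operation on xmm8 with no mask qualifies for the VEX form, provided the instruction variant in question actually has one.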

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5487:

> 5485: 
> 5486:       __ subl(length, 24);
> 5487:       __ evpbroadcastd(xmm7, rax, Assembler::AVX_256bit);

Same as above; all the other occurrences in this block can be replaced by vpbroadcastd.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4601


More information about the hotspot-compiler-dev mailing list