RFR: 8269404: Base64 Encoding optimization enhancements for x86 using AVX-512 [v3]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Sat Jul 10 00:40:51 UTC 2021


On Mon, 28 Jun 2021 15:18:34 GMT, Scott Gibbons <github.com+6704669+asgibbons at openjdk.org> wrote:

>> Enhance the Base64 Encode intrinsic for x86 using AVX-512 to get better performance. Also allow for performance improvement on non-AVX-512 enabled platforms.
>> 
>> Added AVX-512 code for encoding Base64 blocks, including slight improvements for non-AVX-512 x86 platforms.
>> 
>> Running the Base64Encode benchmark, this change increases encode performance by an average of 2.6x, with a maximum of 9.7x for buffers > ~20k. The numbers are given in the table below.
>> 
>> **Base Score** is without intrinsic support, **Optimized Score** is using this intrinsic, and **Gain** is **Base** / **Optimized**.
>> 
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Encode size 1 | 8.10 | 8.04 | 1.01
>> testBase64Encode size 2 | 8.51 | 8.43 | 1.01
>> testBase64Encode size 3 | 11.08 | 10.72 | 1.03
>> testBase64Encode size 6 | 13.98 | 13.12 | 1.07
>> testBase64Encode size 7 | 14.44 | 13.38 | 1.08
>> testBase64Encode size 9 | 15.44 | 14.37 | 1.07
>> testBase64Encode size 10 | 16.13 | 14.97 | 1.08
>> testBase64Encode size 48 | 27.14 | 23.23 | 1.17
>> testBase64Encode size 512 | 123.86 | 30.75 | 4.03
>> testBase64Encode size 1000 | 224.42 | 37.71 | 5.95
>> testBase64Encode size 20000 | 4202.11 | 430.16 | 9.77
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Wrong shift count for URL encoding.

src/hotspot/cpu/x86/assembler_x86.cpp line 4158:

> 4156:   assert((VM_Version::supports_avx2() && vector_len == AVX_256bit), "");
> 4157:   InstructionMark im(this);
> 4158:   InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ false, /* uses_vl */ true);

legacy_mode should be true here as this instruction is not supported by AVX512.

src/hotspot/cpu/x86/assembler_x86.cpp line 6693:

> 6691: void Assembler::vpmulhuw(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 6692:   assert((vector_len == AVX_128bit && VM_Version::supports_avx()) ||
> 6693:          (vector_len == AVX_256bit && VM_Version::supports_avx2()), "");

If vector_len == AVX_512bit, the assert also needs to check VM_Version::supports_avx512bw().
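
As a rough illustration of the condition being asked for, the predicate below adds a 512-bit case to the two cases in the quoted assert. It is a standalone sketch, not HotSpot code: `VectorLen`, `CpuFeatures` and `vpmulhuw_supported` are hypothetical stand-ins for the `AVX_*bit` constants and the `VM_Version::supports_*()` queries.

```cpp
// Hypothetical stand-ins for HotSpot's vector-length constants.
enum VectorLen { AVX_128bit, AVX_256bit, AVX_512bit };

// Hypothetical stand-in for the VM_Version::supports_*() queries.
struct CpuFeatures {
  bool avx;
  bool avx2;
  bool avx512bw;
};

// True when the requested vector length is backed by the matching CPU feature.
// The first two clauses mirror the quoted assert; the third is the suggested addition.
constexpr bool vpmulhuw_supported(VectorLen len, const CpuFeatures& f) {
  return (len == AVX_128bit && f.avx) ||
         (len == AVX_256bit && f.avx2) ||
         (len == AVX_512bit && f.avx512bw);
}
```

With only the first two clauses, a 512-bit request fails the check even on hardware that supports it, which is presumably why the assert needs the extra case.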

src/hotspot/cpu/x86/assembler_x86.cpp line 6694:

> 6692:   assert((vector_len == AVX_128bit && VM_Version::supports_avx()) ||
> 6693:          (vector_len == AVX_256bit && VM_Version::supports_avx2()), "");
> 6694:   InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);

legacy_mode should be _legacy_mode_bw here.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5434:

> 5432:       __ shrl(isURL, 6); // restore isURL
> 5433: 
> 5434:       __ mov64(rax, 0x3036242a1016040a); // Shifts

Does the constant 0x3036242a1016040a need an l or ul suffix?
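
For background on the suffix question, here is a small standalone check of what standard C++ does with the unsuffixed literal. It speaks only to the language rule (an unsuffixed hex literal takes the first of int, unsigned int, long, unsigned long, long long, unsigned long long that can hold the value) and not to any HotSpot coding convention. `kShifts` and `shift_byte` are names made up for this sketch.

```cpp
#include <cstdint>
#include <type_traits>

// The literal from the review, written without a suffix.
constexpr auto kShifts = 0x3036242a1016040a;

// The value needs more than 32 bits and is below 2^63, so the compiler
// picks a signed 64-bit type (long on LP64, long long on LLP64).
static_assert(sizeof(kShifts) == 8, "literal is given a 64-bit type");
static_assert(std::is_signed<decltype(kShifts)>::value, "a signed type is chosen");
static_assert(static_cast<std::uint64_t>(kShifts) == 0x3036242a1016040aULL,
              "no truncation without a suffix");

// Byte i of the constant, lowest byte first.
constexpr unsigned shift_byte(unsigned i) {
  return static_cast<unsigned>((static_cast<std::uint64_t>(kShifts) >> (8 * i)) & 0xff);
}
```

So the value itself is not truncated without a suffix; whether one is still wanted for consistency or portability is a separate style question.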

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5477:

> 5475: 
> 5476:       // Set up supporting constant table data
> 5477:       __ vmovdqu(xmm9, ExternalAddress(StubRoutines::x86::base64_avx2_shuffle_addr()));

Need to pass the scratch_reg here.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5480:

> 5478:       // 6-bit mask for 2nd and 4th (and multiples) 6-bit values
> 5479:       __ movl(rax, 0x0fc0fc00);
> 5480:       __ vmovdqu(xmm1, ExternalAddress(StubRoutines::x86::base64_avx2_input_mask_addr()));

Need to pass scratch_reg here.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5672:

> 5670:       __ jcc(Assembler::above, L_32byteLoop);
> 5671: 
> 5672:       __ vzeroupper();

vzeroupper is not executed on every path: there are exits to L_process3 that bypass it.
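
To make the concern concrete, here is a toy model of the control flow, not the actual stub. The path layout, the length threshold and the names `run_stub_model` and `has_step` are assumptions made for this sketch; only the labels `L_32byteLoop` and `L_process3` come from the quoted code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model: each string is one step on the executed path.
// clear_on_every_exit == false imitates the reviewed shape, where only the
// loop fall-through reaches vzeroupper.
std::vector<std::string> run_stub_model(int length, bool clear_on_every_exit) {
  std::vector<std::string> trace;
  trace.push_back("avx2_setup");            // wide vector registers are in use from here
  if (length >= 32) {
    trace.push_back("L_32byteLoop");
    trace.push_back("vzeroupper");          // fall-through after the loop
  } else if (clear_on_every_exit) {
    trace.push_back("vzeroupper");          // suggested: clear on the early exit too
  }
  trace.push_back("L_process3");
  return trace;
}

// True if the named step appears anywhere in the trace.
bool has_step(const std::vector<std::string>& trace, const std::string& step) {
  return std::find(trace.begin(), trace.end(), step) != trace.end();
}
```

In this model the short-input path reaches L_process3 without vzeroupper unless the early exit is patched. One common remedy is to route every exit through a single label that does the cleanup.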

-------------

PR: https://git.openjdk.java.net/jdk/pull/4601


More information about the hotspot-compiler-dev mailing list