RFR: 8269404: Base64 Encoding optimization enhancements for x86 using AVX-512 [v3]
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Sat Jul 10 00:40:51 UTC 2021
On Mon, 28 Jun 2021 15:18:34 GMT, Scott Gibbons <github.com+6704669+asgibbons at openjdk.org> wrote:
>> Enhance the Base64 Encode intrinsic for x86 using AVX-512 to get better performance. Also allow for performance improvement on non-AVX-512 enabled platforms.
>>
>> Added AVX-512 code for encoding Base64 blocks, including slight improvements for non-AVX-512 x86 platforms.
>>
>> Running the Base64Encode benchmark, this change increases encode performance by an average of 2.6x, with a maximum of 9.7x for buffers > ~20k. The numbers are given in the table below.
>>
>> **Base Score** is without intrinsic support, **Optimized Score** is using this intrinsic, and **Gain** is **Base** / **Optimized**.
>>
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Encode size 1 | 8.10 | 8.04 | 1.01
>> testBase64Encode size 2 | 8.51 | 8.43 | 1.01
>> testBase64Encode size 3 | 11.08 | 10.72 | 1.03
>> testBase64Encode size 6 | 13.98 | 13.12 | 1.07
>> testBase64Encode size 7 | 14.44 | 13.38 | 1.08
>> testBase64Encode size 9 | 15.44 | 14.37 | 1.07
>> testBase64Encode size 10 | 16.13 | 14.97 | 1.08
>> testBase64Encode size 48 | 27.14 | 23.23 | 1.17
>> testBase64Encode size 512 | 123.86 | 30.75 | 4.03
>> testBase64Encode size 1000 | 224.42 | 37.71 | 5.95
>> testBase64Encode size 20000 | 4202.11 | 430.16 | 9.77
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Wrong shift count for URL encoding.
src/hotspot/cpu/x86/assembler_x86.cpp line 4158:
> 4156: assert((VM_Version::supports_avx2() && vector_len == AVX_256bit), "");
> 4157: InstructionMark im(this);
> 4158: InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ false, /* uses_vl */ true);
legacy_mode should be true here as this instruction is not supported by AVX512.
src/hotspot/cpu/x86/assembler_x86.cpp line 6693:
> 6691: void Assembler::vpmulhuw(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 6692: assert((vector_len == AVX_128bit && VM_Version::supports_avx()) ||
> 6693: (vector_len == AVX_256bit && VM_Version::supports_avx2()), "");
If vector_len == AVX_512bit, the assert also needs to check VM_Version::supports_avx512bw().
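One way to read this request is sketched below as a standalone predicate. The `CpuFeatures` struct and `vpmulhuw_supported` are hypothetical stand-ins so the snippet compiles outside HotSpot; the enum values here are arbitrary, and the real check would stay inside the `assert` and query `VM_Version` directly.

```cpp
// Stand-in for the assembler's vector length enum (values are arbitrary here).
enum AvxVectorLen { AVX_128bit, AVX_256bit, AVX_512bit };

// Hypothetical replacement for the VM_Version::supports_*() queries.
struct CpuFeatures {
  bool avx;
  bool avx2;
  bool avx512bw;
};

// True when the requested vector length is backed by the matching CPU feature.
// The third clause is the 512-bit case the review comment asks for.
inline bool vpmulhuw_supported(AvxVectorLen vector_len, const CpuFeatures& cpu) {
  return (vector_len == AVX_128bit && cpu.avx) ||
         (vector_len == AVX_256bit && cpu.avx2) ||
         (vector_len == AVX_512bit && cpu.avx512bw);
}
```

With only the first two clauses, a 512-bit request would always fail the assert, even on hardware that supports AVX-512BW.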
src/hotspot/cpu/x86/assembler_x86.cpp line 6694:
> 6692: assert((vector_len == AVX_128bit && VM_Version::supports_avx()) ||
> 6693: (vector_len == AVX_256bit && VM_Version::supports_avx2()), "");
> 6694: InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);
legacy_mode should be _legacy_mode_bw here.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5434:
> 5432: __ shrl(isURL, 6); // restore isURL
> 5433:
> 5434: __ mov64(rax, 0x3036242a1016040a); // Shifts
Does the constant 0x3036242a1016040a need an l or ul suffix?
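On the suffix question, standard C++ gives an unsuffixed hexadecimal literal the first type among int, unsigned int, long, unsigned long, long long, and unsigned long long that can represent its value, so this constant should already be a 64-bit signed integer. A minimal standalone sketch to check that follows; `kShifts` and `shift_at` are hypothetical names, and reading the bytes as individual shift counts is an assumption based only on the `// Shifts` comment.

```cpp
#include <cstdint>
#include <type_traits>

// The literal from the stub, written without a suffix.
// Its top bit is clear, so it fits in a signed 64-bit type.
constexpr auto kShifts = 0x3036242a1016040a;

static_assert(sizeof(kShifts) == 8,
              "unsuffixed literal is already 64 bits wide");
static_assert(std::is_signed<decltype(kShifts)>::value,
              "a signed type is selected because the value fits");

// Returns the i-th byte of the constant (0 = least significant).
constexpr unsigned shift_at(int i) {
  return static_cast<unsigned>(
      (static_cast<std::uint64_t>(kShifts) >> (8 * i)) & 0xff);
}
```

With this, `shift_at(0)` is 10 and `shift_at(7)` is 48. Whether a suffix is still wanted for readability or for a particular toolchain's warnings is a separate question.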
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5477:
> 5475:
> 5476: // Set up supporting constant table data
> 5477: __ vmovdqu(xmm9, ExternalAddress(StubRoutines::x86::base64_avx2_shuffle_addr()));
Need to pass the scratch_reg here.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5480:
> 5478: // 6-bit mask for 2nd and 4th (and multiples) 6-bit values
> 5479: __ movl(rax, 0x0fc0fc00);
> 5480: __ vmovdqu(xmm1, ExternalAddress(StubRoutines::x86::base64_avx2_input_mask_addr()));
Need to pass scratch_reg here.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5672:
> 5670: __ jcc(Assembler::above, L_32byteLoop);
> 5671:
> 5672: __ vzeroupper();
vzeroupper is not executed on every path; there are exits to L_process3 that skip it.
-------------
PR: https://git.openjdk.java.net/jdk/pull/4601