RFR: 8325991: Accelerate Poly1305 on x86_64 using AVX2 instructions [v10]

Fri Mar 1 06:01:45 UTC 2024

On Tue, 27 Feb 2024 21:13:07 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>> The goal of this PR is to accelerate the Poly1305 algorithm using AVX2 instructions (including IFMA) for x86_64 CPUs.
>> 
>> This implementation is directly based on the AVX2 Poly1305 hash computation as implemented in Intel(R) Multi-Buffer Crypto for IPsec Library (url: https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t3/poly_fma_avx2.asm)
>> 
>> This PR shows up to a 19x speedup on buffer sizes of 1MB.
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update description of Poly1305 algo

src/hotspot/cpu/x86/assembler_x86.cpp line 5146:

> 5144: 
> 5145: void Assembler::vpmadd52luq(XMMRegister dst, XMMRegister src1, Address src2, int vector_len) {
> 5146:   assert(vector_len == AVX_512bit ? VM_Version::supports_avx512ifma() : VM_Version::supports_avxifma(), "");

What if the vector length is 128 bits and the target does not support AVX_IFMA? AVX512_IFMA + AVX512_VL should still be sufficient to execute 52-bit MACs.

src/hotspot/cpu/x86/assembler_x86.cpp line 5181:

> 5179: 
> 5180: void Assembler::vpmadd52huq(XMMRegister dst, XMMRegister src1, Address src2, int vector_len) {
> 5181:   assert(vector_len == AVX_512bit ? VM_Version::supports_avx512ifma() : VM_Version::supports_avxifma(), "");

What if the vector length is 128 bits and the target does not support AVX_IFMA? AVX512_IFMA + AVX512_VL should still be sufficient to execute 52-bit MACs.
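
To make the concern concrete, here is a minimal standalone sketch of the condition both asserts could check. It is not HotSpot code: `CpuFeatures`, `VectorLen`, and `can_use_vpmadd52` are hypothetical stand-ins for the `VM_Version` queries and the assembler's vector-length constants. It assumes that sub-512-bit forms are legal with either AVX_IFMA (VEX encoding) or AVX512_IFMA plus AVX512_VL (EVEX encoding).

```cpp
#include <cassert>

// Hypothetical stand-in for the assembler's vector-length constants.
enum VectorLen { AVX_128bit = 0, AVX_256bit = 1, AVX_512bit = 2 };

// Hypothetical stand-in for the relevant VM_Version feature queries.
struct CpuFeatures {
  bool avx_ifma;     // VEX-encoded AVX-IFMA (128/256-bit only)
  bool avx512_ifma;  // EVEX-encoded AVX512-IFMA
  bool avx512_vl;    // AVX512VL: allows 128/256-bit EVEX forms
};

// Returns true if vpmadd52luq/vpmadd52huq is available at the given length.
bool can_use_vpmadd52(const CpuFeatures& f, int vector_len) {
  if (vector_len == AVX_512bit) {
    // 512-bit form exists only in the EVEX (AVX512-IFMA) encoding.
    return f.avx512_ifma;
  }
  // 128/256-bit: either AVX-IFMA, or AVX512-IFMA together with AVX512VL.
  return f.avx_ifma || (f.avx512_ifma && f.avx512_vl);
}
```

With a check of this shape, a 128-bit request on a target that has AVX512_IFMA and AVX512_VL but lacks AVX_IFMA would pass the assert. The emitter would still have to select the EVEX encoding in that case, which this sketch does not model.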

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1508515255
PR Review Comment: https://git.openjdk.org/jdk/pull/17881#discussion_r1508514777