RFR: 8341052: SHA-512 implementation using SHA-NI [v3]

Thu Oct 10 23:16:11 UTC 2024

On Thu, 10 Oct 2024 18:49:38 GMT, Smita Kamath <svkamath at openjdk.org> wrote:

>> src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602:
>> 
>>> 1600:       vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
>>> 1601:       vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
>>> 1602:       vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
>> 
>> I assume the [algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256-bit vectors, and that with the 512-bit extension we modify it. Do you think we should factor out the following pattern and add an alternative implementation for it?
>> 
>>   ```
>>       vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
>>       vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
>>       vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
>>   ```
>> 
>> This is a fixed pattern seen 4 times within the computation loop and once outside the loop.
>> We are permuting two vectors with constant permutation masks and blending them using an immediate mask.
>> This is a very valid use case for the two-table permutation instruction VPERMI2Q (available on AVX512VL targets).
>> We can store the permutation pattern in a vector outside the loop and then reuse it within the loop.
>
> We can do this change in a separate PR.

I agree with Smita. The current implementation has a one-to-one correspondence with the ipsec implementation. Any new changes or refactoring will require a new round of exhaustive testing and could be implemented as a separate PR.
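
For reference, the equivalence described in the quoted suggestion can be sketched with a scalar C++ model of the three instructions. This is only an illustration under stated assumptions, not HotSpot code: `Vec4`, the `*_model` helpers, `permute_and_blend` and `two_table_permute` are hypothetical names, lane 0 is taken to be the least significant qword, and the index vector {1, 2, 3, 4} is one reading of what a VPERMI2Q-based variant would load, with ymm3 as the first table and ymm4 as the second.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of a 256-bit register: four 64-bit lanes,
// lane 0 being the least significant qword.
using Vec4 = std::array<uint64_t, 4>;

// VPERMQ with an immediate: destination lane i takes the source lane
// selected by bits [2i+1:2i] of imm8.
Vec4 vpermq_model(const Vec4& src, uint8_t imm8) {
    Vec4 dst{};
    for (int i = 0; i < 4; ++i) {
        dst[i] = src[(imm8 >> (2 * i)) & 3];
    }
    return dst;
}

// VPBLENDD: for each of the eight dwords, bit i of imm8 picks the second
// source when set and the first source when clear.
Vec4 vpblendd_model(const Vec4& src1, const Vec4& src2, uint8_t imm8) {
    Vec4 dst{};
    for (int i = 0; i < 8; ++i) {
        const Vec4& from = ((imm8 >> i) & 1) ? src2 : src1;
        const int shift = (i % 2) * 32;
        const uint64_t dword = (from[i / 2] >> shift) & 0xFFFFFFFFull;
        dst[i / 2] |= dword << shift;
    }
    return dst;
}

// VPERMI2Q at 256 bits: in each index lane, bits [1:0] select the element
// and bit 2 selects the table (0 = first table, 1 = second table).
Vec4 vpermi2q_model(const Vec4& idx, const Vec4& table1, const Vec4& table2) {
    Vec4 dst{};
    for (int i = 0; i < 4; ++i) {
        const Vec4& table = (idx[i] & 4) ? table2 : table1;
        dst[i] = table[idx[i] & 3];
    }
    return dst;
}

// The three-instruction pattern from the quoted code.
Vec4 permute_and_blend(const Vec4& ymm3, const Vec4& ymm4) {
    const Vec4 ymm8 = vpermq_model(ymm4, 0x1b);  // reverse the lanes of ymm4
    const Vec4 ymm9 = vpermq_model(ymm3, 0x39);  // rotate ymm3 down by one lane
    return vpblendd_model(ymm8, ymm9, 0x3f);     // lanes 0..2 from ymm9, lane 3 from ymm8
}

// Assumed single-instruction alternative: the index vector would be loaded
// once outside the loop and reused inside it.
Vec4 two_table_permute(const Vec4& ymm3, const Vec4& ymm4) {
    const Vec4 idx = {1, 2, 3, 4};  // ymm3[1], ymm3[2], ymm3[3], ymm4[0]
    return vpermi2q_model(idx, ymm3, ymm4);
}
```

With ymm3 holding W[16..19] and ymm4 holding W[20..23] (lane 0 first), both functions yield W[17], W[18], W[19], W[20] from lane 0 upward, which matches the `ymm7` comment in the quoted code. The model says nothing about latency or throughput, so it does not replace the testing mentioned above.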

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1796204440