RFR: 8341052: SHA-512 implementation using SHA-NI [v3]

Smita Kamath svkamath at openjdk.org
Thu Oct 10 18:52:31 UTC 2024


On Thu, 10 Oct 2024 11:52:36 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Addressed a review comment
>
> src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602:
> 
>> 1600:       vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
>> 1601:       vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
>> 1602:       vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
> 
> I assume the [algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256-bit vectors and with the 512-bit extension we modify it. Do you think we should factor out the following pattern and add an alternative implementation for it?
> 
>   ```
>       vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
>       vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
>       vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
>   ```
> 
> This is a fixed pattern seen 4 times within the computation loop and once outside the loop.
> We are permuting two vectors with constant permutation masks and blending them using an immediate mask.
> This is a very valid use case for the two-table permutation instruction VPERMI2Q (available on AVX512VL targets).
> We can store the permutation pattern in a vector outside the loop and then reuse it within the loop.

We can do this change in a separate PR.
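
For reference when that follow-up is written, here is a minimal scalar sketch of why the two approaches should agree. It models the three instructions in plain C++ rather than using the HotSpot assembler, so every name in it (`Vec256`, the `*_model` functions, `kIndex`) is hypothetical. It assumes `vpblendd(dst, nds, src, imm8, ...)` takes a dword from `src` when the corresponding immediate bit is set, and that the 256-bit form of VPERMI2Q uses index bits 1:0 as the qword offset and bit 2 as the table select, per the Intel instruction descriptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Element 0 is the lowest qword of the 256-bit register.
using Vec256 = std::array<std::uint64_t, 4>;

// Scalar model of VPERMQ with an immediate: result[i] = src[imm8 bits 2i+1..2i].
Vec256 vpermq_model(const Vec256& src, unsigned imm8) {
  Vec256 dst{};
  for (int i = 0; i < 4; i++) dst[i] = src[(imm8 >> (2 * i)) & 3];
  return dst;
}

// Scalar model of VPBLENDD: dword j comes from src2 if imm8 bit j is set,
// otherwise from src1.
Vec256 vpblendd_model(const Vec256& src1, const Vec256& src2, unsigned imm8) {
  Vec256 dst{};
  for (int j = 0; j < 8; j++) {
    const Vec256& from = ((imm8 >> j) & 1) ? src2 : src1;
    const std::uint64_t mask = 0xFFFFFFFFull << (32 * (j & 1));
    dst[j / 2] |= from[j / 2] & mask;
  }
  return dst;
}

// Scalar model of 256-bit VPERMI2Q: index bits 1:0 pick the qword,
// index bit 2 picks table b instead of table a.
Vec256 vpermi2q_model(const Vec256& idx, const Vec256& a, const Vec256& b) {
  Vec256 dst{};
  for (int i = 0; i < 4; i++) {
    const unsigned off = idx[i] & 3;
    dst[i] = (idx[i] & 4) ? b[off] : a[off];
  }
  return dst;
}

// The pattern from the review comment: two permutes and a blend.
Vec256 three_instruction_pattern(const Vec256& ymm3, const Vec256& ymm4) {
  const Vec256 ymm8 = vpermq_model(ymm4, 0x1b);
  const Vec256 ymm9 = vpermq_model(ymm3, 0x39);
  return vpblendd_model(ymm8, ymm9, 0x3f);
}

// Hypothetical loop-invariant index vector: qwords 1, 2, 3 of the first
// table, then qword 0 of the second table (index 4 = table bit set, offset 0).
const Vec256 kIndex = {1, 2, 3, 4};

// The proposed replacement: one two-table permute with a preloaded index.
Vec256 single_permute(const Vec256& ymm3, const Vec256& ymm4) {
  return vpermi2q_model(kIndex, ymm3, ymm4);
}

// Asserts that both forms produce the same register contents.
void check_equivalence(const Vec256& ymm3, const Vec256& ymm4) {
  assert(three_instruction_pattern(ymm3, ymm4) == single_permute(ymm3, ymm4));
}
```

With `ymm3` holding W[16..19] and `ymm4` holding W[20..23] in ascending qword order, both functions yield W[17], W[18], W[19], W[20] from low to high, which matches the `ymm7` comment in the quoted code. The real change would still need an AVX512VL feature check and a spare register for the index vector.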

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1795938470


More information about the hotspot-compiler-dev mailing list