RFR: 8341052: SHA-512 implementation using SHA-NI [v3]
Smita Kamath
svkamath at openjdk.org
Thu Oct 10 18:52:31 UTC 2024
On Thu, 10 Oct 2024 11:52:36 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Addressed a review comment
>
> src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602:
>
>> 1600: vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
>> 1601: vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
>> 1602: vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
>
> I assume the [algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256-bit vectors, and that we modify it with the 512-bit extension. Do you think we should factor out the following pattern and add an alternative implementation for it?
>
> ```
> vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
> vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
> vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
> ```
>
> This is a fixed pattern that appears 4 times within the computation loop and once outside the loop.
> We are permuting two vectors with constant permutation masks and blending them using an immediate mask.
> This is a very valid use case for the two-table permutation instruction VPERMI2Q (available for AVX512VL targets).
> We can store the permutation pattern in a vector outside the loop and then reuse it within the loop.
We can do this change in a separate PR.
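
For reference, here is a rough scalar C++ model of the equivalence the suggestion relies on. It is only a sketch and not code from the PR: the `*_model` helpers, `Vec4`, `w()` and the index vector `{1, 2, 3, 4}` are assumptions made for illustration, derived from the documented lane semantics of VPERMQ, VPBLENDD and VPERMI2Q. It assumes the HotSpot operand order `vpblendd(dst, first, second, imm8)`, where a set mask bit selects `second`.

```cpp
#include <array>
#include <cstdint>

// Scalar model of one 256-bit register as four 64-bit lanes, lane 0 first.
using Vec4 = std::array<uint64_t, 4>;

// VPERMQ with immediate: destination lane i takes source lane (imm8 >> 2*i) & 3.
Vec4 vpermq_model(const Vec4& src, uint8_t imm8) {
    Vec4 dst{};
    for (int i = 0; i < 4; ++i) dst[i] = src[(imm8 >> (2 * i)) & 3];
    return dst;
}

// VPBLENDD: for each 32-bit element j, a set mask bit selects the second source.
Vec4 vpblendd_model(const Vec4& first, const Vec4& second, uint8_t imm8) {
    Vec4 dst{};
    for (int j = 0; j < 8; ++j) {
        const Vec4& from = ((imm8 >> j) & 1) ? second : first;
        const uint64_t dword_mask = 0xFFFFFFFFull << (32 * (j & 1));
        dst[j / 2] |= from[j / 2] & dword_mask;
    }
    return dst;
}

// VPERMI2Q (256-bit form): index bits [1:0] pick the lane, bit 2 picks the table.
Vec4 vpermi2q_model(const Vec4& idx, const Vec4& table0, const Vec4& table1) {
    Vec4 dst{};
    for (int i = 0; i < 4; ++i) {
        const Vec4& table = (idx[i] & 4) ? table1 : table0;
        dst[i] = table[idx[i] & 3];
    }
    return dst;
}

// The three-instruction pattern quoted from macroAssembler_x86_sha.cpp.
Vec4 three_instruction_form(const Vec4& ymm3, const Vec4& ymm4) {
    const Vec4 ymm8 = vpermq_model(ymm4, 0x1b);
    const Vec4 ymm9 = vpermq_model(ymm3, 0x39);
    return vpblendd_model(ymm8, ymm9, 0x3f);
}

// Hypothetical single two-table permute; the index vector is an assumption
// derived for this sketch and would be loaded once outside the loop.
Vec4 two_table_form(const Vec4& ymm3, const Vec4& ymm4) {
    const Vec4 idx = {1, 2, 3, 4};
    return vpermi2q_model(idx, ymm3, ymm4);
}

// Stand-in message word with distinct upper and lower 32-bit halves.
uint64_t w(int i) { return (uint64_t(i) << 32) | uint64_t(i); }

// Checks both forms on ymm3 = W[16..19], ymm4 = W[20..23] (lane 0 first).
bool check_equivalence() {
    const Vec4 ymm3 = {w(16), w(17), w(18), w(19)};
    const Vec4 ymm4 = {w(20), w(21), w(22), w(23)};
    const Vec4 expected = {w(17), w(18), w(19), w(20)};
    return three_instruction_form(ymm3, ymm4) == expected &&
           two_table_form(ymm3, ymm4) == expected;
}
```

In this model both forms produce W[17], W[18], W[19], W[20] from lane 0 upward, which matches the `ymm7` comment in the quoted code when that comment is read from the highest lane down. Whether a single VPERMI2Q is actually faster than the current sequence on the target hardware would still need to be measured.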
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1795938470