RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26]

Suchismith Roy sroy at openjdk.org
Tue Feb 25 16:44:05 UTC 2025


On Fri, 21 Feb 2025 19:54:07 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Hi @TheRealMDoerr, maybe my answer was not clear. I am not proposing to remove them. I cannot see how to reduce the 3 instructions to one, as I believe the 2 lines below are required by the algorithm.
>>  __ vec_perm(vTmp4, vHigh, vHigh, loadOrder);
>>  __ vec_perm(vTmp5, vLow, vLow, loadOrder);
>
> The purpose of the 3 `vec_perm` instructions is to extract 16 Bytes from two 16 Byte values loaded into vector registers. This can be done by 1 `vec_perm` instruction. But I think AIX should get fixed first before we figure out how to determine the vPerm value for that (probably lvsl + vxor before the loop).

@TheRealMDoerr
I now understand the failure on AIX. It is related to this instruction:

vec_perm(vH, vTmp5, vTmp4, vPerm): here we combine the first and last 16 bytes and extract 16 bytes from them, using the pattern that lvsl generated in vPerm.
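
To make the extraction concrete, here is a minimal C++ model of the permute. It is only a sketch: it assumes the Power ISA big-endian byte numbering inside a register, it does not model how a little-endian load places bytes in the register, and the names Vec, vperm, lvsl_pattern and iota_block are illustrative and not part of the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 16-byte vector register; element 0 is the leftmost (big-endian) byte.
using Vec = std::array<std::uint8_t, 16>;

// Model of vperm: each selector byte picks one byte out of the 32-byte
// concatenation a || b, using only its low 5 bits as the index.
Vec vperm(const Vec& a, const Vec& b, const Vec& sel) {
  Vec r{};
  for (int i = 0; i < 16; i++) {
    int idx = sel[i] & 0x1F;
    r[i] = (idx < 16) ? a[idx] : b[idx - 16];
  }
  return r;
}

// Model of the lvsl result for an address whose low 4 bits are sh:
// the byte sequence sh, sh+1, ..., sh+15.
Vec lvsl_pattern(unsigned sh) {
  assert(sh < 16);
  Vec p{};
  for (int i = 0; i < 16; i++) {
    p[i] = static_cast<std::uint8_t>(sh + i);
  }
  return p;
}

// Hypothetical helper: a block holding start, start+1, ..., start+15.
Vec iota_block(std::uint8_t start) {
  Vec p{};
  for (int i = 0; i < 16; i++) {
    p[i] = static_cast<std::uint8_t>(start + i);
  }
  return p;
}
```

With a = iota_block(0) and b = iota_block(16), vperm(a, b, lvsl_pattern(5)) yields the bytes 5 through 20, i.e. the 16 bytes starting at offset 5 of the concatenation. Swapping a and b changes which half each index selects from, which is why the operand order matters.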

The 2 extra vec_perm instructions are needed specifically for Linux on Power, so that the order of elements is retained; otherwise we end up selecting the wrong 16 bytes.

For Linux we need vec_perm(vH, vTmp5, vTmp4, vPerm); for AIX it would be vec_perm(vH, vTmp4, vTmp5, vPerm), without the 2 extra vec_perm instructions, because the order is already retained due to endianness.

I am trying to find a permute pattern that eliminates the need for the 2 extra vec_perm instructions for Linux on Power.

One thing I tried was:
__ xxspltib(vTmp12->to_vsr(), 31);
__ vxor(vPerm, vPerm, vTmp12);
This generates the sequence of bytes required for little endian.
Some test cases passed, but others failed, so I am still working on it. Let me know your thoughts too.
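
For reference, the effect of those two instructions on the selector bytes can be modelled in plain C++. This sketch assumes vPerm holds the lvsl pattern sh, sh+1, ..., sh+15 before the XOR, and it only shows the arithmetic: XOR with 31 maps each 5-bit index k to 31 - k, mirroring it across the 32-byte concatenation. It does not show whether that is the right pattern for the intrinsic. The names Vec, lvsl_pattern and xor_splat are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 16-byte vector register; element 0 is the leftmost (big-endian) byte.
using Vec = std::array<std::uint8_t, 16>;

// Assumed contents of vPerm before the XOR: sh, sh+1, ..., sh+15.
Vec lvsl_pattern(unsigned sh) {
  assert(sh < 16);
  Vec p{};
  for (int i = 0; i < 16; i++) {
    p[i] = static_cast<std::uint8_t>(sh + i);
  }
  return p;
}

// Model of xxspltib followed by vxor: XOR every byte with the same immediate.
Vec xor_splat(const Vec& v, std::uint8_t imm) {
  Vec r{};
  for (int i = 0; i < 16; i++) {
    r[i] = static_cast<std::uint8_t>(v[i] ^ imm);
  }
  return r;
}
```

For example, with sh = 3 the pattern 3, 4, ..., 18 becomes 28, 27, ..., 13, so the selected indices run downwards instead of upwards.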
 
If the above explanation is not clear, let me know and I will try to explain with an example.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1970157770

