RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26]

Martin Doerr mdoerr at openjdk.org
Tue Feb 25 16:52:01 UTC 2025


On Tue, 25 Feb 2025 16:41:30 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

>> The purpose of the 3 `vec_perm` instructions is to extract 16 bytes from two 16-byte values loaded into vector registers. This can be done with a single `vec_perm` instruction. But I think AIX should be fixed first, before we figure out how to determine the vPerm value for that (probably `lvsl` + `vxor` before the loop).
>
> @TheRealMDoerr 
> I now understand the failure on AIX; it is related to this.
> 
> `vec_perm(vH, vTmp5, vTmp4, vPerm)`: here we combine the first and last 16 bytes and extract 16 bytes out of them, using the pattern generated by `lvsl` in `vPerm`.
> 
> We need the 2 extra `vec_perm` instructions specifically for Linux on Power, so that the order of elements is retained; otherwise we end up selecting the wrong 16 bytes.
> 
> For Linux we need `vec_perm(vH, vTmp5, vTmp4, vPerm);`, whereas for AIX it would be `vec_perm(vH, vTmp4, vTmp5, vPerm);` without the 2 extra `vec_perm` statements, because on big-endian AIX the element order is already retained.
> 
> I am trying to find a pattern that eliminates the need for the 2 extra `vec_perm` instructions on Linux on Power.
> 
> One thing I tried was:
> __ xxspltib(vTmp12->to_vsr(), 31);
> __ vxor(vPerm, vPerm, vTmp12);
> This generates the byte sequence required for little endian.
> Some test cases passed, but others failed. I am still working on it; let me know your thoughts as well.
> 
> If the above explanation is not clear, let me know and I will try to explain with an example.

I'll wait for the AIX fix and make experiments on both platforms after that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1970171149

