RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26]
Martin Doerr
mdoerr at openjdk.org
Tue Feb 25 16:52:01 UTC 2025
On Tue, 25 Feb 2025 16:41:30 GMT, Suchismith Roy <sroy at openjdk.org> wrote:
>> The purpose of the 3 `vec_perm` instructions is to extract 16 bytes from two 16-byte values loaded into vector registers. This can be done with 1 `vec_perm` instruction. But I think AIX should be fixed first, before we figure out how to determine the vPerm value for that (probably lvsl + vxor before the loop).
>
> @TheRealMDoerr
> I understand the failure on AIX now. It is related to this.
>
> vec_perm(vH, vTmp5, vTmp4, vPerm): here we combine the first and last 16 bytes and extract 16 bytes from them using the pattern generated by lvsl in vPerm.
>
> We needed the 2 extra vec_perm specifically for Linux on Power, so that the order of elements is retained; otherwise we would end up selecting the wrong 16 bytes.
>
> For Linux we need vec_perm(vH, vTmp5, vTmp4, vPerm); for AIX it would be vec_perm(vH, vTmp4, vTmp5, vPerm), without the 2 extra vec_perm statements, as the order is retained due to endianness.
>
> I am trying to find a pattern that eliminates the 2 extra vec_perm for Linux on Power.
>
> One thing I tried was
> __ xxspltib(vTmp12->to_vsr(), 31);
> __ vxor(vPerm, vPerm, vTmp12);
> This generates the sequence of bytes required for little endian.
> Some test cases passed, but some failed. I am still working on it. Let me know your thoughts too.
>
> If the above explanation is not clear, let me know and I will try to explain with an example.
I'll wait for the AIX fix and run experiments on both platforms after that.
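
For reference, the byte selection discussed above can be modelled in a few lines of Python. This is only a sketch of the instruction semantics (lvsl control pattern, byte-wise XOR with a splatted 31, and vperm selecting from a 32-byte concatenation), using big-endian register byte numbering. The helper names and the sample shift `sh = 3` are made up for illustration; how the inputs were loaded from memory on each platform is not modelled, so this does not by itself show which variant is right for Linux or AIX.

```python
def lvsl(sh):
    """Control vector produced by lvsl for a 4-bit shift: sh, sh+1, ..., sh+15."""
    return [sh + i for i in range(16)]

def vxor_splat(ctrl, imm):
    """Model of xxspltib(imm) followed by vxor: XOR every control byte with imm."""
    return [c ^ imm for c in ctrl]

def vperm(a, b, ctrl):
    """Each control byte (low 5 bits) indexes the 32-byte concatenation a || b."""
    src = a + b
    return [src[c & 0x1F] for c in ctrl]

# Label every byte of the two source registers with its index in a || b.
a = list(range(0, 16))
b = list(range(16, 32))
sh = 3  # hypothetical misalignment of the source address

# Plain lvsl pattern: a 16-byte window starting at index sh.
plain = vperm(a, b, lvsl(sh))

# XOR with 31 maps index k to 31 - k, so the window is counted from the
# other end of a || b and its bytes come out in reverse order.
flipped = vperm(a, b, vxor_splat(lvsl(sh), 31))

print(plain)
print(flipped)
```

In this model, XOR with 31 both mirrors the window position and reverses the byte order within it, so whether it selects the intended 16 bytes depends on the operand order and on how the two registers were loaded.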
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1970171149