RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26]
Suchismith Roy
sroy at openjdk.org
Wed Feb 26 12:26:02 UTC 2025
On Tue, 25 Feb 2025 16:49:03 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:
>> @TheRealMDoerr
>> I understood the failure on AIX. It is related to this.
>>
>> vec_perm(vH, vTmp5, vTmp4, vPerm): here we combine the first and last 16 bytes and extract 16 bytes out of them, using the pattern generated by lvsl in vPerm.
>>
>> We required the 2 extra vec_perm instructions specifically for Linux on Power, so that the order of elements is retained; otherwise we would end up selecting the wrong 16 bytes.
>>
>> For Linux we need vec_perm(vH, vTmp5, vTmp4, vPerm); for AIX it would be vec_perm(vH, vTmp4, vTmp5, vPerm), without the need for the 2 extra vec_perm statements, as the order is retained due to endianness.
>>
>> I am trying to find a pattern that eliminates the need for the 2 extra vec_perm instructions on Linux on Power.
>>
>> One thing I tried was
>> __ xxspltib(vTmp12->to_vsr(), 31);
>> __ vxor(vPerm, vPerm, vTmp12);
>> This generates the sequence of bytes required for Little Endian.
>> Some test cases passed, but some failed too. I am still working on it. Let me know your input too.
>>
>> If the above explanation is not clear, let me know and I will try to explain with an example.
>
> I'll wait for the AIX fix and make experiments on both platforms after that.
@TheRealMDoerr I was able to fix this and find a pattern that eliminates the need for the 2 vec_perm instructions.
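
To make the quoted discussion easier to follow, here is a rough scalar model in plain C++ of what vec_perm does with an lvsl-generated pattern, and of what XOR-ing that pattern with 31 changes. This is only an illustration, not the code from the PR and not the final fix: the names Vec, vperm_model, lvsl_model, xor_splat_model and reversed are made up for this sketch, bytes are numbered in big-endian element order, and the way a real load places bytes in a register on Little Endian is not modelled.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model type: one 16-byte vector register, byte 0 first
// (big-endian element numbering).
using Vec = std::array<uint8_t, 16>;

// Model of vec_perm / vperm: treat a||b as 32 bytes and, for each result byte,
// pick the source byte named by the low 5 bits of the selector byte.
Vec vperm_model(const Vec& a, const Vec& b, const Vec& sel) {
  Vec out{};
  for (int i = 0; i < 16; i++) {
    unsigned idx = sel[i] & 0x1F;
    out[i] = (idx < 16) ? a[idx] : b[idx - 16];
  }
  return out;
}

// Model of lvsl: for an address with low 4 bits `sh`, the pattern is
// sh, sh+1, ..., sh+15, i.e. a 16-byte window starting at offset sh.
Vec lvsl_model(unsigned sh) {
  Vec p{};
  for (int i = 0; i < 16; i++) {
    p[i] = static_cast<uint8_t>((sh & 0xF) + i);
  }
  return p;
}

// Model of "xxspltib tmp, imm; vxor sel, sel, tmp": XOR every selector byte
// with the same immediate.
Vec xor_splat_model(const Vec& sel, uint8_t imm) {
  Vec out{};
  for (int i = 0; i < 16; i++) {
    out[i] = static_cast<uint8_t>(sel[i] ^ imm);
  }
  return out;
}

// Helper for the comparison below: byte-reverse a vector.
Vec reversed(const Vec& v) {
  Vec out{};
  for (int i = 0; i < 16; i++) {
    out[i] = v[15 - i];
  }
  return out;
}
```

For an offset of 3, lvsl_model(3) yields 3, 4, ..., 18, and XOR-ing with 31 turns that into 28, 27, ..., 13, because x ^ 31 equals 31 - x for any 5-bit index. In this model, vperm_model(a, b, sel ^ 31) therefore gives the same bytes as vperm_model(reversed(b), reversed(a), sel): flipping the selector is equivalent to swapping the two inputs and byte-reversing them, which is why the operand order differs between the two platforms in the discussion above.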
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1971493452
More information about the hotspot-dev mailing list