RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26]

Suchismith Roy sroy at openjdk.org
Wed Feb 26 12:26:02 UTC 2025


On Tue, 25 Feb 2025 16:49:03 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> @TheRealMDoerr 
>> I understood the failure on AIX. It is related to this. 
>> 
>> vec_perm(vH, vTmp5, vTmp4, vPerm): here we combine the first and last 16 bytes and extract 16 bytes from them using the pattern that lvsl generates in vPerm.
>> 
>> We required the 2 extra vec_perm, specifically for Linux on Power, so that the order of elements is retained; otherwise we end up selecting the wrong 16 bytes.
>> 
>> For Linux we need vec_perm(vH, vTmp5, vTmp4, vPerm); for AIX it would be vec_perm(vH, vTmp4, vTmp5, vPerm), without the need for the 2 extra vec_perm statements, as the order is retained due to endianness.
>> 
>> I am trying to find a pattern that can eliminate the need to do 2 extra vec_perm for Linux on Power. 
>> 
>> One thing I tried was 
>> __ xxspltib(vTmp12->to_vsr(), 31);
>> __ vxor(vPerm, vPerm, vTmp12);
>> This generates the byte sequence required for little endian.
>> Some test cases passed, but others failed. I am still working on it. Let me know your inputs too.
>>  
>> If the above explanation is not clear, let me know and I will try to explain with an example.
>
> I'll wait for the AIX fix and make experiments on both platforms after that.

@TheRealMDoerr I was able to fix this and find a pattern that eliminates the need for the 2 extra vec_perm instructions.
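
To make the byte selection discussed in the quoted text easier to follow, here is a small scalar model in plain C++. It is only a sketch under stated assumptions, not the code in the PR: the names vperm_model, lvsl_model and xor31_model are hypothetical, bytes are numbered in big-endian order, and vperm is modeled as selecting from the 32-byte concatenation of its two sources using the low 5 bits of each control byte. It shows that XORing every control byte with 31 turns index i into 31 - i, which mirrors the selection across the 32 bytes. It does not show whether that alone is sufficient on little endian.

```cpp
#include <array>
#include <cstdint>

using Vec = std::array<uint8_t, 16>;

// Hypothetical model of vperm in big-endian byte numbering: the low 5 bits of
// each control byte pick one byte out of the 32-byte concatenation a || b.
Vec vperm_model(const Vec& a, const Vec& b, const Vec& ctl) {
  Vec out{};
  for (int i = 0; i < 16; i++) {
    int idx = ctl[i] & 0x1F;
    out[i] = (idx < 16) ? a[idx] : b[idx - 16];
  }
  return out;
}

// Hypothetical model of the control vector lvsl produces for a 4-bit shift:
// sh, sh+1, ..., sh+15.
Vec lvsl_model(int sh) {
  Vec ctl{};
  for (int i = 0; i < 16; i++) {
    ctl[i] = static_cast<uint8_t>((sh & 0xF) + i);
  }
  return ctl;
}

// Hypothetical model of xxspltib(tmp, 31) followed by vxor(ctl, ctl, tmp):
// every control byte is XORed with 31, so a 5-bit index i becomes 31 - i.
Vec xor31_model(const Vec& ctl) {
  Vec out{};
  for (int i = 0; i < 16; i++) {
    out[i] = static_cast<uint8_t>(ctl[i] ^ 31);
  }
  return out;
}
```

With a holding bytes 0..15 and b holding bytes 16..31, vperm_model(a, b, lvsl_model(3)) picks bytes 3..18 of the concatenation, while the XORed pattern picks bytes 28 down to 13, i.e. the mirrored positions in reverse order.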

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1971493452
