RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28]

Suchismith Roy sroy at openjdk.org
Fri Mar 7 17:02:01 UTC 2025


On Mon, 3 Mar 2025 10:47:59 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> @TheRealMDoerr Can you explain how it can be equivalent to these 4 instructions?
>> We are extracting the two parts of midProduct here, 64 bits each, for the cross product,
>> i.e. Xl * Hh + Hl * Xh, so the 2 instructions below are required:
>> masm->vsldoi(vTmp8, vMidProduct, vZero, 8);
>> masm->vsldoi(vTmp9, vZero, vMidProduct, 8);
>
> Your version extracts two 8-byte parts and feeds them into separate xor instructions. My proposal performs both 8-byte xor operations with one vxor instruction by selecting the input bits accordingly. It furthermore avoids swapping halves back and forth (I swap the halves of vReducedLow instead).
> Have you tried it?

@TheRealMDoerr Yes. The tests do not pass with this.
I am still trying to find scope to reduce the number of instructions.

masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8);           // Swap
masm->vxor(vLowProduct, vLowProduct, vReducedLow);                // Reduction using constant
masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8);       // Swap

These 3 instructions can be brought down to 2 instructions.
I am still looking for further scope to reduce. Let me know your inputs.
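
As a rough illustration of why the swap / xor / swap sequence can shrink to 2 instructions, here is a minimal byte-level sketch in plain C++. It is not the PR code: `Vec`, `vsldoi`, `vxor`, `swap_xor_swap` and `swap_then_xor` are hypothetical helpers that only model the two instructions on 16-byte arrays (byte 0 is the leftmost byte of the register). It assumes the halves of vReducedLow may be swapped instead of the halves of vLowProduct, which is what was suggested in the review.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 128-bit vector register, byte 0 = leftmost byte.
using Vec = std::array<std::uint8_t, 16>;

// Model of vsldoi: concatenate a || b (32 bytes), shift left by shb bytes,
// and keep the leftmost 16 bytes.
Vec vsldoi(const Vec& a, const Vec& b, std::size_t shb) {
    Vec r{};
    for (std::size_t i = 0; i < 16; ++i) {
        std::size_t j = i + shb;
        r[i] = (j < 16) ? a[j] : b[j - 16];
    }
    return r;
}

// Model of vxor: bytewise exclusive or.
Vec vxor(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < 16; ++i) {
        r[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return r;
}

// The current 3-instruction sequence: swap, xor, swap.
Vec swap_xor_swap(const Vec& lowProduct, const Vec& reducedLow) {
    Vec t = vsldoi(lowProduct, lowProduct, 8);  // swap halves
    t = vxor(t, reducedLow);                    // reduction xor
    return vsldoi(t, t, 8);                     // swap back
}

// A 2-instruction alternative: swap the other operand once, then xor.
Vec swap_then_xor(const Vec& lowProduct, const Vec& reducedLow) {
    Vec s = vsldoi(reducedLow, reducedLow, 8);  // swap halves of the constant-reduced value
    return vxor(lowProduct, s);                 // single xor covers both halves
}
```

Swapping halves is a byte permutation, so it distributes over xor: swap(swap(A) ^ B) equals A ^ swap(B). The same model also shows what the two extraction instructions do: `vsldoi(x, zero, 8)` moves the right half of `x` into the left half and zero-fills the rest, while `vsldoi(zero, x, 8)` moves the left half of `x` into the right half. Whether this maps one-to-one onto the register usage in the stub still needs to be confirmed by the tests.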

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1985402217


More information about the hotspot-dev mailing list