RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28]

Martin Doerr mdoerr at openjdk.org
Mon Mar 3 10:50:56 UTC 2025


On Sun, 2 Mar 2025 17:07:11 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 574:
>> 
>>> 572:     masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8);           // Swap
>>> 573:     masm->vxor(vLowProduct, vLowProduct, vReducedLow);                // Reduction using constant
>>> 574:     masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8);       // Swap
>> 
>> The part between the vpsumd instructions looks too complicated. Isn't it equivalent to the following?
>> 
>>     masm->vsldoi(vTmp8, vLowProduct, vHighProduct, 8);
>>     masm->vsldoi(vTmp9, vReducedLow, vReducedLow, 8);
>>     masm->vxor(vTmp8, vTmp8, vMidProduct);
>>     masm->vxor(vCombinedResult, vTmp8, vTmp9);
>
> @TheRealMDoerr can you explain how it can be equivalent to these 4 instructions?
> We are extracting the two 64-bit halves of midProduct here for the cross product,
> i.e. Xl * Hh + Hl * Xh, so the two instructions below are required:
>     masm->vsldoi(vTmp8, vMidProduct, vZero, 8);
>     masm->vsldoi(vTmp9, vZero, vMidProduct, 8);
>
Your version extracts two 8-byte parts and feeds them into separate xor instructions. My proposal performs both 8-byte xor operations with one vxor instruction by selecting the input bits accordingly. Furthermore, it avoids swapping halves back and forth (I swap the halves of vReducedLow instead).
Have you tried it?
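
To make the two patterns under discussion easier to compare, here is a minimal sketch that models a vector register as two 64-bit halves. All names in it (Vec128, vsldoi8, same, xor_low_of_m_into_high, select_then_xor) are hypothetical and exist only for illustration; it assumes the Power ISA's big-endian byte numbering for vsldoi with a shift of 8. It only shows what each instruction pattern does to the halves and does not settle whether the two full sequences are equivalent, since that depends on the surrounding stub code that is not quoted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 128-bit vector register as two 64-bit halves,
// in the Power ISA's big-endian numbering (hi holds bytes 0..7).
struct Vec128 {
  uint64_t hi;
  uint64_t lo;
};

// Models vsldoi(dst, a, b, 8): concatenate a || b and take bytes 8..23,
// i.e. the low half of a followed by the high half of b.
inline Vec128 vsldoi8(Vec128 a, Vec128 b) { return Vec128{a.lo, b.hi}; }

// Models vxor(dst, a, b): bitwise XOR of both halves at once.
inline Vec128 vxor(Vec128 a, Vec128 b) {
  return Vec128{a.hi ^ b.hi, a.lo ^ b.lo};
}

// Helper for comparing two modeled registers.
inline bool same(Vec128 a, Vec128 b) { return a.hi == b.hi && a.lo == b.lo; }

// Pattern A: move one half of m into an otherwise zero vector, then XOR.
// Each vxor of this kind changes only one 8-byte half of x.
inline Vec128 xor_low_of_m_into_high(Vec128 x, Vec128 m) {
  const Vec128 zero{0, 0};
  return vxor(x, vsldoi8(m, zero));  // (x.hi ^ m.lo, x.lo)
}

// Pattern B: select the wanted halves of two inputs with one vsldoi,
// then a single vxor updates both 8-byte halves.
inline Vec128 select_then_xor(Vec128 a, Vec128 b, Vec128 m) {
  return vxor(vsldoi8(a, b), m);  // (a.lo ^ m.hi, b.hi ^ m.lo)
}
```

In this model, vsldoi8(v, v) is the half swap used in the quoted lines 572 and 574, while vsldoi8(m, zero) and vsldoi8(zero, m) are the two half extractions from the reply above.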

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1977294138

