RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28]
Martin Doerr
mdoerr at openjdk.org
Mon Mar 3 10:50:56 UTC 2025
On Sun, 2 Mar 2025 17:07:11 GMT, Suchismith Roy <sroy at openjdk.org> wrote:
>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 574:
>>
>>> 572: masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap
>>> 573: masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant
>>> 574: masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap
>>
>> The part between the vpmsumd instructions looks too complicated. Isn't it equivalent to the following?
>>
>> masm->vsldoi(vTmp8, vLowProduct, vHighProduct, 8);
>> masm->vsldoi(vTmp9, vReducedLow, vReducedLow, 8);
>> masm->vxor(vTmp8, vTmp8, vMidProduct);
>> masm->vxor(vCombinedResult, vTmp8, vTmp9);
>
> @TheRealMDoerr, can you explain how it can be equivalent to these 4 instructions?
> We are extracting the different 64-bit parts of midProduct here for the cross product,
> i.e. Xl*Hh + Hl*Xh, so the two instructions below are required:
> masm->vsldoi(vTmp8, vMidProduct, vZero, 8);
> masm->vsldoi(vTmp9, vZero, vMidProduct, 8);
>
Your version extracts two 8-byte parts and feeds them into separate xor instructions. My proposal performs both 8-byte xor operations with a single vxor instruction by selecting the input bits accordingly. It also avoids swapping the halves back and forth (I swap the halves of vReducedLow instead).
Have you tried it?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1977294138
More information about the hotspot-dev mailing list