RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28]
Suchismith Roy
sroy at openjdk.org
Fri Mar 7 17:02:01 UTC 2025
On Mon, 3 Mar 2025 10:47:59 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:
>> @TheRealMDoerr Can you explain how it can be equivalent to these 4 instructions?
>> We are extracting the two 64-bit halves of midProduct here for the cross product,
>> i.e. Xl * Hh + Hl * Xh, so the 2 instructions below are required:
>> masm->vsldoi(vTmp8, vMidProduct, vZero, 8);
>> masm->vsldoi(vTmp9, vZero, vMidProduct, 8);
>
> Your version extracts two 8-byte parts and feeds them into separate xor instructions. My proposal performs both 8-byte xor operations with one vxor instruction by selecting the input bits accordingly. Furthermore, it avoids swapping the halves back and forth (I swap the halves of vReducedLow instead).
> Have you tried?
@TheRealMDoerr Yes, I tried it. The tests do not pass with that change.
I am looking for other places where the instruction count can be reduced. For example, this sequence:
masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap
masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant
masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap
can be brought down to 2 instructions.
I am still looking for further reductions. Let me know your thoughts.
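
One way the swap / xor / swap sequence above might be reduced to 2 instructions is to swap the halves of vReducedLow instead, since swap(swap(L) ^ R) == L ^ swap(R). The sketch below is only a byte-level model in plain C++, not HotSpot assembler code: the Vec type and the functions vsldoi, vxor, three_instructions, two_instructions and check_equivalence are hypothetical helpers written for illustration, and it assumes that vReducedLow may be swapped (or copied to a temporary) without affecting any later use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte vector model; index 0 is the leftmost (most significant) byte.
using Vec = std::array<std::uint8_t, 16>;

// Model of vsldoi: take 16 bytes starting at byte offset shb (0..15)
// from the 32-byte concatenation a || b.
Vec vsldoi(const Vec& a, const Vec& b, std::size_t shb) {
    Vec r{};
    for (std::size_t i = 0; i < 16; ++i) {
        std::size_t j = i + shb;
        r[i] = (j < 16) ? a[j] : b[j - 16];
    }
    return r;
}

// Model of vxor: bytewise exclusive or.
Vec vxor(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < 16; ++i) {
        r[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return r;
}

// The sequence from the mail: swap, xor with the reduced value, swap back.
Vec three_instructions(const Vec& lowProduct, const Vec& reducedLow) {
    Vec t = vsldoi(lowProduct, lowProduct, 8);  // swap halves
    t = vxor(t, reducedLow);                    // reduction using constant
    return vsldoi(t, t, 8);                     // swap halves back
}

// Assumed alternative: swap the reduced value once, then xor.
Vec two_instructions(const Vec& lowProduct, const Vec& reducedLow) {
    Vec swapped = vsldoi(reducedLow, reducedLow, 8);
    return vxor(lowProduct, swapped);
}

// Asserts that both sequences agree for one pair of inputs.
void check_equivalence(const Vec& lowProduct, const Vec& reducedLow) {
    assert(three_instructions(lowProduct, reducedLow) ==
           two_instructions(lowProduct, reducedLow));
}
```

Calling check_equivalence with arbitrary inputs should never trip the assertion, because swapping the halves is its own inverse and distributes over xor. Whether this rewrite is valid in the actual stub depends on how vReducedLow is used afterwards, which this model does not cover.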
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1985402217