RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19]
Martin Doerr
mdoerr at openjdk.org
Sat Feb 8 12:03:14 UTC 2025
On Fri, 7 Feb 2025 16:54:14 GMT, Suchismith Roy <sroy at openjdk.org> wrote:
>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 655:
>>
>>> 653: // https://web.archive.org/web/20110609115824/https://software.intel.com/file/24918
>>> 654: //
>>> 655: Label loop;
>>
>> Please check whether aligning the loop entry improves performance. I'd insert `__ align(32);` here.
>
> This does not improve performance @TheRealMDoerr
It seems to be faster on my Power9 machine. But we should check again after everything else is done.
>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 658:
>>
>>> 656: __ bind(loop);
>>> 657: __ vspltisb(vZero, 0);
>>> 658: __ li(temp1, 0);
>>
>> I don't think these instructions should be inside of the loop.
>
> vspltisb(vZero,0) is needed.
> __ vsldoi(vTmp8, vTmp5, vZero, 8); // mL : Extract the lower 64 bits of M
> __ vsldoi(vTmp9, vZero, vTmp5, 8); // mH : Extract the higher 64 bits of M
> We need to extract the appropriate bits, and for that vZero must always be initialised to 0.
The problem is that you're overwriting it below, which should not be done:
__ vxor(vZero, vTmp4, vTmp10);
__ vmr(vState, vZero);
Why not `__ vxor(vState, vTmp4, vTmp10);`?
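To illustrate the concern, here is a minimal sketch in plain C++ (not HotSpot code, and not the stub from the PR). It models `vsldoi` as a byte shift over two concatenated 16-byte vectors and runs a two-iteration loop in which the zero vector is set once before the loop. The `Vec` type, the helper functions, and `second_extract_is_clean` are hypothetical names introduced only for this example, and the loop body is reduced to the extract and the final xor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec = std::array<std::uint8_t, 16>;  // byte 0 is the most significant

// Model of vsldoi: concatenate a||b, shift left by sh bytes, keep 16 bytes.
Vec vsldoi(const Vec& a, const Vec& b, std::size_t sh) {
    assert(sh <= 16);
    Vec r{};
    for (std::size_t i = 0; i < 16; ++i) {
        std::size_t j = i + sh;
        r[i] = (j < 16) ? a[j] : b[j - 16];
    }
    return r;
}

Vec vxor(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < 16; ++i) r[i] = a[i] ^ b[i];
    return r;
}

bool low_half_is_zero(const Vec& v) {
    for (std::size_t i = 8; i < 16; ++i) {
        if (v[i] != 0) return false;
    }
    return true;
}

// Hypothetical driver: reports whether the "mL" extract stays zero-padded
// in every iteration when the zero vector is initialised only once.
bool second_extract_is_clean(bool clobber_zero) {
    Vec zero{};   // hoisted out of the loop (assumption of this sketch)
    Vec state{};
    Vec m{};
    for (std::size_t i = 0; i < 16; ++i) m[i] = static_cast<std::uint8_t>(i + 1);

    bool clean = true;
    for (int iter = 0; iter < 2; ++iter) {
        Vec mL = vsldoi(m, zero, 8);  // lower 64 bits of m, padded from zero
        clean = clean && low_half_is_zero(mL);
        if (clobber_zero) {
            zero = vxor(m, state);    // models vxor(vZero, ...)
            state = zero;             // models vmr(vState, vZero)
        } else {
            state = vxor(m, state);   // models vxor(vState, ...) directly
        }
    }
    return clean;
}
```

In this model, writing the result straight into the state vector keeps the zero vector intact, so it no longer has to be re-initialised on every iteration. It also saves the register move.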
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1947690718
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1947693146