RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19]

Martin Doerr mdoerr at openjdk.org
Sat Feb 8 12:03:14 UTC 2025


On Fri, 7 Feb 2025 16:54:14 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 655:
>> 
>>> 653:   // https://web.archive.org/web/20110609115824/https://software.intel.com/file/24918
>>> 654:   //
>>> 655:   Label loop;
>> 
>> Please try if aligning the loop entry improves performance. I'd insert `__ align(32);` here.
>
> This is not improving performance @TheRealMDoerr

It seems to be faster on my Power9 machine. But we should check again after everything else is done.

>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 658:
>> 
>>> 656:   __ bind(loop);
>>> 657:     __ vspltisb(vZero, 0);
>>> 658:     __ li(temp1, 0);
>> 
>> I don't think these instructions should be inside of the loop.
>
> vspltisb(vZero, 0) is needed:
>     __ vsldoi(vTmp8, vTmp5, vZero, 8);          // mL : Extract the lower 64 bits of M
>     __ vsldoi(vTmp9, vZero, vTmp5, 8);          // mH : Extract the higher 64 bits of M
> We need to extract the appropriate bits, and for that vZero must always be initialised to 0.

The problem is that you're overwriting it below, which should not be done:

    __ vxor(vZero, vTmp4, vTmp10);
    __ vmr(vState, vZero);

Why not `__ vxor(vState, vTmp4, vTmp10);`?
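To illustrate the data flow, here is a toy C++ model. It is only a sketch: it is not the stub code and not the GHASH multiply/reduction. It models `vsldoi` using the Power ISA's big-endian byte numbering, so `vsldoi(m, zero, 8)` and `vsldoi(zero, m, 8)` yield one doubleword of M with the other half taken from the zero register, as the quoted comments describe. The register layout the stub actually sees on little-endian Linux is not modelled. `Vec`, `Variant`, `run` and the per-iteration "state update" (XOR of the two halves) are hypothetical stand-ins. The model suggests that once the result goes straight to vState, vZero stays 0, so `vspltisb(vZero, 0)` can be hoisted out of the loop. Hoisting it while still clobbering vZero would change the result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of a 128-bit vector register. Byte 0 is the most significant
// byte, matching the big-endian byte numbering of the Power ISA text.
using Vec = std::array<uint8_t, 16>;

// vsldoi VRT,VRA,VRB,SHB: VRT = bytes SHB..SHB+15 of the 32-byte value VRA||VRB.
Vec vsldoi(const Vec& a, const Vec& b, int shb) {
  assert(shb >= 0 && shb <= 15);  // SHB is a 4-bit immediate
  Vec r{};
  for (int i = 0; i < 16; ++i) {
    int k = i + shb;
    r[i] = (k < 16) ? a[k] : b[k - 16];
  }
  return r;
}

// vxor VRT,VRA,VRB: bitwise XOR of two registers.
Vec vxor(const Vec& a, const Vec& b) {
  Vec r{};
  for (int i = 0; i < 16; ++i) r[i] = static_cast<uint8_t>(a[i] ^ b[i]);
  return r;
}

struct Result { Vec state; Vec zero; };

// Original:  vspltisb inside the loop, result written to vZero, then vmr to vState.
// HoistOnly: vspltisb moved before the loop, vZero still overwritten (broken).
// Fixed:     vspltisb before the loop, result written directly to vState.
enum class Variant { Original, HoistOnly, Fixed };

// Hypothetical loop body: only the register data flow from the review is
// modelled; the "state update" is NOT the real GHASH computation.
Result run(const Vec* blocks, int n, Variant v) {
  Vec vZero{};   // vspltisb(vZero, 0) executed once before the loop
  Vec vState{};
  for (int i = 0; i < n; ++i) {
    if (v == Variant::Original) vZero = Vec{};  // vspltisb(vZero, 0) in the loop
    Vec m = vxor(blocks[i], vState);            // stand-in for M
    Vec mL = vsldoi(m, vZero, 8);               // low 64 bits of M; other half from vZero
    Vec mH = vsldoi(vZero, m, 8);               // high 64 bits of M; other half from vZero
    if (v == Variant::Fixed) {
      vState = vxor(mL, mH);                    // like __ vxor(vState, vTmp4, vTmp10);
    } else {
      vZero = vxor(mL, mH);                     // like __ vxor(vZero, vTmp4, vTmp10);
      vState = vZero;                           // like __ vmr(vState, vZero);
    }
  }
  return {vState, vZero};
}
```

With a few input blocks, `Variant::Fixed` produces the same vState as `Variant::Original` and leaves vZero all zero. `Variant::HoistOnly` diverges from the second block on, because vZero then holds the previous state.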

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1947690718
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1947693146

