RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28]
Martin Doerr
mdoerr at openjdk.org
Fri Feb 28 16:36:04 UTC 2025
On Thu, 27 Feb 2025 13:40:51 GMT, Suchismith Roy <sroy at openjdk.org> wrote:
>> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437)
>>
>> Currently, acceleration code for GHASH is missing on PPC64.
>>
>> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication to obtain the final result.
>
> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision:
>
> use vsplitsb
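For readers less familiar with the technique mentioned in the PR description, here is a minimal portable sketch of Karatsuba applied to a 128x128-bit carry-less (GF(2)[x]) multiplication. It assumes a plain software model. clmul64 is a hypothetical bit-by-bit stand-in for the 64x64 carry-less product that vpmsumd provides in hardware. The sketch ignores GHASH's bit-reflected representation and omits the reduction modulo x^128 + x^7 + x^2 + x + 1. It is not the PR's stub code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value: hi holds bits 127..64, lo holds bits 63..0.
struct U128 { uint64_t hi, lo; };
// Hypothetical 256-bit product: w[0] is the least significant 64-bit word.
struct U256 { uint64_t w[4]; };

// Portable 64x64 -> 128 carry-less multiply (XOR instead of add, no carries).
U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);  // bits shifted out of the low word
    }
  }
  return r;
}

// Karatsuba: three 64-bit products instead of four.
U256 clmul128_karatsuba(U128 a, U128 b) {
  U128 low  = clmul64(a.lo, b.lo);                // L = a_lo * b_lo
  U128 high = clmul64(a.hi, b.hi);                // H = a_hi * b_hi
  U128 mid  = clmul64(a.lo ^ a.hi, b.lo ^ b.hi);  // (a_lo + a_hi)(b_lo + b_hi)
  mid.lo ^= low.lo ^ high.lo;                     // in GF(2), M = mid - L - H is an XOR
  mid.hi ^= low.hi ^ high.hi;
  // Product = H * x^128 + M * x^64 + L
  U256 r;
  r.w[0] = low.lo;
  r.w[1] = low.hi ^ mid.lo;
  r.w[2] = high.lo ^ mid.hi;
  r.w[3] = high.hi;
  return r;
}

// Schoolbook reference using all four 64-bit products.
U256 clmul128_schoolbook(U128 a, U128 b) {
  U128 ll = clmul64(a.lo, b.lo), lh = clmul64(a.lo, b.hi);
  U128 hl = clmul64(a.hi, b.lo), hh = clmul64(a.hi, b.hi);
  U256 r;
  r.w[0] = ll.lo;
  r.w[1] = ll.hi ^ lh.lo ^ hl.lo;
  r.w[2] = hh.lo ^ lh.hi ^ hl.hi;
  r.w[3] = hh.hi;
  return r;
}

bool karatsuba_matches_schoolbook(U128 a, U128 b) {
  U256 k = clmul128_karatsuba(a, b), s = clmul128_schoolbook(a, b);
  for (int i = 0; i < 4; ++i)
    if (k.w[i] != s.w[i]) return false;
  return true;
}
```

The saving is one 64-bit multiply per block at the cost of a few extra XORs. On Power the three partial products map naturally onto vpmsumd results, which appear to be the low, middle and high products that the review comments below refer to.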
src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 561:
> 559: VectorRegister vLowProduct, VectorRegister vMidProduct, VectorRegister vHighProduct,
> 560: VectorRegister vReducedLow, VectorRegister vTmp8, VectorRegister vTmp9,
> 561: VectorRegister vCombinedResult, VectorRegister vSwappedH) {
I'd adjust the indentation.
src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 574:
> 572: masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap
> 573: masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant
> 574: masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap
The part between the vpmsumd instructions looks too complicated. Isn't it equivalent to the following?
masm->vsldoi(vTmp8, vLowProduct, vHighProduct, 8);
masm->vsldoi(vTmp9, vReducedLow, vReducedLow, 8);
masm->vxor(vTmp8, vTmp8, vMidProduct);
masm->vxor(vCombinedResult, vTmp8, vTmp9);
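To make the suggested four-instruction sequence easier to reason about, here is a hypothetical software model of the two instructions it uses. It assumes the Power ISA's big-endian element numbering, with byte 0 leftmost. vsldoi takes 16 consecutive bytes, starting at the shift amount, from the 32-byte concatenation of its two sources, and vxor is a bitwise XOR. The names VR and suggested_combine are illustrative only. Because only part of the original sequence is quoted here, the model shows what the suggestion computes; it does not by itself prove equivalence with the PR code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit vector register: 16 bytes in ISA order.
using VR = std::array<uint8_t, 16>;

// vsldoi VRT,VRA,VRB,SHB: bytes SHB..SHB+15 of the concatenation VRA||VRB.
VR vsldoi(const VR& a, const VR& b, int shb) {
  VR r{};
  for (int i = 0; i < 16; ++i) {
    int k = i + shb;
    r[i] = k < 16 ? a[k] : b[k - 16];
  }
  return r;
}

// vxor VRT,VRA,VRB: bitwise exclusive OR.
VR vxor(const VR& a, const VR& b) {
  VR r{};
  for (int i = 0; i < 16; ++i) r[i] = a[i] ^ b[i];
  return r;
}

// The reviewer's suggested sequence, expressed against the model above.
VR suggested_combine(const VR& vLowProduct, const VR& vMidProduct,
                     const VR& vHighProduct, const VR& vReducedLow) {
  // Doubleword 1 of vLowProduct followed by doubleword 0 of vHighProduct.
  VR vTmp8 = vsldoi(vLowProduct, vHighProduct, 8);
  // Swap the two doublewords of vReducedLow.
  VR vTmp9 = vsldoi(vReducedLow, vReducedLow, 8);
  vTmp8 = vxor(vTmp8, vMidProduct);
  return vxor(vTmp8, vTmp9);
}
```

With a shift of 8, vsldoi of a register with itself swaps its doublewords, and vsldoi of two different registers splices the right half of the first onto the left half of the second. That is why a single vsldoi can replace a separate swap-then-merge step.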
src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 699:
> 697:
> 698: __ bind(L_initialize_unaligned_loop);
> 699: __ li(temp1,0);
Missing whitespace after the comma: __ li(temp1, 0);
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1975696315
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1975694548
PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1975696864
More information about the hotspot-dev mailing list