RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v6]
Andrew Haley
aph at openjdk.org
Mon Jan 20 10:36:43 UTC 2025
On Thu, 9 Jan 2025 09:07:21 GMT, Suchismith Roy <sroy at openjdk.org> wrote:
>> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437)
>>
>> Currently acceleration code for GHASH is missing for PPC64.
>>
>> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
>
> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision:
>
> restore
On 1/20/25 08:09, Suchismith Roy wrote:
>> This reference is even better, and comes with complete source code as well as the proofs:
>>
>> https://web.archive.org/web/20110609115824/https://software.intel.com/file/24918
>
> Thank you. Yes, the reduction algorithm is derived from here.
> However, there is one more paper referred to for the Karatsuba multiplication.
> https://link.springer.com/content/pdf/10.1007/978-3-319-16715-2_9.pdf
>
> I think that can be mentioned in the comments as well, then?
Sure, but it's not a great idea to reference a non-open-access paper in free
software. There's a copy of the Intel paper that explains the use of
Karatsuba on page 12 here:
https://github.com/intel/intel-ipsec-mb/wiki/doc/optimized-gcm-implementation.pdf
There's a comment in
src/hotspot/cpu/aarch64/macroAssembler_aarch64_aes.cpp:290, like this:
// Karatsuba multiplication performs a 128*128 -> 256-bit
// multiplication in three 128-bit multiplications and a few
// additions.
//
// (C1:C0) = A1*B1, (D1:D0) = A0*B0, (E1:E0) = (A0+A1)(B0+B1)
// (A1:A0)(B1:B0) = C1:(C0+C1+D1+E1):(D1+C0+D0+E0):D0
//
// Inputs:
//
// A0 in a.d[0] (subkey)
// A1 in a.d[1]
// (A1+A0) in a1_xor_a0.d[0]
//
// B0 in b.d[0] (state)
// B1 in b.d[1]
In your case, the register names (a and b) would be different, of course.
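To make the identity in that comment concrete, here is a minimal Python sketch. It is not the HotSpot code and not the PPC64 intrinsic. It assumes that "+" and "*" in the comment mean XOR and carry-less (polynomial) multiplication over GF(2), as they do for GHASH. The helper names clmul and karatsuba_clmul128 are made up for illustration, and the sketch stops at the 256-bit product without doing the GHASH reduction.

```python
MASK64 = (1 << 64) - 1


def clmul(a, b, bits):
    """Schoolbook carry-less multiply of two `bits`-wide values.

    Addition is XOR, so there are no carries between bit positions.
    """
    result = 0
    for i in range(bits):
        if (b >> i) & 1:
            result ^= a << i
    return result


def karatsuba_clmul128(a, b):
    """128*128 -> 256-bit carry-less multiply using three 64*64 multiplies.

    Follows the layout in the comment:
      (C1:C0) = A1*B1, (D1:D0) = A0*B0, (E1:E0) = (A0+A1)(B0+B1)
      (A1:A0)(B1:B0) = C1:(C0+C1+D1+E1):(D1+C0+D0+E0):D0
    """
    a0, a1 = a & MASK64, a >> 64
    b0, b1 = b & MASK64, b >> 64

    c = clmul(a1, b1, 64)            # high halves
    d = clmul(a0, b0, 64)            # low halves
    e = clmul(a0 ^ a1, b0 ^ b1, 64)  # sums of halves (the value precomputed
                                     # as a1_xor_a0 for the subkey)

    c0, c1 = c & MASK64, c >> 64
    d0, d1 = d & MASK64, d >> 64
    e0, e1 = e & MASK64, e >> 64

    w3 = c1
    w2 = c0 ^ c1 ^ d1 ^ e1
    w1 = d1 ^ c0 ^ d0 ^ e0
    w0 = d0
    return (w3 << 192) | (w2 << 128) | (w1 << 64) | w0


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    for _ in range(100):
        a = rng.getrandbits(128)
        b = rng.getrandbits(128)
        assert karatsuba_clmul128(a, b) == clmul(a, b, 128)
    print("Karatsuba result matches the schoolbook product")
```

The point of the check at the end is only that the four output words in the comment agree with a plain bit-by-bit product. In an intrinsic each of the three 64*64 multiplies would be a single hardware carry-less multiply instruction.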
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20235#issuecomment-2602041665