RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v6]

Mon Jan 20 10:36:43 UTC 2025

On Thu, 9 Jan 2025 09:07:21 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

>> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437)
>> 
>> Currently acceleration code for GHASH is missing for PPC64. 
>> 
>> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication to obtain the final result.
>
> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision:
> 
>   restore

On 1/20/25 08:09, Suchismith Roy wrote:

>> This reference is even better, and comes with complete source code as well as the proofs:
>>
>> https://web.archive.org/web/20110609115824/https://software.intel.com/file/24918
> 
> Thank you. Yes, the reduction algorithm is derived from here.
> However, there is one more paper that I referred to for the Karatsuba multiplication:
> https://link.springer.com/content/pdf/10.1007/978-3-319-16715-2_9.pdf
> 
> I think that can be mentioned in the comments as well, then?

Sure, but it's not a great idea to reference a non-open access paper in free
software. There's a copy of the Intel paper that does explain the use of
Karatsuba on Page 12 here:

https://github.com/intel/intel-ipsec-mb/wiki/doc/optimized-gcm-implementation.pdf

There's a comment in
src/hotspot/cpu/aarch64/macroAssembler_aarch64_aes.cpp:290, like this:

  // Karatsuba multiplication performs a 128*128 -> 256-bit
  // multiplication in three 128-bit multiplications and a few
  // additions.
  //
  // (C1:C0) = A1*B1, (D1:D0) = A0*B0, (E1:E0) = (A0+A1)(B0+B1)
  // (A1:A0)(B1:B0) = C1:(C0+C1+D1+E1):(D1+C0+D0+E0):D0
  //
  // Inputs:
  //
  // A0 in a.d[0]     (subkey)
  // A1 in a.d[1]
  // (A1+A0) in a1_xor_a0.d[0]
  //
  // B0 in b.d[0]     (state)
  // B1 in b.d[1]

In your case, the register names (a and b) would be different, of course.
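
To make the identity in that comment concrete, here is a minimal sketch in
plain Java. It is not the HotSpot intrinsic and not the PPC64 code under
review; the class and method names (KaratsubaClmul, clmul64, karatsuba,
schoolbook) are made up for illustration. "+" in the comment is XOR, because
the arithmetic is carry-less, and the three multiplications are each
64x64 -> 128 bits. Limbs are returned least-significant first.

```java
final class KaratsubaClmul {
    // Carry-less 64x64 -> 128-bit multiply, bit by bit.
    // Returns {low 64 bits, high 64 bits}.
    static long[] clmul64(long a, long b) {
        long lo = 0, hi = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0) {
                lo ^= a << i;
                // Java masks shift counts to 6 bits, so i == 0 needs a guard.
                if (i != 0) {
                    hi ^= a >>> (64 - i);
                }
            }
        }
        return new long[] { lo, hi };
    }

    // (A1:A0)(B1:B0) with three multiplications, following the comment:
    // result = C1 : (C0+C1+D1+E1) : (D1+C0+D0+E0) : D0
    // Returned as {r0, r1, r2, r3}, least-significant limb first.
    static long[] karatsuba(long a0, long a1, long b0, long b1) {
        long[] c = clmul64(a1, b1);           // (C1:C0) = A1*B1
        long[] d = clmul64(a0, b0);           // (D1:D0) = A0*B0
        long[] e = clmul64(a0 ^ a1, b0 ^ b1); // (E1:E0) = (A0+A1)(B0+B1)
        return new long[] {
            d[0],
            d[1] ^ c[0] ^ d[0] ^ e[0],
            c[0] ^ c[1] ^ d[1] ^ e[1],
            c[1]
        };
    }

    // Reference version with four multiplications, for cross-checking.
    static long[] schoolbook(long a0, long a1, long b0, long b1) {
        long[] c = clmul64(a1, b1);
        long[] d = clmul64(a0, b0);
        long[] p = clmul64(a0, b1);
        long[] q = clmul64(a1, b0);
        return new long[] {
            d[0],
            d[1] ^ p[0] ^ q[0],
            c[0] ^ p[1] ^ q[1],
            c[1]
        };
    }

    public static void main(String[] args) {
        long a0 = 0x0123456789abcdefL, a1 = 0xfedcba9876543210L;
        long b0 = 0xdeadbeefcafebabeL, b1 = 0x0f1e2d3c4b5a6978L;
        long[] k = karatsuba(a0, a1, b0, b1);
        long[] s = schoolbook(a0, a1, b0, b1);
        System.out.println("match: " + java.util.Arrays.equals(k, s));
    }
}
```

The middle limbs work because E + C + D = A0*B1 + A1*B0 in GF(2), so the
cross terms come out of one extra multiply instead of two. The same layout
should carry over to whatever vector registers hold the subkey and the state.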

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20235#issuecomment-2602041665