RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v6]

Wed Jan 15 18:31:38 UTC 2025

On Wed, 15 Jan 2025 16:45:22 GMT, Suchismith Roy <sroy at openjdk.org> wrote:

> > The commenting here is poor.
> > GHASH uses little-endian for the byte order, but big-endian for the bit order. For example, the polynomial 1 is represented as the 16-byte string 80 00 00 00 | 12 bytes of 00. So, we must either reverse the bytes in each word and do everything big-endian or reverse the bits in each byte and do it little-endian. Which do you do?
> > Sure, I could figure it out by reading the code, but please say.
> 
> Hi Andrew
> 
> I would like to understand if I have fully understood your comment.
> 
> Currently the load instruction takes care of the endianness for the subkey and state. For loading the data, we enforce the endianness and reorder the bytes using vec_perm: vec_perm(vH, vHigh, vLow, loadOrder);
> 
> I am assuming the inputs for GHASH follow the endianness required by the algorithm, as you have mentioned. I have made sure they are in the intended representation on both LE and BE platforms (using vec_perm and the appropriate load instructions).
> 
> In the algorithm that I have used, 0xC2 is the polynomial for reduction.
> 
> It is shifted by 56 bits to make it the most significant byte. I think this is little-endian byte order? I just had to do the operations with the reduction polynomial to align it as per the algorithm.

Right, so in this implementation the low-order bits of the field polynomial (i.e. p = z^7+z^2+z+1) are represented as 0xC2, or 11000010. But you will note that there is a bit missing here: the low-order part of the field polynomial has four terms, so four bits should be set, and 0xC2 has only three. GHASH.java in the JDK uses 0xe100000000000000, which is a bit more obvious.
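
For reference, a minimal sketch of the unshifted encoding. It is neither the PR's code nor GHASH.java: the names `GhashReference`, `reflectByte` and `multiply` are made up for illustration, and the loop follows the bit-serial multiplication in NIST SP 800-38D rather than anything optimized. It shows where the four bits of 0xe1 come from (z^7+z^2+z+1 is 0x87 when bit i holds the coefficient of z^i, and 0xe1 is that byte bit-reversed) and that the polynomial 1 is the block 80 00 ... 00.

```java
final class GhashReference {
    // R from NIST SP 800-38D: 11100001 followed by 120 zero bits.
    // Only the first 8 bytes are non-zero, so one long is enough here.
    static final long R_HI = 0xE100000000000000L;

    // Reverses the bit order within one byte.
    static int reflectByte(int b) {
        return Integer.reverse(b & 0xFF) >>> 24;
    }

    // Bit-serial multiply in GF(2^128). Each block is {first 8 bytes, last 8 bytes},
    // each read big-endian, so the coefficient of z^0 is the top bit of the first long.
    static long[] multiply(long[] x, long[] y) {
        long zHi = 0, zLo = 0;
        long vHi = y[0], vLo = y[1];
        for (int i = 0; i < 128; i++) {
            long word = (i < 64) ? x[0] : x[1];
            if (((word >>> (63 - (i & 63))) & 1L) != 0) {
                zHi ^= vHi;
                zLo ^= vLo;
            }
            // V = V * z: shift right by one position. If the coefficient of z^127
            // falls off the end, fold it back in as z^7+z^2+z+1, which is R.
            boolean lowBitSet = (vLo & 1L) != 0;
            vLo = (vLo >>> 1) | (vHi << 63);
            vHi >>>= 1;
            if (lowBitSet) {
                vHi ^= R_HI;
            }
        }
        return new long[] { zHi, zLo };
    }

    public static void main(String[] args) {
        // z^7+z^2+z+1 with bit i = coefficient of z^i is 0x87; reflected it is 0xe1.
        System.out.println(Integer.toHexString(reflectByte(0x87)));

        // The polynomial 1 is the block 80 00 ... 00, so it acts as the identity.
        long[] one = { 0x8000000000000000L, 0L };
        long[] y = { 0x0123456789abcdefL, 0xfedcba9876543210L };
        long[] z = multiply(one, y);
        System.out.println(z[0] == y[0] && z[1] == y[1]);
    }
}
```

Multiplying z (the block 40 00 ... 00) by z^127 (the block 00 ... 01) in this sketch gives e1 00 ... 00, which is the reduction of z^128 in this encoding.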

I think you're using the trick described in Intel's _Optimized Galois-Counter-Mode Implementation on Intel® Architecture Processors_, which represents the polynomial in a shifted form as, in effect, `1:C200000000000000`. 
Unfortunately, the constant `vConstC2` does not appear anywhere in this PR, so I had no way to know that. I guess that this PR does not even compile.
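
To make the relationship between the two constants concrete, here is a small sketch. It assumes the shifted form is simply the 0xe1 constant moved left by one bit position, which is how `1:C200000000000000` reads; whether the PR's code relies on exactly this is an assumption, and `ShiftedPoly` and `shiftLeftOne` are illustrative names only.

```java
final class ShiftedPoly {
    // The reflected constant, as used in GHASH.java.
    static final long REFLECTED = 0xE100000000000000L;

    // Shifts a 64-bit word left by one bit.
    // Returns {the bit shifted out, the remaining 64 bits}.
    static long[] shiftLeftOne(long w) {
        return new long[] { w >>> 63, w << 1 };
    }

    public static void main(String[] args) {
        long[] s = shiftLeftOne(REFLECTED);
        // prints 1:c200000000000000
        System.out.println(Long.toHexString(s[0]) + ":" + Long.toHexString(s[1]));
        // Four bits are set before the shift; only three remain in the low 64 bits after it.
        System.out.println(Long.bitCount(REFLECTED) + " -> " + Long.bitCount(s[1]));
    }
}
```

The fourth bit is not lost; it is the one shifted out of the 64-bit word, which is why 0xC2 on its own looks like it is missing a term.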

The main problem is, though, that there is little commentary in the code which explains how things are encoded. If you're using a bit-reversed and shifted representation of a polynomial, you have to say that. If you're using the algorithm described in the Intel paper, you have to say that too. Have pity on the reader.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20235#issuecomment-2593664404