RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

Volodymyr Paprotski duke at openjdk.org
Fri Nov 4 14:40:45 UTC 2022


On Fri, 28 Oct 2022 20:58:33 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> No, going the WhiteBox route was not something I was thinking of. I sought feedback from a couple of hotspot-knowledgeable people about the use of WhiteBox APIs, and both felt that it was not the right way to go. One said that WhiteBox is really for VM testing and not for these kinds of Java classes.
>
> One idea I was trying to measure was to make the following function the intrinsic (i.e. the while loop remains exactly the same, just moved to a different, non-static function):
> 
> private void processMultipleBlocks(byte[] input, int offset, int length) { //, MutableIntegerModuloP A, IntegerModuloP R) {
>     while (length >= BLOCK_LENGTH) {
>         n.setValue(input, offset, BLOCK_LENGTH, (byte)0x01);
>         a.setSum(n);                    // A += (temp | 0x01)
>         a.setProduct(r);                // A =  (A * R) % p
>         offset += BLOCK_LENGTH;
>         length -= BLOCK_LENGTH;
>     }
> }
> 
> 
> In principle, the Java version would not get any slower (i.e. there is only one extra function call), at the expense of the C++ glue getting more complex. In C++ I would need to use the IR to dig out `(sun.security.util.math.intpoly.IntegerPolynomial.MutableElement)(this.a).limbs`, then convert the 5*26-bit limbs into 3*44-bit limbs. The IR is very new to me, so this will take some time. (I think I found some AES code that does something similar.)
> 
> That said, I thought this idea would perhaps have been a separate PR, if needed at all. Digging the limbs out is one thing, but I would also need to add asserts and safety checks. Mostly I would be happy to just measure whether it's worth it.
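
To make the limb conversion mentioned above concrete, here is a minimal Java sketch of repacking 5*26-bit limbs into 3*44-bit limbs. It is only an illustration: the class name `LimbRepack` and both methods are hypothetical, not part of the JDK or of this PR, and it assumes every input limb is already fully reduced to the range [0, 2^26) and stored least significant limb first. The BigInteger helper is there purely to cross-check that both representations encode the same 130-bit value.

```java
import java.math.BigInteger;

/** Hypothetical helper for illustration; not part of the JDK or of this PR. */
final class LimbRepack {
    private static final long MASK44 = (1L << 44) - 1;

    /**
     * Repacks five 26-bit limbs (least significant first) into three limbs
     * of 44, 44 and 42 bits. Assumption: each input limb is in [0, 2^26).
     */
    static long[] to44(long[] l) {
        if (l.length != 5) {
            throw new IllegalArgumentException("expected 5 limbs");
        }
        for (long limb : l) {
            if ((limb >>> 26) != 0) {
                throw new IllegalArgumentException("limb is not reduced to 26 bits");
            }
        }
        // Bits 0..43: all of l[0] plus the low 18 bits of l[1].
        long t0 = (l[0] | (l[1] << 26)) & MASK44;
        // Bits 44..87: high 8 bits of l[1], all of l[2], low 10 bits of l[3].
        long t1 = ((l[1] >>> 18) | (l[2] << 8) | (l[3] << 34)) & MASK44;
        // Bits 88..129: high 16 bits of l[3] plus all of l[4].
        long t2 = (l[3] >>> 10) | (l[4] << 16);
        return new long[] {t0, t1, t2};
    }

    /** Recombines limbs (least significant first) into a single integer. */
    static BigInteger toBigInteger(long[] limbs, int bitsPerLimb) {
        BigInteger v = BigInteger.ZERO;
        for (int i = limbs.length - 1; i >= 0; i--) {
            v = v.shiftLeft(bitsPerLimb).add(BigInteger.valueOf(limbs[i]));
        }
        return v;
    }

    public static void main(String[] args) {
        long[] in = {0x3FFFFFFL, 0x1234567L, 0x2ABCDEFL, 0x0000001L, 0x3F0F0F0L};
        long[] out = to44(in);
        System.out.println(toBigInteger(in, 26).equals(toBigInteger(out, 44)));
    }
}
```

As far as I understand, the limbs held by the Java field element are not guaranteed to be in this reduced form, so real glue code would likely need a carry/reduce pass (or equivalent asserts) before a repacking like this is valid.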

thread resumed below

-------------

PR: https://git.openjdk.org/jdk/pull/10582


More information about the security-dev mailing list