RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

Sandhya Viswanathan sviswanathan at openjdk.org
Fri Nov 4 21:05:30 UTC 2022


On Fri, 4 Nov 2022 20:59:10 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
>> 
>>> 173:             // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>>> 174:             // and not affect platforms without intrinsic support
>>> 175:             int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH;
>> 
>> The ByteBuffer version can also benefit from this optimization if it has an array as backing storage.
>
> I spent some time looking at `engineUpdate(ByteBuffer buf)`. I think it makes sense to make it into a separate PR. I think I figured out the code, but it's rather 'finicky'. The existing function is already rather clever; there are quite a few cases to get correct (`engineUpdate(byte[] input, int offset, int len)` unrolled the decision tree, so it's easier to reason about).
> 
> For future reference, patched but untested:
> 
> 
>     void engineUpdate(ByteBuffer buf) {
>         int remaining = buf.remaining();
>         while (remaining > 0) {
>             int bytesToWrite = Integer.min(remaining,
>                     BLOCK_LENGTH - blockOffset);
> 
>             if (bytesToWrite >= BLOCK_LENGTH) {
>                 // Have at least one full block in the buf, process all full blocks
>                 int blockMultipleLength = buf.remaining() & (~(BLOCK_LENGTH-1));
>                 processMultipleBlocks(buf, blockMultipleLength);
>                 remaining -= blockMultipleLength;
>             } else {
>                 // We have some left-over data from previous updates, so
>                 // copy that into the holding block until we get a full block.
>                 buf.get(block, blockOffset, bytesToWrite);
>                 blockOffset += bytesToWrite;
> 
>                 if (blockOffset >= BLOCK_LENGTH) {
>                     processBlock(block, 0, BLOCK_LENGTH);
>                     blockOffset = 0;
>                 }
>                 remaining -= bytesToWrite;
>             }
>         }
>     }
> 
>     private void processMultipleBlocks(ByteBuffer buf, int blockMultipleLength) {
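>         if (buf.hasArray()) {
>             byte[] input = buf.array();
>             // Index of the buffer's current position within the backing array
>             int offset = buf.arrayOffset() + buf.position();
> 
>             Objects.checkFromIndexSize(offset, blockMultipleLength, input.length);
>             a.checkLimbsForIntrinsic();
>             r.checkLimbsForIntrinsic();
>             processMultipleBlocks(input, offset, blockMultipleLength);
>             // Consume the processed bytes so buf.remaining() stays accurate
>             buf.position(buf.position() + blockMultipleLength);
>             return;
>         }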
> 
>         while (blockMultipleLength > 0) {
>             processBlock(buf, BLOCK_LENGTH);
>             blockMultipleLength -= BLOCK_LENGTH;
>         }
>     }
> 
> 
> But it might make more sense to emulate `engineUpdate(byte[] input, int offset, int len)` and unroll the loop. (Hint: to test with a buffer that has no accessible array, create a read-only buffer:
> 
>     public final boolean hasArray() {
>         return (hb != null) && !isReadOnly;
>     }
> 
> end hint)

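The hint above can be checked in isolation. A minimal sketch, not part of the patch (the class name `HasArrayCheck` and its methods are made up for illustration): a heap buffer reports a backing array, while a read-only view of it does not. On OpenJDK a direct buffer also reports no array, so it is another way to exercise the non-array path.

```java
import java.nio.ByteBuffer;

public class HasArrayCheck {
    // Plain heap buffer wrapping a byte[]: the array is accessible.
    static boolean heapHasArray() {
        return ByteBuffer.wrap(new byte[64]).hasArray();
    }

    // Read-only view of a heap buffer: hasArray() is false even though
    // an array exists internally, because the view must not expose it.
    static boolean readOnlyHasArray() {
        return ByteBuffer.wrap(new byte[64]).asReadOnlyBuffer().hasArray();
    }

    // Direct (off-heap) buffer: no byte[] behind it on OpenJDK.
    static boolean directHasArray() {
        return ByteBuffer.allocateDirect(64).hasArray();
    }

    public static void main(String[] args) {
        System.out.println("heap:      " + heapHasArray());
        System.out.println("read-only: " + readOnlyHasArray());
        System.out.println("direct:    " + directHasArray());
    }
}
```

A test for the fallback loop could therefore feed the same bytes through `ByteBuffer.wrap(data)` and `ByteBuffer.wrap(data).asReadOnlyBuffer()` and compare the resulting tags.
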
Sounds good, let us do the ByteBuffer support as a follow-on PR.
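
For that follow-on, one detail an array-backed fast path has to get right is the mapping from buffer position to array index. A rough sketch under stated assumptions, not the eventual implementation: `FullBlocks` and `takeFullBlocks` are hypothetical names, and `BLOCK_LENGTH` is assumed to be 16 (the mask trick only works for a power of two). It computes where the full blocks start in the backing array, how many bytes they span, and advances the buffer past them.

```java
import java.nio.ByteBuffer;

public class FullBlocks {
    // Assumed block size; must be a power of two for the mask below.
    static final int BLOCK_LENGTH = 16;

    /**
     * Returns {start, length}: the index range in buf.array() covered by
     * the full blocks currently remaining in buf, and advances buf past them.
     */
    static int[] takeFullBlocks(ByteBuffer buf) {
        if (!buf.hasArray()) {
            throw new IllegalArgumentException("no accessible backing array");
        }
        // Round remaining() down to a multiple of BLOCK_LENGTH; same result
        // as (remaining / BLOCK_LENGTH) * BLOCK_LENGTH for non-negative values.
        int length = buf.remaining() & ~(BLOCK_LENGTH - 1);
        // arrayOffset() is where buffer index 0 lives in the array, so the
        // current position has to be added on top of it.
        int start = buf.arrayOffset() + buf.position();
        // Consume the bytes so a later remaining() call reflects the tail only.
        buf.position(buf.position() + length);
        return new int[] { start, length };
    }

    public static void main(String[] args) {
        // A slice makes arrayOffset() non-zero, which is the easy case to miss.
        ByteBuffer buf = ByteBuffer.wrap(new byte[100], 10, 50).slice();
        buf.position(3);
        int[] range = takeFullBlocks(buf);
        System.out.println("start=" + range[0] + " length=" + range[1]
                + " left=" + buf.remaining());
    }
}
```

Testing with a sliced buffer at a non-zero position, as in `main`, covers both terms of the index calculation at once.
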

-------------

PR: https://git.openjdk.org/jdk/pull/10582


