RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

Volodymyr Paprotski duke at openjdk.org
Fri Nov 4 21:01:40 UTC 2022


On Tue, 25 Oct 2022 00:31:07 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   extra whitespace character
>
> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
> 
>> 173:             // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>> 174:             // and not affect platforms without intrinsic support
>> 175:             int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH;
> 
> The ByteBuffer version can also benefit from this optimization if it has array as backing storage.

I spent some time looking at `engineUpdate(ByteBuffer buf)`. I think it makes sense to handle it in a separate PR. I think I have the code figured out, but it's rather 'finicky'. The existing function is already rather clever, and there are quite a few cases to get right (`engineUpdate(byte[] input, int offset, int len)` unrolled the decision tree, so it's easier to reason about).

For future reference, patched but untested:


    void engineUpdate(ByteBuffer buf) {
        int remaining = buf.remaining();
        while (remaining > 0) {
            int bytesToWrite = Integer.min(remaining,
                    BLOCK_LENGTH - blockOffset);

            if (bytesToWrite >= BLOCK_LENGTH) {
                // Have at least one full block in the buf, process all full blocks
                int blockMultipleLength = buf.remaining() & (~(BLOCK_LENGTH-1));
                processMultipleBlocks(buf, blockMultipleLength);
                remaining -= blockMultipleLength;
            } else {
                // We have some left-over data from previous updates, so
                // copy that into the holding block until we get a full block.
                buf.get(block, blockOffset, bytesToWrite);
                blockOffset += bytesToWrite;

                if (blockOffset >= BLOCK_LENGTH) {
                    processBlock(block, 0, BLOCK_LENGTH);
                    blockOffset = 0;
                }
                remaining -= bytesToWrite;
            }
        }
    }

    private void processMultipleBlocks(ByteBuffer buf, int blockMultipleLength) {
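        if (buf.hasArray()) {
            byte[] input = buf.array();
            // Start at the buffer's current position, not at the start of its array slice
            int offset = buf.arrayOffset() + buf.position();

            Objects.checkFromIndexSize(offset, blockMultipleLength, input.length);
            a.checkLimbsForIntrinsic();
            r.checkLimbsForIntrinsic();
            processMultipleBlocks(input, offset, blockMultipleLength);
            // Reading through the backing array does not advance the buffer, so do it here
            buf.position(buf.position() + blockMultipleLength);
            return;
        }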

        while (blockMultipleLength > 0) {
            processBlock(buf, BLOCK_LENGTH);
            blockMultipleLength -= BLOCK_LENGTH;
        }
    }
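
One detail worth double-checking in the array-backed branch: for a heap `ByteBuffer`, the next unread byte sits at `arrayOffset() + position()` in the backing array, and reading through the array does not move the buffer's position. Below is a small standalone sketch of that, plus the block-multiple mask. The class and helper names are made up for illustration and are not part of `Poly1305.java`; `BLOCK_LENGTH = 16` is assumed.

```java
import java.nio.ByteBuffer;

final class BackingArrayOffsetDemo {
    // Assumed block size (16 bytes); must be a power of two for the mask below.
    static final int BLOCK_LENGTH = 16;

    // Index in buf.array() of the byte that the next buf.get() would return.
    static int firstUnreadIndex(ByteBuffer buf) {
        return buf.arrayOffset() + buf.position();
    }

    // Largest multiple of BLOCK_LENGTH that is <= len.
    static int blockMultipleLength(int len) {
        return len & ~(BLOCK_LENGTH - 1);
    }

    public static void main(String[] args) {
        byte[] backing = new byte[64];
        for (int i = 0; i < backing.length; i++) {
            backing[i] = (byte) i;
        }

        // slice() produces a buffer whose arrayOffset() is non-zero.
        ByteBuffer buf = ByteBuffer.wrap(backing, 8, 50).slice();
        buf.get();
        buf.get(); // two bytes consumed, so position() is now 2

        int idx = firstUnreadIndex(buf);
        System.out.println("arrayOffset = " + buf.arrayOffset());
        System.out.println("position    = " + buf.position());
        System.out.println("first unread index in backing array = " + idx);
        System.out.println("same byte? " + (backing[idx] == buf.get(buf.position())));
        System.out.println("full-block bytes in 50 = " + blockMultipleLength(50));
    }
}
```

Using `arrayOffset()` alone would only be correct for a buffer whose position is still 0.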


But it might make more sense to emulate `engineUpdate(byte[] input, int offset, int len)` and unroll the loop. (Hint: to test the path for a buffer without an accessible array, create a read-only buffer, since `ByteBuffer.hasArray()` is:

    public final boolean hasArray() {
        return (hb != null) && !isReadOnly;
    }

end hint)
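
A minimal sketch of that hint, in case it helps whoever picks this up: wrapping an array and calling `asReadOnlyBuffer()` gives a buffer with the same contents but `hasArray() == false`, which should force the non-array fallback path. The class and helper names here are hypothetical, not from the JDK tests.

```java
import java.nio.ByteBuffer;

final class ReadOnlyBufferDemo {
    // Returns a view of data whose backing array is not accessible to callers.
    static ByteBuffer withoutAccessibleArray(byte[] data) {
        return ByteBuffer.wrap(data).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] data = new byte[32];

        ByteBuffer heap = ByteBuffer.wrap(data);
        ByteBuffer readOnly = withoutAccessibleArray(data);

        // The plain heap buffer exposes its array; the read-only view does not.
        System.out.println("heap.hasArray()     = " + heap.hasArray());
        System.out.println("readOnly.hasArray() = " + readOnly.hasArray());
        System.out.println("readOnly.remaining() = " + readOnly.remaining());
    }
}
```

Feeding both buffers the same input and comparing the resulting tags would be one way to check that the two paths agree.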

-------------

PR: https://git.openjdk.org/jdk/pull/10582

