RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]
Sandhya Viswanathan
sviswanathan at openjdk.org
Fri Nov 4 21:05:30 UTC 2022
On Fri, 4 Nov 2022 20:59:10 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
>>
>>> 173: // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>>> 174: // and not affect platforms without intrinsic support
>>> 175: int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH;
>>
>> The ByteBuffer version can also benefit from this optimization if it has array as backing storage.
>
> I spent some time looking at `engineUpdate(ByteBuffer buf)`. I think it makes sense to make it into a separate PR. I think I figured out the code, but its rather 'finicky'. The existing function is already rather clever; there are quite a few cases to get correct (`engineUpdate(byte[] input, int offset, int len)` unrolled the decision tree, so its easier to reason about)
>
> For future reference, patched but untested:
>
>
> void engineUpdate(ByteBuffer buf) {
> int remaining = buf.remaining();
> while (remaining > 0) {
> int bytesToWrite = Integer.min(remaining,
> BLOCK_LENGTH - blockOffset);
>
> if (bytesToWrite >= BLOCK_LENGTH) {
> // Have at least one full block in the buf, process all full blocks
> int blockMultipleLength = buf.remaining() & (~(BLOCK_LENGTH-1));
> processMultipleBlocks(buf, blockMultipleLength);
> remaining -= blockMultipleLength;
> } else {
> // We have some left-over data from previous updates, so
> // copy that into the holding block until we get a full block.
> buf.get(block, blockOffset, bytesToWrite);
> blockOffset += bytesToWrite;
>
> if (blockOffset >= BLOCK_LENGTH) {
> processBlock(block, 0, BLOCK_LENGTH);
> blockOffset = 0;
> }
> remaining -= bytesToWrite;
> }
> }
> }
>
> private void processMultipleBlocks(ByteBuffer buf, int blockMultipleLength) {
> if (buf.hasArray()) {
> byte[] input = buf.array();
> int offset = buf.arrayOffset();
>
> Objects.checkFromIndexSize(offset, blockMultipleLength, input.length);
> a.checkLimbsForIntrinsic();
> r.checkLimbsForIntrinsic();
> processMultipleBlocks(input, offset, blockMultipleLength);
> return;
> }
>
> while (blockMultipleLength > 0) {
> processBlock(buf, BLOCK_LENGTH);
> blockMultipleLength -= BLOCK_LENGTH;
> }
> }
>
>
> But it might make more sense to emulate `engineUpdate(byte[] input, int offset, int len)` and unroll the loop. (Hint: to test for Buffer without array, create read-only buffer:
>
> public final boolean hasArray() {
> return (hb != null) && !isReadOnly;
> }
>
> end hint)
Sounds good, let us do the ByteBuffer support as a follow on PR.
-------------
PR: https://git.openjdk.org/jdk/pull/10582
More information about the security-dev
mailing list