RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]
Volodymyr Paprotski
duke at openjdk.org
Fri Nov 4 21:01:40 UTC 2022
On Tue, 25 Oct 2022 00:31:07 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>>
>> extra whitespace character
>
> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
>
>> 173: // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>> 174: // and not affect platforms without intrinsic support
>> 175: int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH;
>
> The ByteBuffer version can also benefit from this optimization if it has array as backing storage.
I spent some time looking at `engineUpdate(ByteBuffer buf)`. I think it makes sense to make it into a separate PR. I think I figured out the code, but its rather 'finicky'. The existing function is already rather clever; there are quite a few cases to get correct (`engineUpdate(byte[] input, int offset, int len)` unrolled the decision tree, so its easier to reason about)
For future reference, patched but untested:
void engineUpdate(ByteBuffer buf) {
int remaining = buf.remaining();
while (remaining > 0) {
int bytesToWrite = Integer.min(remaining,
BLOCK_LENGTH - blockOffset);
if (bytesToWrite >= BLOCK_LENGTH) {
// Have at least one full block in the buf, process all full blocks
int blockMultipleLength = buf.remaining() & (~(BLOCK_LENGTH-1));
processMultipleBlocks(buf, blockMultipleLength);
remaining -= blockMultipleLength;
} else {
// We have some left-over data from previous updates, so
// copy that into the holding block until we get a full block.
buf.get(block, blockOffset, bytesToWrite);
blockOffset += bytesToWrite;
if (blockOffset >= BLOCK_LENGTH) {
processBlock(block, 0, BLOCK_LENGTH);
blockOffset = 0;
}
remaining -= bytesToWrite;
}
}
}
private void processMultipleBlocks(ByteBuffer buf, int blockMultipleLength) {
if (buf.hasArray()) {
byte[] input = buf.array();
int offset = buf.arrayOffset();
Objects.checkFromIndexSize(offset, blockMultipleLength, input.length);
a.checkLimbsForIntrinsic();
r.checkLimbsForIntrinsic();
processMultipleBlocks(input, offset, blockMultipleLength);
return;
}
while (blockMultipleLength > 0) {
processBlock(buf, BLOCK_LENGTH);
blockMultipleLength -= BLOCK_LENGTH;
}
}
But it might make more sense to emulate `engineUpdate(byte[] input, int offset, int len)` and unroll the loop. (Hint: to test for Buffer without array, create read-only buffer:
public final boolean hasArray() {
return (hb != null) && !isReadOnly;
}
end hint)
-------------
PR: https://git.openjdk.org/jdk/pull/10582
More information about the security-dev
mailing list