RFR: 8318756 Create better internal buffer for AEADs
Anthony Scarpino
ascarpino at openjdk.org
Tue Nov 28 01:34:17 UTC 2023
On Thu, 23 Nov 2023 10:30:52 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:
>> Hi,
>>
>> I need a review for a new internal buffer class called AEADBufferStream. AEADBufferStream extends ByteArrayOutputStream, but eliminates some data checking and copying that are not necessary for what GaloisCounterMode.java and ChaCha20Cipher.java need.
>>
>> The changes greatest benefit is with decryption operations. ChaCha20-Poly1305 had larger performance gains by adopting similar techniques that AES/GCM already uses.
>>
>> The new buffer shows up to 21% bytes/sec performance increase for decryption for ChaCha20-Poly1305 and 12% for AES/GCM. 16K data sizes saw a memory usage reduction of 46% with and 83% with ChaCha20-Poly1305. These results come from the JMH tests updated in this request and memory usage using the JMH gc profile gc.alloc.rate.norm entry
>>
>> thanks
>>
>> Tony
>
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 45:
>
>> 43: * Create an instance with no buffer
>> 44: */
>> 45: public AEADBufferedStream() {
>
> Never used. And once you remove it, `buf` is never null
Sure
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 55:
>
>> 53:
>> 54: public AEADBufferedStream(int len) {
>> 55: buf = new byte[len];
>
> Suggestion:
>
> super(len);
>
> otherwise buf will be initialized twice, once here, once in the base class constructor
Yeah.. better memory performance too.
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 56:
>
>> 54: public AEADBufferedStream(int len) {
>> 55: buf = new byte[len];
>> 56: count = 0;
>
> no need to zero-initialize, it's zero by default
Yes
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 67:
>
>> 65: */
>> 66: @Override
>> 67: public synchronized byte[] toByteArray() {
>
> remove this `synchronized`; the new `write` methods are not synchronized, so this does not add value.
Interesting, I thought I was stuck with `synchronized` from the parent
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 68:
>
>> 66: @Override
>> 67: public synchronized byte[] toByteArray() {
>> 68: if (buf.length > count) {
>
> Can this ever happen? And if it can, would it be better to return the whole buffer and have the caller extract the necessary range?
Yes, because it can reuse the allocated buffer during encryption scenarios when multi-part ops send non-block size data. For decryption, it should never happen.
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java line 94:
>
>> 92: } else {
>> 93: if (buf.length < (count + len)) {
>> 94: buf = Arrays.copyOf(buf, count + len);
>
> this will copy a lot if the user performs many small writes, for example when decrypting with CipherInputStream; see `AESGCMCipherOutputStream` benchmark.
As I understand the `ByteArrayOutputStream` code, the `ArraysSupport.newLength()` will double the allocation each time. So if the buffer is 1k size and it wants to add one more byte, the method will allocate 2k.
I agree that in my change, if you send one byte at a time, it will be doing a lot of allocating. But I don't believe that is the general case. If you have examples of apps doing small allocations, let me know and I can come up with a different scheme. But at this point I thought it was a bitter risk more allocations in the less-likely situation, than unused allocation in what I see as a more common case.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 699:
>
>> 697: * @throws ShortBufferException if the buffer {@code out} does not have
>> 698: * enough space to hold the resulting data.
>> 699: */ @Override
>
> Suggestion:
>
> */
> @Override
Yep
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 792:
>
>> 790:
>> 791: /*
>> 792: * Optimized version of bufferCrypt from CipherSpi.java. Direct
>
> Can you document the optimizations done in this function?
The second sentence says what the optimizations is.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 844:
>
>> 842: total = engine.doFinal(inArray, inOfs, inLen, outArray, outOfs);
>> 843: }
>> 844: } catch (BadPaddingException | KeyException e) {
>
> Preexisting, and probably out of scope for this PR, but we should expose the "Counter exhausted" exception in javax.crypto.Cipher; filed [JDK-8320743](https://bugs.openjdk.org/browse/JDK-8320743) for that
Certainly out of scope here.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 1427:
>
>> 1425: input.get(in);
>> 1426: byte[] out = new byte[in.length];
>> 1427: doUpdate(in, 0, in.length, out, out.length);
>
> Suggestion:
>
> byte[] out = in;
> doUpdate(in, 0, in.length, out, 0);
>
> I guess we need more test coverage here
I don't see a testing issue there, but that's better memory usage. I probably copied this code over from AES/GCM where it's blocksized data and `in` and `out` could have been different sizes. But CC20 can use this optimization because it's a streaming cipher.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 1505:
>
>> 1503: byte[] in = new byte[input.remaining()];
>> 1504: input.get(in);
>> 1505: byte[] out = new byte[in.length];
>
> Suggestion:
>
> byte[] out = in;
yep
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 1516:
>
>> 1514: input.get(in);
>> 1515: byte[] out = new byte[in.length + TAG_LENGTH];
>> 1516: doFinal(in, 0, in.length, out, 0);
>
> A bit more complex, but you can avoid one allocation here too; something like (untested):
> Suggestion:
>
> int inLen = input.remaining();
> byte[] out = new byte[inLen + TAG_LENGTH];
> byte[] in = out;
> input.get(in, 0, inLen);
> doFinal(in, 0, inLen, out, 0);
I will have to check on this, I've run into `BufferUnderflowException` too many times with ByteBuffer to say for sure at this point.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 1652:
>
>> 1650: int ctLen = getBufferedLength() + inLen;
>> 1651: if (ctLen == 0) {
>> 1652: throw new AEADBadTagException("Tag mismatch");
>
> Suggestion:
>
> if (ctLen < TAG_LENGTH) {
> throw new AEADBadTagException("Input too short - need tag");
Yep
> test/micro/org/openjdk/bench/javax/crypto/full/AESGCMBench.java line 43:
>
>> 41: AlgorithmParameterSpec getNewSpec() {
>> 42: iv_index = (iv_index + 1) % IV_MODULO;
>> 43: return new GCMParameterSpec(96, iv, iv_index, 12);
>
> Can you also change tag length to 128 bits? TLS uses 128, and 128-bit tag generates a slightly different flamegraph.
I'll see if I can do it cleanly. GCM spec defaults to 96bit and because CC20P1305 requires 96bit, it made the common code easier. I'm surprised you any differences which such a minor change.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946509
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946538
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968884
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946567
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946612
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946637
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946687
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946735
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946776
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406947706
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406947296
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406953636
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968978
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968180
More information about the security-dev
mailing list