Clogged pipes: 50x throughput degradation with large Cipher writes
Ferenc Rakoczi
ferenc.r.rakoczi at oracle.com
Thu Oct 27 14:40:42 UTC 2022
The fragmentation can be done within the update(…) functions that call the intrinsified processBlocks(…) (in this case there are only two of those, with three call sites altogether), but a more general solution would be if we could somehow tell the JIT compiler (with an annotation similar to @IntrinsicCandidate): “use the intrinsic, if there is one, from the very first call; I am pretty sure it will be worth it even if this function is used only once.” A rough sketch of the per-update fragmentation follows.
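As a minimal sketch (the class name, the processBlocks signature, and the 4 KB chunk size are illustrative assumptions, not the actual JDK code), the idea is to split one huge input into many short calls so the JIT gets a chance to compile processBlocks -- and substitute the intrinsic -- early, instead of spending the whole multi-megabyte operation in a single un-intrinsified call:

    abstract class FragmentingCipherCore {
        private static final int MAX_CHUNK = 4096; // assumed tuning value

        // Stands in for the @IntrinsicCandidate method discussed above.
        abstract void processBlocks(byte[] in, int inOfs, int len,
                                    byte[] out, int outOfs);

        int update(byte[] in, int inOfs, int len, byte[] out, int outOfs) {
            int done = 0;
            while (done < len) {
                int chunk = Math.min(len - done, MAX_CHUNK);
                processBlocks(in, inOfs + done, chunk, out, outOfs + done);
                done += chunk;
            }
            return done;
        }
    }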
Ferenc
From: security-dev <security-dev-retn at openjdk.org> on behalf of Carter Kozak <ckozak at ckozak.net>
Date: Thursday, October 27, 2022 14:26
To: security-dev at openjdk.org <security-dev at openjdk.org>
Subject: Re: Clogged pipes: 50x throughput degradation with large Cipher writes
Thanks for your interest in the topic.
> While it might not be a problem in practice (large buffers are OK, but larger than 1 MB seems seldom, especially in multi-threaded apps), it is still a condition which can be handled. But with AE ciphers becoming the norm, such large cipher chunks seem to be legacy as well?
The linked benchmark is a good representation of the problem as we encountered it in a production instance. A compute job (as opposed to a typical multi-threaded server) took data that was already buffered (a relatively large on-heap byte array) and attempted to store it (to disk, S3, etc.), encrypting the data first. The filesystem is likely network-attached; however, the encryption in this case is a separate concern. We use a CipherOutputStream to encrypt data as we write it to storage. The client code only sees an OutputStream, so it's not clear to the caller that inputs should be segmented into smaller chunks -- it's often best to issue fewer, larger writes to a disk. The cipher interactions in this case are the result of the way other JDK components (namely CipherOutputStream) work. BufferedOutputStream is generally used to reduce Cipher interactions by avoiding inefficient small operations, but it passes large buffers straight through to the delegate, allowing multi-megabyte cipher operations, as in the sketch below.
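To make that pattern concrete, here is a hedged reconstruction of the setup described above (the file path, key material, buffer sizes, and AES/GCM parameters are assumptions, not the actual production code):

    import java.io.BufferedOutputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;

    public class LargeWriteDemo {
        public static void main(String[] args) throws Exception {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

            try (OutputStream storage = Files.newOutputStream(Path.of("out.enc"));
                 OutputStream out = new BufferedOutputStream(
                         new CipherOutputStream(storage, cipher), 64 * 1024)) {
                // BufferedOutputStream forwards any write larger than its own
                // buffer directly to the delegate, so this single call becomes
                // one 16 MB Cipher.update(...) operation.
                byte[] alreadyBuffered = new byte[16 * 1024 * 1024];
                out.write(alreadyBuffered);
            }
        }
    }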
I could see an argument for CipherOutputStream becoming responsible for chunking, although that may not be ideal for native Cipher implementations, which aren't constrained in the same ways. Perhaps a Cipher instance should instead be able to recommend a maximum segment size to callers (e.g. CipherOutputStream would segment based on a recommendation from the cipher instance); a rough sketch of that idea follows.
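As an illustration of the recommendation-based approach, here is a hypothetical wrapper -- ChunkingOutputStream does not exist in the JDK, and the recommendedSegmentSize parameter stands in for whatever a Cipher might one day report:

    import java.io.FilterOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    final class ChunkingOutputStream extends FilterOutputStream {
        private final int maxSegment;

        ChunkingOutputStream(OutputStream delegate, int recommendedSegmentSize) {
            super(delegate);
            this.maxSegment = recommendedSegmentSize;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            // Forward at most maxSegment bytes per call so the underlying
            // CipherOutputStream never sees a multi-megabyte operation.
            while (len > 0) {
                int chunk = Math.min(len, maxSegment);
                out.write(b, off, chunk);
                off += chunk;
                len -= chunk;
            }
        }
    }

A caller would wrap the cipher stream, e.g. new ChunkingOutputStream(new CipherOutputStream(storage, cipher), 64 * 1024), so that no single write reaching the Cipher exceeds the recommended segment size.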
> Can you clarify? You said JSSE; does this actually happen in TLS usage -- how big are your TLS records? Isn't there a 16 KB limit anyway?
I'm sorry; I confused initialisms, and I believe JCE is more accurate. I can test TLS, but that's not the scenario where this was problematic in production.
Carter Kozak