Problems with AES-GCM native acceleration

Gidon Gershinsky gg5070 at
Thu Nov 15 10:42:37 UTC 2018

<re-sending after list authorization>

Hi all,

Thanks for the prompt feedback on this stuff, appreciated.

1. Analytic queries are often interactive or one-off. A data scientist
would get an on-demand notebook with a Spark cluster (spawned as a K8s
pod) and run a number of queries.
The cluster is then shut down, either explicitly or after a timeout. This
is done both for better resource utilization and for security reasons:
re-using a JVM for another user/tenant might leak sensitive data and
encryption keys kept in the JVM memory.
I'm not saying it's the only way to solve this; there are architectures
based on a long-running service. But this short-lived approach is real and
needs to be addressed.
Even if the data scientist keeps the cluster alive for a few hours, having
to wait a long time for the results of the first few queries (because
decryption is not warmed up yet) is a problem, since the work is
interactive and expected to be done in real time.

2. Analytics and AI workloads work with ~64MB blocks; sometimes they are
broken into ~1MB pieces (as in Parquet). Still, even taking the minimal
size of 1MB, waiting for 10,000 calls to get the decryption acceleration
means we process the first ~10GB at a slow rate. That is harsh, both in
absolute numbers and in comparison to ENcryption, which kicks in after
warming up with, say, 1KB chunks (created by breaking 1MB blocks into
many update calls), meaning the warm-up completes ~1,000x sooner than
for DEcryption.
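Spelling out the warm-up arithmetic in point 2, using decimal units and the ~10,000-call threshold discussed in this thread:

```latex
10^{4}\ \text{calls} \times 1\ \text{MB} = 10\ \text{GB},
\qquad
10^{4}\ \text{calls} \times 1\ \text{KB} = 10\ \text{MB},
\qquad
\frac{10\ \text{GB}}{10\ \text{MB}} = 10^{3}.
```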

3. Adam has mentioned an approach of "modifying the decryption operation
(to decrypt immediately and buffer plaintext)" (in a negative context,
though :).
To me, it looks like a sound solution. I don't know how much effort it
requires, but it makes the decryption implementation similar to
encryption and solves the problem at hand.
Maybe there are other options, though.

4. AOT sounds interesting; I'll check it out. But it's experimental for now.
Moreover, both AOT and command-line options require extra care in
production, as correctly pointed out below.
They will be a hard sell in real production environments. The same is true
(or even worse) for manual warm-up with repeated decryption of small
blocks. This is indeed a benchmarking hack;
I don't see it being used in production.

Having the decryption optimized in the HotSpot engine would be ideal.

Cheers, Gidon.

On Thu, Nov 15, 2018 at 3:33 AM Anthony Scarpino <
anthony.scarpino at> wrote:

> I agree with Adam that this is more of a tuning issue and not a problem
> with the crypto.  Sending multiple updates is a hack.
> I've been aware of this bug for a while and I do not understand why this
> is a significant problem.  The Stack Overflow comments say it takes 50
> seconds to trigger the intrinsic.  If this is a long-running server
> application, slowness for the first 50 seconds is trivial.  Smaller
> operations are commonly small transactions, not decryption of a 3GB
> file.
> If it cannot be resolved by commandline options and this is occurring in
> a real world situation, please explain it fully.  If this is only for
> benchmarking, then that's not a real world situation.
> Tony
> On 11/14/18 8:41 AM, Adam Petcher wrote:
> > I'm adding back in hotspot-dev, because this is a somewhat tricky topic
> > related to intrinsics and JIT. Hopefully, a Hotspot expert can correct
> > anything that I say below that is wrong, and suggest any solutions that
> > I missed.
> >
> > The AES acceleration is implemented in a HotSpot intrinsic. In order for
> > it to kick in, the code must be JIT compiled by the VM. As I understand
> > it, this only happens to some particular method after it has been called
> > a certain number of times. The rules that determine this number are
> > somewhat complicated, but I think you can guarantee JIT in the default
> > configuration by calling a method 10,000 times.
> >
> > The doFinal method calls the update method, so either one should trigger
> > the acceleration as long as you call it enough. Breaking the message up
> > into smaller chunks and calling update on each one works only because it
> > ends up calling the update method more. You should be able to trigger
> > the acceleration by calling doFinal more, too.
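A minimal Java sketch of the encryption workaround described above, assuming the default JCE provider: a ~1MB block is fed to `Cipher.update` in 1KB chunks, so a single message produces 1,024 update calls, while decryption still goes through one `doFinal`. The class and method names (`ChunkedGcmEncrypt`, `encryptChunked`) are hypothetical, and the all-zero plaintext and random 12-byte IV are for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class ChunkedGcmEncrypt {
    static final int TAG_BITS = 128;

    // Hypothetical helper: encrypts `data` by feeding it to Cipher.update in
    // `chunkSize` pieces, so update() runs once per chunk.
    static byte[] encryptChunked(SecretKey key, byte[] iv, byte[] data, int chunkSize)
            throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            byte[] part = c.update(data, off, len); // may be null or empty
            if (part != null) {
                out.write(part, 0, part.length);
            }
        }
        byte[] last = c.doFinal(); // any remaining ciphertext plus the 16-byte tag
        out.write(last, 0, last.length);
        return out.toByteArray();
    }

    // Single-shot decryption: the whole ciphertext goes through one doFinal call.
    static byte[] decrypt(SecretKey key, byte[] iv, byte[] ct) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return c.doFinal(ct);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        byte[] block = new byte[1 << 20];                 // one ~1MB block
        byte[] ct = encryptChunked(key, iv, block, 1024); // 1,024 update calls of 1KB
        System.out.println(Arrays.equals(block, decrypt(key, iv, ct))); // prints true
    }
}
```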
> >
> > The reason why the workaround doesn't work with decryption is that the
> > decryption routine buffers the ciphertext and then decrypts it all at
> > the end. So calling update multiple times and then calling doFinal at
> > the end is essentially the same as calling doFinal once with the entire
> > ciphertext.
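The buffering can be observed directly. This sketch, assuming the SunJCE provider (pinned by name) and using the hypothetical names `GcmDecryptBuffering` and `decryptInChunks`, feeds ciphertext to `Cipher.update` in 1KB pieces and counts where the plaintext comes out. We expect `update` to release no plaintext and `doFinal` to return all of it once the tag has been verified.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmDecryptBuffering {
    // Decrypts `ct` via repeated update() calls and reports how many plaintext
    // bytes came out of update() versus out of the final doFinal().
    static int[] decryptInChunks(SecretKey key, byte[] iv, byte[] ct, int chunkSize)
            throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding", "SunJCE");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        int fromUpdate = 0;
        for (int off = 0; off < ct.length; off += chunkSize) {
            byte[] part = c.update(ct, off, Math.min(chunkSize, ct.length - off));
            if (part != null) {
                fromUpdate += part.length;
            }
        }
        int fromDoFinal = c.doFinal().length; // tag is checked here
        return new int[] {fromUpdate, fromDoFinal};
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding", "SunJCE");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = enc.doFinal(new byte[64 * 1024]);
        int[] counts = decryptInChunks(key, iv, ct, 1024);
        System.out.println("from update(): " + counts[0] + ", from doFinal(): " + counts[1]);
    }
}
```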
> >
> > So here are some solutions that you may want to try:
> >
> > 1) In your benchmark, run at least 10,000 "warmup" iterations of
> > whatever you are trying to do at the beginning, without timing it. This
> > is a good idea for benchmarks, anyway. If it helps, you can try using
> > smaller buffers in your "warmup" phase in order to get it to complete
> > faster.
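A sketch of suggestion 1, assuming the default provider: run 10,000 untimed encrypt/decrypt round trips on 1KB buffers before timing anything. `GcmWarmup` and `warmUp` are hypothetical names, and whether this is enough to get the intrinsic into later large `doFinal` calls depends on the JVM version and its compilation flags.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmWarmup {
    // Runs `iterations` untimed encrypt/decrypt round trips on small buffers so
    // the JIT has a chance to compile the GCM code paths before real work starts.
    static int warmUp(SecretKey key, int iterations, int bufferSize) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] data = new byte[bufferSize];
        byte[] iv = new byte[12];
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            rnd.nextBytes(iv); // fresh IV per message: GCM must not reuse an IV with a key
            enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ct = enc.doFinal(data);
            dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            if (dec.doFinal(ct).length == bufferSize) {
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        System.out.println(warmUp(key, 10_000, 1024)); // prints 10000
        // ... the timed decryption of real data would follow here
    }
}
```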
> >
> > 2) Try -XX:CompileThreshold=(some number smaller than 10000) as an
> > argument to java. This will make JIT kick in sooner across the board.
> > Obviously, this should be done carefully in production, since it will
> > impact the performance of the entire program.
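Before relying on suggestion 2, it may help to check which settings the running JVM actually uses. The sketch below reads them through the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean` interface; the class name `JitFlags` is hypothetical. Note that the `java` tool documentation states that `-XX:CompileThreshold` is ignored when tiered compilation is enabled, so it may need to be paired with `-XX:-TieredCompilation`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class JitFlags {
    // Returns the current value of a HotSpot VM option, or "n/a" when the
    // option (or the HotSpot diagnostic MXBean) is not available on this JVM.
    static String option(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "n/a";
            }
            VMOption opt = bean.getVMOption(name);
            return opt.getValue();
        } catch (IllegalArgumentException e) {
            return "n/a";
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"CompileThreshold", "TieredCompilation", "UseAESIntrinsics"}) {
            System.out.println(name + " = " + option(name));
        }
    }
}
```

For example, running it as `java -XX:-TieredCompilation -XX:CompileThreshold=1000 JitFlags` should report the lowered threshold.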
> >
> > 3) I haven't tried this, but running with an AOTed java.base module may
> > also help. See the section titled "Steps to generate and use an AOT
> > library for the java.base module" in the AOT JEP[1].
> >
> > "Fixing" this issue in the JDK is non-trivial, because it gets into the
> > behavior of the VM and JIT. I don't really like the idea of modifying
> > doFinal (to break up the operation into multiple update calls) or
> > modifying the decryption operation (to decrypt immediately and buffer
> > plaintext) in order to work around this issue. Perhaps there is a better
> > way for the VM to handle cases like this, in which a method is not
> > called often, but the interpreted execution takes a long time to
> > complete when it is called. Perhaps a VM expert will have some
> > additional thoughts here.
> >
> > [1]
> >
> > On 11/14/2018 9:49 AM, Severin Gehwolf wrote:
> >> Dropping hotspot-dev and adding security-dev.
> >>
> >> On Wed, 2018-11-14 at 14:39 +0200, Gidon Gershinsky wrote:
> >>> Hi,
> >>>
> >>> We are working on an encryption mechanism in Apache Parquet
> >>> that will enable efficient analytics on encrypted data by frameworks
> >>> such as Apache Spark.
> >>>
> >>> We came across an AES-related issue in the Java HotSpot engine that
> >>> looks like a substantial problem for us in both Spark and Parquet
> >>> workloads. The bug report was accepted a while ago:
> >>>
> >>>
> >>> The fix should hopefully be rather straightforward though.
> >>> Could you help us with that? I have a couple of small samples
> >>> reproducing the problem.
> >>>
> >>> (If I'm writing to a wrong mailing list - I apologize, please point
> >>> me in the right direction).
> >>>
> >>> Cheers, Gidon.

More information about the security-dev mailing list