<div dir="ltr">Hi all,<div><br></div><div>Thanks for the prompt feedback on this stuff, appreciated.</div><div><br></div><div>1. Analytic queries are often interactive or one-off. A data scientist would get an on-demand notebook with a Spark cluster (spawned as a K8s pod), and run a number of queries.</div><div>The cluster will be then closed either explicitly, or after a timeout. This is done both for a better resource utilization, and for security reasons. Re-using JVM for another user/tenant<br></div><div>might leak the sensitive data and encryption keys, kept in the JVM memory. </div><div>I'm not saying its the only way to solve this, there are architectures based on a long running service. But this short-lived approach is real and needs to be addressed.</div><div>Even if the data scientist keeps the cluster alive for a few hours - having to wait a long time for the results of the first few queries (because the decryption is not warmed up yet) is a problem,</div><div>since the things are interactive and expected to be done in real time.</div><div><br></div><div>2. Analytics and AI workloads work with ~ 64MB blocks; sometimes, they are broken in ~1MB pieces (like in Parquet). Still, taking even the minimal size of 1MB, and waiting the 10,000 rounds to </div><div>get the decryption acceleration, means we process the first ~10GB at a slow rate. Sounds harsh. Both in absolute numbers, and in comparison to ENcryption, which kicks in after warming up with say 1KB</div><div>chunks (created by breaking 1MB blocks into many update calls) - meaning ~1,000x faster than DEcryption.</div><div><br></div><div>3. Adam has mentioned an approach of "modifying the decryption operation (to decrypt immediately and buffer plaintext)" (in a negative context, though :).</div><div>To me, it looks like a sound solution. However, I don't know how much effort does it require (?) - but it makes decryption implementation similar to encryption, and solves the problem at hand.</div><div>Maybe there are other options, though.</div><div><br></div><div>4. AOT sounds interesting, I'll check it out. But its experimental for now. Moreover, both AOT and command line options require extra care in production, as correctly pointed out below.</div><div>They will be a hard sell in real production environments. The same is true (or even worse) for manual warm-up with a repeated decryption of small blocks. This is indeed a benchmarking hack,</div><div>I don't see it been used in production.</div><div><br></div><div>Having the decryption optimized in the HotSpot engine would be ideal.</div><div><br></div><div>Cheers, Gidon.</div><div> </div><div class="gmail_quote"><div dir="ltr">On Thu, Nov 15, 2018 at 3:33 AM Anthony Scarpino <<a href="mailto:anthony.scarpino@oracle.com" target="_blank">anthony.scarpino@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I agree with Adam that this is more of a tuning issue and not a problem <br>
On Thu, Nov 15, 2018 at 3:33 AM Anthony Scarpino
<anthony.scarpino@oracle.com> wrote:

I agree with Adam that this is more of a tuning issue and not a problem
with the crypto. Sending multiple updates is a hack.

I've been aware of this bug for a while and I do not understand why this
is a significant problem. The stackoverflow comments say it takes 50
seconds to trigger the intrinsic. If this is a long-running server
application, slowness for the first 50 seconds is trivial. Smaller
operations are commonly small transactions, not the decryption of a 3GB
file.

If it cannot be resolved by command-line options and this is occurring
in a real-world situation, please explain it fully. If this is only for
benchmarking, then that's not a real-world situation.

Tony

On 11/14/18 8:41 AM, Adam Petcher wrote:
> I'm adding back in hotspot-dev, because this is a somewhat tricky topic
> related to intrinsics and JIT. Hopefully, a Hotspot expert can correct
> anything that I say below that is wrong, and suggest any solutions that
> I missed.
>
> The AES acceleration is implemented in a HotSpot intrinsic. In order for
> it to kick in, the code must be JIT compiled by the VM. As I understand
> it, this only happens to a particular method after it has been called a
> certain number of times. The rules that determine this number are
> somewhat complicated, but I think you can guarantee JIT compilation in
> the default configuration by calling a method 10,000 times.
>
> The doFinal method calls the update method, so either one should trigger
> the acceleration as long as you call it enough. Breaking the message up
> into smaller chunks and calling update on each one works only because it
> ends up calling the update method more. You should be able to trigger
> the acceleration by calling doFinal more, too.
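>
> To make the chunking workaround concrete, it is essentially this (a
> sketch only - the key, ivSpec, and data variables are placeholders, and
> I'm assuming an AES/GCM transformation):
>
>     Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
>     c.init(Cipher.ENCRYPT_MODE, key, ivSpec);
>     for (int off = 0; off < data.length; off += 1024) {
>         // many small calls instead of one big one: the crypto methods
>         // run more invocations, so the compile threshold is hit sooner
>         c.update(data, off, Math.min(1024, data.length - off));
>         // (collecting the returned ciphertext is omitted for brevity)
>     }
>     byte[] tail = c.doFinal(); // remaining output plus the GCM tag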
>
> The reason why the workaround doesn't work with decryption is that the
> decryption routine buffers the ciphertext and then decrypts it all at
> the end. So calling update multiple times and then calling doFinal at
> the end is essentially the same as calling doFinal once with the entire
> ciphertext.
>
> So here are some solutions that you may want to try:
>
> 1) In your benchmark, run at least 10,000 "warmup" iterations of
> whatever you are trying to do at the beginning, without timing it. This
> is a good idea for benchmarks, anyway. If it helps, you can try using
> smaller buffers in your "warmup" phase in order to get it to complete
> faster. (There is a sketch of this right after the list.)
>
> 2) Try -XX:CompileThreshold=(some number smaller than 10000) as an
> argument to java. This will make JIT kick in sooner across the board.
> Obviously, this should be done carefully in production, since it will
> impact the performance of the entire program.
>
> 3) I haven't tried this, but running with an AOTed java.base module may
> also help. See the section titled "Steps to generate and use an AOT
> library for the java.base module" in the AOT JEP[1]; the commands are
> also sketched after the list.
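>
> A sketch of the warm-up in (1) - untested, assuming the default compile
> threshold; decrypting a small valid ciphertext keeps the warm-up cheap
> while exercising the decryption path:
>
>     // run once at startup, before the real (timed) work
>     SecretKey key = KeyGenerator.getInstance("AES").generateKey();
>     GCMParameterSpec spec = new GCMParameterSpec(128, new byte[12]);
>     Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
>     enc.init(Cipher.ENCRYPT_MODE, key, spec);
>     byte[] ct = enc.doFinal(new byte[1024]); // small valid ciphertext
>     Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
>     for (int i = 0; i < 10_000; i++) {
>         dec.init(Cipher.DECRYPT_MODE, key, spec); // IV reuse is fine when decrypting
>         dec.doFinal(ct);
>     }
>
> And for (3), if I'm remembering the JEP correctly, the flow is roughly:
>
>     jaotc --output libjava.base.so --module java.base
>     java -XX:AOTLibrary=./libjava.base.so ...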
>
> "Fixing" this issue in the JDK is non-trivial, because it gets into the
> behavior of the VM and JIT. I don't really like the idea of modifying
> doFinal (to break up the operation into multiple update calls) or
> modifying the decryption operation (to decrypt immediately and buffer
> plaintext) in order to work around this issue. Perhaps there is a better
> way for the VM to handle cases like this, in which a method is not
> called often, but the interpreted execution takes a long time to
> complete when it is called. Perhaps a VM expert will have some
> additional thoughts here.
>
> [1] https://openjdk.java.net/jeps/295
>
> On 11/14/2018 9:49 AM, Severin Gehwolf wrote:
>> Dropping hotspot-dev and adding security-dev.
>>
>> On Wed, 2018-11-14 at 14:39 +0200, Gidon Gershinsky wrote:
>>> Hi,
>>>
>>> We are working on an encryption mechanism at Apache Parquet that
>>> will enable efficient analytics on encrypted data by frameworks
>>> such as Apache Spark.
>>> https://github.com/apache/parquet-format/blob/encryption/Encryption.md
>>> https://www.slideshare.net/databricks/efficient-spark-analytics-on-encrypted-data-with-gidon-gershinsky
>>>
>>> We came across an AES-related issue in the Java HotSpot engine that
>>> looks like a substantial problem for us in both Spark and Parquet
>>> workloads. The bug report had been accepted a while ago:
>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8201633
>>>
>>> The fix should hopefully be rather straightforward, though.
>>> Could you help us with that? I have a couple of small samples
>>> reproducing the problem.
>>>
>>> (If I'm writing to the wrong mailing list, I apologize; please point
>>> me in the right direction.)
>>>
>>> Cheers, Gidon.