<div dir="ltr">Hi all,<div><br></div><div>Thanks for the prompt feedback on this stuff, appreciated.</div><div><br></div><div>1. Analytic queries are often interactive or one-off. A data scientist would get an on-demand notebook with a Spark cluster (spawned as a K8s pod), and run a number of queries.</div><div>The cluster will be then closed either explicitly, or after a timeout. This is done both for a better resource utilization, and for security reasons. Re-using JVM for another user/tenant<br></div><div>might leak the sensitive data and encryption keys, kept in the JVM memory. </div><div>I'm not saying its the only way to solve this, there are architectures based on a long running service. But this short-lived approach is real and needs to be addressed.</div><div>Even if the data scientist keeps the cluster alive for a few hours - having to wait a long time for the results of the first few queries (because the decryption is not warmed up yet) is a problem,</div><div>since the things are interactive and expected to be done in real time.</div><div><br></div><div>2. Analytics and AI workloads work with ~ 64MB blocks; sometimes, they are broken in ~1MB pieces (like in Parquet). Still, taking even the minimal size of 1MB, and waiting the 10,000 rounds to </div><div>get the decryption acceleration, means we process the first ~10GB at a slow rate. Sounds harsh. Both in absolute numbers, and in comparison to ENcryption, which kicks in after warming up with say 1KB</div><div>chunks (created by breaking 1MB blocks into many update calls) - meaning ~1,000x faster than DEcryption.</div><div><br></div><div>3. Adam has mentioned an approach of "modifying the decryption operation (to decrypt immediately and buffer plaintext)" (in a negative context, though :).</div><div>To me, it looks like a sound solution. However, I don't know how much effort does it require (?) - but it makes decryption implementation similar to encryption, and solves the problem at hand.</div><div>Maybe there are other options, though.</div><div><br></div><div>4. AOT sounds interesting, I'll check it out. But its experimental for now. Moreover, both AOT and command line options require extra care in production, as correctly pointed out below.</div><div>They will be a hard sell in real production environments. The same is true (or even worse) for manual warm-up with a repeated decryption of small blocks. This is indeed a benchmarking hack,</div><div>I don't see it been used in production.</div><div><br></div><div>Having the decryption optimized in the HotSpot engine would be ideal.</div><div><br></div><div>Cheers, Gidon.</div><div> </div><div class="gmail_quote"><div dir="ltr">On Thu, Nov 15, 2018 at 3:33 AM Anthony Scarpino <<a href="mailto:anthony.scarpino@oracle.com" target="_blank">anthony.scarpino@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I agree with Adam that this is more of a tuning issue and not a problem <br>
On Thu, Nov 15, 2018 at 3:33 AM Anthony Scarpino
<anthony.scarpino@oracle.com> wrote:

I agree with Adam that this is more of a tuning issue and not a problem
with the crypto. Sending multiple updates is a hack.

I've been aware of this bug for a while and I do not understand why this
is a significant problem. The stackoverflow comments say it takes 50
seconds to trigger the intrinsic. If this is a long-running server
application, slowness for the first 50 seconds is trivial. Smaller
operations are commonly small transactions, not the decryption of a 3GB
file.

If it cannot be resolved by command-line options and this is occurring
in a real-world situation, please explain it fully. If this is only for
benchmarking, then that's not a real-world situation.

Tony

On 11/14/18 8:41 AM, Adam Petcher wrote:
> I'm adding back in hotspot-dev, because this is a somewhat tricky topic
> related to intrinsics and JIT. Hopefully, a Hotspot expert can correct
> anything that I say below that is wrong, and suggest any solutions that
> I missed.
>
> The AES acceleration is implemented in a HotSpot intrinsic. In order for
> it to kick in, the code must be JIT compiled by the VM. As I understand
> it, this only happens to a particular method after it has been called a
> certain number of times. The rules that determine this number are
> somewhat complicated, but I think you can guarantee JIT compilation in
> the default configuration by calling a method 10,000 times.
>
> The doFinal method calls the update method, so either one should trigger
> the acceleration as long as you call it enough. Breaking the message up
> into smaller chunks and calling update on each one works only because it
> ends up calling the update method more. You should be able to trigger
> the acceleration by calling doFinal more, too.
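>
> To make the chunking workaround concrete, it is essentially this (a
> sketch only - the key, ivSpec, and data variables are placeholders, and
> I'm assuming an AES/GCM transformation):
>
>     Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
>     c.init(Cipher.ENCRYPT_MODE, key, ivSpec);
>     for (int off = 0; off < data.length; off += 1024) {
>         // many small calls instead of one big one: the crypto methods
>         // run more invocations, so the compile threshold is hit sooner
>         c.update(data, off, Math.min(1024, data.length - off));
>         // (collecting the returned ciphertext is omitted for brevity)
>     }
>     byte[] tail = c.doFinal(); // remaining output plus the GCM tag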
>
> The reason why the workaround doesn't work with decryption is that the
> decryption routine buffers the ciphertext and then decrypts it all at
> the end. So calling update multiple times and then calling doFinal at
> the end is essentially the same as calling doFinal once with the entire
> ciphertext.
>
> So here are some solutions that you may want to try:
>
> 1) In your benchmark, run at least 10,000 "warmup" iterations of
> whatever you are trying to do at the beginning, without timing it. This
> is a good idea for benchmarks, anyway. If it helps, you can try using
> smaller buffers in your "warmup" phase in order to get it to complete
> faster. (There is a sketch of this right after the list.)
>
> 2) Try -XX:CompileThreshold=(some number smaller than 10000) as an
> argument to java. This will make JIT kick in sooner across the board.
> Obviously, this should be done carefully in production, since it will
> impact the performance of the entire program.
>
> 3) I haven't tried this, but running with an AOTed java.base module may
> also help. See the section titled "Steps to generate and use an AOT
> library for the java.base module" in the AOT JEP[1]; the commands are
> also sketched after the list.
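>
> A sketch of the warm-up in (1) - untested, assuming the default compile
> threshold; decrypting a small valid ciphertext keeps the warm-up cheap
> while exercising the decryption path:
>
>     // run once at startup, before the real (timed) work
>     SecretKey key = KeyGenerator.getInstance("AES").generateKey();
>     GCMParameterSpec spec = new GCMParameterSpec(128, new byte[12]);
>     Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
>     enc.init(Cipher.ENCRYPT_MODE, key, spec);
>     byte[] ct = enc.doFinal(new byte[1024]); // small valid ciphertext
>     Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
>     for (int i = 0; i < 10_000; i++) {
>         dec.init(Cipher.DECRYPT_MODE, key, spec); // IV reuse is fine when decrypting
>         dec.doFinal(ct);
>     }
>
> And for (3), if I'm remembering the JEP correctly, the flow is roughly:
>
>     jaotc --output libjava.base.so --module java.base
>     java -XX:AOTLibrary=./libjava.base.so ...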
>
> "Fixing" this issue in the JDK is non-trivial, because it gets into the
> behavior of the VM and JIT. I don't really like the idea of modifying
> doFinal (to break up the operation into multiple update calls) or
> modifying the decryption operation (to decrypt immediately and buffer
> plaintext) in order to work around this issue. Perhaps there is a better
> way for the VM to handle cases like this, in which a method is not
> called often, but the interpreted execution takes a long time to
> complete when it is called. Perhaps a VM expert will have some
> additional thoughts here.
>
> [1] https://openjdk.java.net/jeps/295
>
> On 11/14/2018 9:49 AM, Severin Gehwolf wrote:
>> Dropping hotspot-dev and adding security-dev.
>>
>> On Wed, 2018-11-14 at 14:39 +0200, Gidon Gershinsky wrote:
>>> Hi,
>>>
>>> We are working on an encryption mechanism at Apache Parquet that
>>> will enable efficient analytics on encrypted data by frameworks
>>> such as Apache Spark.
>>> https://github.com/apache/parquet-format/blob/encryption/Encryption.md
>>> https://www.slideshare.net/databricks/efficient-spark-analytics-on-encrypted-data-with-gidon-gershinsky
>>>
>>> We came across an AES-related issue in the Java HotSpot engine that
>>> looks like a substantial problem for us in both Spark and Parquet
>>> workloads. The bug report had been accepted a while ago:
>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8201633
>>>
>>> The fix should hopefully be rather straightforward, though.
>>> Could you help us with that? I have a couple of small samples
>>> reproducing the problem.
>>>
>>> (If I'm writing to the wrong mailing list, I apologize; please point
>>> me in the right direction.)
>>>
>>> Cheers, Gidon.