<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">On Jun 17, 2015, at 10:40 AM, Anthony Scarpino <<a href="mailto:anthony.scarpino@oracle.com" class="">anthony.scarpino@oracle.com</a>> wrote:<br class=""><div><blockquote type="cite" class=""><br class="Apple-interchange-newline"><div class=""><span style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">On 06/15/2015 05:20 PM, John Rose wrote:</span><br style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">Thanks for taking this on.<br class=""><br class="">It looks good, except for one thing. The intrinsic does not need to be<br class="">an instance method, and doing so creates an undesirable coupling between<br class="">the JVM and JDK. Specifically, the JDK should not need to know about<br class="">subkeyH and state fields. The Java code should pass those as plain<br class="">(array long[2]) arguments to the intrinsic method processBlocks, which<br class="">should be adjusted to be static. The domain check routine should be<br class="">adjusted to be static also.<br class=""><br class="">On my wish list for the future (but not now) is even less coupling<br class="">with the JVM. The loop code for processBlocks should be written in Java,<br class="">with various intrinsics (xmulx*) for dealing with single operations on<br class="">128-bit values (stored in long[2] boxes and 64-bit registers).<br class=""></blockquote><br style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I forgot the exact numbers, but having the loop in assembly instead of java resulted in about 10-15% performance improvement. The tighter loop was definitely beneficial.</span><br style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote><div><br class=""></div>That's good information; please put it in the RFE.</div><div><br class=""></div><div>IMO the best (overall) way to get that 10-15% back is to get the JIT to tighten the loop.</div><div>If that works, it will of course benefit all Java loops.</div><div><br class=""><blockquote type="cite" class=""><div class="">The<br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">Unsafe misaligned access routines could help simplify this also, if the<br class="">coding were done in Java. This is not too hard to express in Java and<br class="">compile to excellent code. There will be a little extra awkwardness<br class="">working with 64x2-vectors in a way that will compile naturally to a<br class="">range of ALUs (both 64- and 128-bit).<br class=""></blockquote><br style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I would have to look back at that again.. At first I was going to use Unsafe, but it seemed more complicated coding-wise compared to the assembly I saw that was in AES and SHA already.</span><br style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote><div><br class=""></div><div>Yes; that makes perfect sense. I want to get to the place, eventually, when the next coder looks at how "stuff gets done", they will see less assembly and more Java, and take the Java route.</div><br class=""><blockquote type="cite" class=""><div class=""><br style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">If we get it right we can reduce<br class="">the amount of assembly code in the JVM and get even more timely access<br class="">to new data-processing instructions. Would you please file a followup<br class="">bug (low pri. for now) to track this, at least for GHASH and other<br class="">crypto loops?<br class=""><br class="">— John<br class=""></blockquote><br style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Sure, I can file them.</span><br style="font-family: Helvetica; font-size: 16px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote></div><br class=""><div class="">Good; thanks.</div><div class=""><br class=""></div><div class="">— John</div></body></html>