RFR: 8073108: GHASH Intrinsics [need second reviewer]
john.r.rose at oracle.com
Wed Jun 17 18:40:22 UTC 2015
On Jun 17, 2015, at 10:40 AM, Anthony Scarpino <anthony.scarpino at oracle.com> wrote:
> On 06/15/2015 05:20 PM, John Rose wrote:
>> Thanks for taking this on.
>> It looks good, except for one thing. The intrinsic does not need to be
>> an instance method, and making it one creates an undesirable coupling
>> between the JVM and JDK. Specifically, the JVM should not need to know
>> about the subkeyH and state fields. The Java code should pass those as
>> plain (long array) arguments to the intrinsic method processBlocks, which
>> should be adjusted to be static. The domain check routine should be
>> adjusted to be static also.
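A minimal sketch of the static shape suggested above, assuming the state and subkey are each passed as a two-element long[] in big-endian order. The class name GhashSketch, the helper names checkArgs and blockMult, and the exact parameter list are illustrative assumptions, not the code under review. The multiply is the plain bit-serial GF(2^128) method from the GCM specification, not an optimized one.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: names and signatures are assumptions, not the JDK's code.
final class GhashSketch {

    // Static domain check: no object fields are consulted.
    static void checkArgs(byte[] data, int inOfs, int blocks,
                          long[] state, long[] subkeyH) {
        if (state.length != 2 || subkeyH.length != 2) {
            throw new IllegalArgumentException("state and subkeyH must hold two longs");
        }
        if (inOfs < 0 || blocks < 0 || inOfs > data.length
                || blocks > (data.length - inOfs) / 16) {
            throw new ArrayIndexOutOfBoundsException("input range out of bounds");
        }
    }

    // st = st * subH in GF(2^128), using GCM's bit ordering (bit 0 is the MSB).
    static void blockMult(long[] st, long[] subH) {
        long z0 = 0, z1 = 0;
        long v0 = subH[0], v1 = subH[1];
        for (int i = 0; i < 128; i++) {
            long x = (i < 64) ? st[0] : st[1];
            // All ones if bit i of the state (most significant first) is set.
            long mask = (x << (i & 63)) >> 63;
            z0 ^= v0 & mask;
            z1 ^= v1 & mask;
            // V = V * x: shift right one bit and fold in the reduction
            // constant when a set bit falls off the low end.
            long reduce = -(v1 & 1);
            v1 = (v1 >>> 1) | (v0 << 63);
            v0 = (v0 >>> 1) ^ (0xe100000000000000L & reduce);
        }
        st[0] = z0;
        st[1] = z1;
    }

    // Static entry point: everything the loop needs arrives as an argument,
    // so an intrinsic replacing it would not have to read any Java fields.
    static void processBlocks(byte[] data, int inOfs, int blocks,
                              long[] state, long[] subkeyH) {
        checkArgs(data, inOfs, blocks, state, subkeyH);
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        for (int b = 0; b < blocks; b++, inOfs += 16) {
            state[0] ^= buf.getLong(inOfs);
            state[1] ^= buf.getLong(inOfs + 8);
            blockMult(state, subkeyH);
        }
    }

    public static void main(String[] args) {
        byte[] block = new byte[16];
        for (int i = 0; i < 16; i++) block[i] = (byte) (i + 1);
        long[] state = {0L, 0L};
        long[] one = {0x8000000000000000L, 0L}; // multiplicative identity in GCM bit order
        processBlocks(block, 0, 1, state, one);
        System.out.println(String.format("%016x%016x", state[0], state[1]));
    }
}
```

With the identity element as the subkey, the state after one block is just the block itself, which makes a convenient sanity check for the bit ordering.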
>> On my wish list for the future (but not now) is even less coupling
>> with the JVM. The loop code for processBlocks should be written in Java,
>> with various intrinsics (xmulx*) for dealing with single operations on
>> 128-bit values (stored in long boxes and 64-bit registers).
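As an illustration of the kind of single-operation building block meant here, the following is a hedged sketch of a 64x64 to 128-bit carry-less (XOR) multiply written in plain Java. The class ClmulSketch and the method clmul are hypothetical names; no such method exists in the JDK. The idea is that a JIT could, in principle, replace a helper like this with one hardware instruction, while the surrounding block loop stays in Java.

```java
// Hypothetical helper: a pure-Java fallback that a JIT intrinsic could replace.
final class ClmulSketch {

    // Carry-less product of a and b as polynomials over GF(2).
    // result[0] receives the high 64 bits, result[1] the low 64 bits.
    static void clmul(long a, long b, long[] result) {
        long hi = 0, lo = 0;
        for (int i = 0; i < 64; i++) {
            // All ones if bit i of b is set, else zero.
            long mask = -((b >>> i) & 1);
            lo ^= (a << i) & mask;
            // Java masks shift counts to 6 bits, so a >>> 64 would equal a;
            // skip i == 0, where nothing spills into the high word.
            if (i != 0) {
                hi ^= (a >>> (64 - i)) & mask;
            }
        }
        result[0] = hi;
        result[1] = lo;
    }

    public static void main(String[] args) {
        long[] r = new long[2];
        clmul(3L, 3L, r); // (x + 1)^2 = x^2 + 1 over GF(2)
        System.out.println(Long.toHexString(r[0]) + " " + Long.toHexString(r[1]));
    }
}
```

Returning the two halves through a caller-supplied long[] keeps the helper allocation-free, which is one way to read "stored in long boxes"; that reading is an assumption.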
> I forget the exact numbers, but having the loop in assembly instead of Java resulted in about a 10-15% performance improvement. The tighter loop was definitely beneficial.
That's good information; please put it in the RFE.
IMO the best (overall) way to get that 10-15% back is to get the JIT to tighten the loop.
If that works, it will of course benefit all Java loops.
>> Unsafe misaligned access routines could help simplify this also, if the
>> coding were done in Java. This is not too hard to express in Java and
>> compile to excellent code. There will be a little extra awkwardness
>> working with 64x2-vectors in a way that will compile naturally to a
>> range of ALUs (both 64- and 128-bit).
> I would have to look back at that again. At first I was going to use Unsafe, but it seemed more complicated coding-wise compared to the assembly that was already in AES and SHA.
Yes; that makes perfect sense. I want to get to the place, eventually, where the next coder who looks at how "stuff gets done" will see less assembly and more Java, and take the Java route.
>> If we get it right we can reduce
>> the amount of assembly code in the JVM and get even more timely access
>> to new data-processing instructions. Would you please file a followup
>> bug (low pri. for now) to track this, at least for GHASH and other
>> crypto loops?
>> — John
> Sure, I can file them.