RFR(L): 8069539: RSA acceleration

Viswanathan, Sandhya sandhya.viswanathan at intel.com
Tue May 12 23:25:00 UTC 2015


I was wondering if inline assembly can be used in the native code of JRE.  Does the JRE build framework support it?

Thanks,
Sandhya

-----Original Message-----
From: Andrew Haley [mailto:aph at redhat.com] 
Sent: Friday, May 08, 2015 8:59 AM
To: Vladimir Kozlov; Florian Weimer; Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(L): 8069539: RSA acceleration

Here is a prototype of what I propose:

http://cr.openjdk.java.net/~aph/rsa-1/

It is a JNI version of the fast algorithm I think we should use for RSA.  It doesn't use any intrinsics.  But with it, RSA is already twice as fast as the code we have now for 1024-bit keys, and almost three times as fast for 2048-bit keys!

This code (montgomery_multiply) is designed to be as easy as I can possibly make it to translate into a hand-coded intrinsic.  It will then offer better performance still, with the JNI overhead gone and with some carefully hand-tweaked memory accesses.  It has a very regular structure which should make it fairly easy to turn into a software pipeline with overlapped fetching and multiplication, although this will perhaps be difficult on register-starved machines like the x86.

The JNI version can still be used for those machines where people don't yet want to write a hand-coded intrinsic.

I haven't yet done anything about Montgomery squaring: it's asymptotically 25% faster than Montgomery multiplication.

I have only written it for x86_64.  Systems without a 64-bit multiply won't benefit very much from this idea, but they are pretty much legacy anyway: I want to concentrate on 64-bit systems.

So, shall we go with this?  I don't think you will find any faster way to do it.

Andrew.


More information about the hotspot-compiler-dev mailing list