RFR(L): 8069539: RSA acceleration
Andrew Haley
aph at redhat.com
Fri May 8 15:59:04 UTC 2015
Here is a prototype of what I propose:
http://cr.openjdk.java.net/~aph/rsa-1/
It is a JNI version of the fast algorithm I think we should use for
RSA. It doesn't use any intrinsics. But with it, RSA is already twice
as fast as the code we have now for 1024-bit keys, and almost three
times as fast for 2048-bit keys!
This code (montgomery_multiply) is designed to be as easy as I can
possibly make it to translate into a hand-coded intrinsic. It will
then offer better performance still, with the JNI overhead gone and
with some carefully hand-tweaked memory accesses. It has a very
regular structure which should make it fairly easy to turn into a
software pipeline with overlapped fetching and multiplication,
although this will perhaps be difficult on register-starved machines
like the x86.
The JNI version can still be used for those machines where people
don't yet want to write a hand-coded intrinsic.
I haven't yet done anything about Montgomery squaring: it's
asymptotically 25% faster than Montgomery multiplication.
I have only written it for x86_64. Systems without a 64-bit multiply
won't benefit very much from this idea, but they are pretty much
legacy anyway: I want to concentrate on 64-bit systems.
So, shall we go with this? I don't think you will find any faster way
to do it.
Andrew.
More information about the hotspot-compiler-dev
mailing list