RFR(L): 8069539: RSA acceleration

Sat May 16 09:07:10 UTC 2015

On 15/05/15 18:39, Christian Thalinger wrote:
> 
>> On May 8, 2015, at 8:59 AM, Andrew Haley <aph at redhat.com> wrote:
>>
>> Here is a prototype of what I propose:
>>
>> http://cr.openjdk.java.net/~aph/rsa-1/ <http://cr.openjdk.java.net/~aph/rsa-1/>
>> It is a JNI version of the fast algorithm I think we should use for
>> RSA.  It doesn't use any intrinsics.  But with it, RSA is already twice
>> as fast as the code we have now for 1024-bit keys, and almost three
>> times as fast for 2048-bit keys!
>>
>> This code (montgomery_multiply) is designed to be as easy as I can
>> possibly make it to translate into a hand-coded intrinsic.
> 
> I’m curious:  did you try to implement this in Java?

The Montgomery multiplication algorithm I presented is fast because
(unlike the Java code in BigInteger) it keeps most of its intermediate
state in registers.  In contrast, the code in BigInteger spends much
time writing digits out to memory and reading them back again.

I think it is impossible to write in Java.  The inner multiply
primitive accumulates into a three-register sum.  Like this:

   t += a * b

where a and b are unsigned longs, and t is a triple-precision unsigned
long.

It's easy to express that in GNU C with inline asm, but not in Java.
It's not possible to express it in in portable C either because C has
no add with carry.  The closest you could get in Java would be an
intrinsic into which you passed and returned an object with three
longs as the accumulator.

But as I said, this JNI code is just a demonstration.  I think that
Montgomery multiplication itself really needs to be turned into a
HotSpot intrinsic.  Then, our RSA and Diffie-Hellman performance will
be competitive with the best library code.  I can do it, but so far I
have had no response from Oracle at all.  :-(

Andrew.