RFR 8181594: Efficient and constant-time modular arithmetic

Fri Jan 26 21:06:13 UTC 2018

JBS: https://bugs.openjdk.java.net/browse/JDK-8181594
Webrev: http://cr.openjdk.java.net/~apetcher/8181594/webrev.00/

This is a code review for the field arithmetic that will be used in 
implementations of X25519/X448 key agreement, the Poly1305 
authenticator, and EdDSA signatures. I believe that the library has all 
the features necessary for X25519/X448 and Poly1305, and I expect at 
most a couple of minor enhancements will be required to support EdDSA. 
There is no public API for this library, so we can change it in the 
future to suit the needs of new algorithms without breaking 
compatibility with external code. Still, I made an attempt to clearly 
structure and document the (internal) API, and I want to make sure it is 
understandable and easy to use.

This is not a general-purpose modular arithmetic library. It will only 
work well in circumstances where the sequence of operations is 
restricted, and where the prime that defines the field has some useful 
structure. Moreover, each new field will require some field-specific 
code that takes into account the structure of the prime and the way the 
field is used in the application. The initial implementation includes a 
field for Poly1305 and the fields for X25519/X448, which should also work
for EdDSA.

The main benefit of this library is that it is much more efficient than
the equivalent operations in BigInteger. In addition, many operations are
branch-free, making them suitable for use in a side-channel resistant
implementation that does not branch on secrets.
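
To illustrate what "branch-free" means here, the following is a minimal
sketch of a conditional swap over limb arrays that uses a mask instead of
an if statement. The class and method names are hypothetical and are not
the library's API. The sketch assumes both arrays have the same length and
that swap is either 0 or 1.

```java
/** Illustrative only: names are hypothetical, not the library's API. */
final class ConditionalSwapSketch {

    /**
     * Swaps the contents of a and b when swap == 1 and leaves them
     * unchanged when swap == 0, without branching on the value of swap.
     * Assumes a.length == b.length and swap is 0 or 1.
     */
    static void conditionalSwap(long[] a, long[] b, int swap) {
        // swap == 0 gives an all-zero mask, swap == 1 gives an all-one mask
        long mask = -(long) swap;
        for (int i = 0; i < a.length; i++) {
            // diff is zero when the mask is zero, so the XORs below are no-ops
            long diff = mask & (a[i] ^ b[i]);
            a[i] ^= diff;
            b[i] ^= diff;
        }
    }

    public static void main(String[] args) {
        long[] x = {1, 2, 3};
        long[] y = {7, 8, 9};
        conditionalSwap(x, y, 0);   // no swap
        System.out.println(java.util.Arrays.toString(x));
        conditionalSwap(x, y, 1);   // swap
        System.out.println(java.util.Arrays.toString(x));
    }
}
```

The same sequence of instructions runs for both values of swap, so the
control flow does not depend on the secret bit.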

To provide some context, I have attached a code snippet describing how 
this library can be used. The snippet is the constant-time Montgomery 
ladder from my X25519/X448 implementation, which I expect to be out for 
review soon. X25519/X448 uses only standard arithmetic operations; the
more unusual features (e.g. addition modulo a power of 2) are needed by
Poly1305.

The field arithmetic (for all fields) is implemented using a 32-bit
representation similar to the one described in the Ed448 paper[1] (in
the "Implementation on 32-bit platforms" section), though my
implementation uses signed limbs and grade-school multiplication
instead of Karatsuba. The argument for correctness is essentially the
same for all three fields: the magnitude of each 64-bit limb is at most
2^(k-1) after reduction, except for the last limb, which may have a
magnitude of up to 2^k. The values of k are between 26 and 28 (depending
on the field), and we can calculate that the maximum magnitude for any
limb during an add-multiply-carry-reduce sequence is always less than
2^63. Therefore, no overflow occurs and all operations are correct.
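
As a rough sanity check of this kind of bound, the sketch below computes
an upper bound on one column of an unreduced grade-school product when
each operand is the sum of two reduced values. It is not the full
argument: it uses the looser bound 2^k for every limb and ignores the
extra growth in the reduce step, which needs field-specific analysis. The
limb counts passed in main (5, 10, and 16) are assumptions chosen for
illustration and are not stated in this message.

```java
import java.math.BigInteger;

/** Illustrative only: a coarse bound, not the library's actual analysis. */
final class LimbBoundSketch {

    /**
     * Upper bound on the magnitude of one column of an unreduced
     * grade-school product, assuming each operand limb is the sum of two
     * reduced limbs of magnitude at most 2^k.
     */
    static BigInteger productColumnBound(int numLimbs, int k) {
        // A sum of two reduced limbs has magnitude at most 2^(k+1)
        BigInteger operandLimb = BigInteger.ONE.shiftLeft(k + 1);
        // A column accumulates at most numLimbs products of two operand limbs
        return operandLimb.multiply(operandLimb)
                          .multiply(BigInteger.valueOf(numLimbs));
    }

    /** True if the column bound is strictly below 2^63. */
    static boolean fitsInLong(int numLimbs, int k) {
        BigInteger limit = BigInteger.ONE.shiftLeft(63);
        return productColumnBound(numLimbs, k).compareTo(limit) < 0;
    }

    public static void main(String[] args) {
        // Hypothetical (numLimbs, k) pairs, chosen only for illustration
        int[][] shapes = {{5, 26}, {10, 26}, {16, 28}};
        for (int[] s : shapes) {
            System.out.println(s[0] + " limbs, k=" + s[1]
                + ": bound has " + productColumnBound(s[0], s[1]).bitLength()
                + " bits, fits=" + fitsInLong(s[0], s[1]));
        }
    }
}
```

Whatever headroom remains below 2^63 after this step is what the carry
and reduce steps have to work with.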

Process note: this enhancement is part of JEP 324 (Key Agreement with 
Curve25519 and Curve448). When this code review is complete, nothing 
will happen until all other work for this JEP is complete, and the JEP 
is accepted as part of some release. This means that this code will be 
pushed to the repo along with the X25519/X448 code that uses it.

[1] https://eprint.iacr.org/2015/625.pdf

-------------- next part --------------
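    // Constant-time Montgomery ladder. The loop runs a fixed number of
    // iterations, and the bits of k only select operands through
    // conditionalSwapWith, never through a branch.
    private IntegerModuloP_Base pointMultiply(byte[] k, IntegerModuloP u) {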

        IntegerModuloP x_1 = u;
        MutableIntegerModuloP x_2 = one.mutable();
        MutableIntegerModuloP z_2 = zero.mutable();
        MutableIntegerModuloP x_3 = u.mutable();
        MutableIntegerModuloP z_3 = one.mutable();
        int swap = 0;

        // Variables below are reused to avoid unnecessary allocation
        // They will be assigned in the loop, so initial value doesn't matter
        MutableIntegerModuloP m1 = zero.mutable();
        MutableIntegerModuloP DA = zero.mutable();
        MutableIntegerModuloP E = zero.mutable();
        MutableIntegerModuloP a24_times_E = zero.mutable();

        for(int t = params.getBits() - 1; t >= 0; t--){
            int k_t = bitAt(k, t);
            swap = swap ^ k_t;
            x_2.conditionalSwapWith(x_3, swap);
            z_2.conditionalSwapWith(z_3, swap);
            swap = k_t;

            // A(m1) = x_2 + z_2
            m1.setValue(x_2).setSum(z_2);
            // D = x_3 - z_3
            // DA = D * A(m1)
            DA.setValue(x_3).setDifference(z_3).setProduct(m1);
            // AA(m1) = A(m1)^2
            m1.setSquare();
            // B(x_2) = x_2 - z_2
            x_2.setDifference(z_2);
            // C = x_3 + z_3
            // CB(x_3) = C * B(x_2)
            x_3.setSum(z_3).setProduct(x_2);
            // BB(x_2) = B^2
            x_2.setSquare();
            // E = AA(m1) - BB(x_2)
            E.setValue(m1).setDifference(x_2);
            // compute a24 * E using SmallValue
            a24_times_E.setValue(E);
            a24_times_E.setProduct(a24);

            // assign results to x_3, z_3, x_2, z_2
            // x_2 = AA(m1) * BB
            x_2.setProduct(m1);
            // z_2 = E * (AA(m1) + a24 * E)
            z_2.setValue(m1).setSum(a24_times_E).setProduct(E);
            // z_3 = x_1*(DA - CB(x_3))^2
            z_3.setValue(DA).setDifference(x_3).setSquare().setProduct(x_1);
            // x_3 = (CB(x_3) + DA)^2
            x_3.setSum(DA).setSquare();
        }

        x_2.conditionalSwapWith(x_3, swap);
        z_2.conditionalSwapWith(z_3, swap);

        // return (x_2 * z_2^(p - 2))
        return x_2.setProduct(z_2.multiplicativeInverse());
    }
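
The ladder above calls bitAt(k, t), which is not part of the attached
snippet. A minimal sketch of such a helper follows, assuming the scalar is
stored in little-endian byte order (as in RFC 7748), so that bit 0 is the
least significant bit of k[0]. The class name and the exact behavior are
assumptions; the real helper may differ.

```java
/** Illustrative only: a hypothetical stand-in for the bitAt helper. */
final class BitAtSketch {

    /**
     * Returns bit number index (0 or 1) of a little-endian byte string.
     * Assumes 0 <= index < 8 * k.length.
     */
    static int bitAt(byte[] k, int index) {
        int byteIndex = index / 8;   // which byte holds the bit
        int bitIndex = index % 8;    // position of the bit inside that byte
        // The shift sign-extends negative bytes, but the mask keeps one bit
        return (k[byteIndex] >> bitIndex) & 1;
    }

    public static void main(String[] args) {
        byte[] k = {0x01, (byte) 0x80};
        System.out.println(bitAt(k, 0));
        System.out.println(bitAt(k, 15));
    }
}
```

The array index depends only on the loop counter, which is public, so the
memory access pattern does not depend on the value of the key.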