RFR 8181594: Efficient and constant-time modular arithmetic
Adam Petcher
adam.petcher at oracle.com
Fri Jan 26 21:06:13 UTC 2018
JBS: https://bugs.openjdk.java.net/browse/JDK-8181594
Webrev: http://cr.openjdk.java.net/~apetcher/8181594/webrev.00/
This is a code review for the field arithmetic that will be used in
implementations of X25519/X448 key agreement, the Poly1305
authenticator, and EdDSA signatures. I believe that the library has all
the features necessary for X25519/X448 and Poly1305, and I expect at
most a couple of minor enhancements will be required to support EdDSA.
There is no public API for this library, so we can change it in the
future to suit the needs of new algorithms without breaking
compatibility with external code. Still, I made an attempt to clearly
structure and document the (internal) API, and I want to make sure it is
understandable and easy to use.
This is not a general-purpose modular arithmetic library. It will only
work well in circumstances where the sequence of operations is
restricted, and where the prime that defines the field has some useful
structure. Moreover, each new field will require some field-specific
code that takes into account the structure of the prime and the way the
field is used in the application. The initial implementation includes a
field for Poly1305 and the fields for X25519/X448 which should also work
for EdDSA.
The benefits of using this library are that it is much more efficient
than using similar operations in BigInteger. Also, many operations are
branch-free, making them suitable for use in a side-channel resistant
implementation that does not branch on secrets.
To provide some context, I have attached a code snippet describing how
this library can be used. The snippet is the constant-time Montgomery
ladder from my X25519/X448 implementation, which I expect to be out for
review soon. X25519/X448 only uses standard arithmetic operations, and
the more unusual features (e.g. add modulo a power of 2) are needed by
Poly1305.
The field arithmetic (for all fields) is implemented using a 32-bit
representation similar to the one described in the Ed448 paper[1] (in
the "Implementation on 32-bit platforms" section). Though my
implementation uses signed limbs, and grade-school multiplication
instead of Karatsuba. The argument for correctness is essentially the
same for all three fields: the magnitude of each 64-bit limb is at most
2^(k-1) after reduction, except for the last limb which may have a
magnitude of up to 2^k. The values of k are between 26 to 28 (depending
on the field), and we can calculate that the maximum magnitude for any
limb during an add-multiply-carry-reduce sequence is always less than
2^63. Therefore, no overflow occurs and all operations are correct.
Process note: this enhancement is part of JEP 324 (Key Agreement with
Curve25519 and Curve448). When this code review is complete, nothing
will happen until all other work for this JEP is complete, and the JEP
is accepted as part of some release. This means that this code will be
pushed to the repo along with the X25519/X448 code that uses it.
[1] https://eprint.iacr.org/2015/625.pdf
-------------- next part --------------
private IntegerModuloP_Base pointMultiply(byte[] k, IntegerModuloP u){
IntegerModuloP x_1 = u;
MutableIntegerModuloP x_2 = one.mutable();
MutableIntegerModuloP z_2 = zero.mutable();
MutableIntegerModuloP x_3 = u.mutable();
MutableIntegerModuloP z_3 = one.mutable();
int swap = 0;
// Variables below are reused to avoid unnecessary allocation
// They will be assigned in the loop, so initial value doesn't matter
MutableIntegerModuloP m1 = zero.mutable();
MutableIntegerModuloP DA = zero.mutable();
MutableIntegerModuloP E = zero.mutable();
MutableIntegerModuloP a24_times_E = zero.mutable();
for(int t = params.getBits() - 1; t >= 0; t--){
int k_t = bitAt(k, t);
swap = swap ^ k_t;
x_2.conditionalSwapWith(x_3, swap);
z_2.conditionalSwapWith(z_3, swap);
swap = k_t;
// A(m1) = x_2 + z_2
m1.setValue(x_2).setSum(z_2);
// D = x_3 - z_3
// DA = D * A(m1)
DA.setValue(x_3).setDifference(z_3).setProduct(m1);
// AA(m1) = A(m1)^2
m1.setSquare();
// B(x_2) = x_2 - z_2
x_2.setDifference(z_2);
// C = x_3 + z_3
// CB(x_3) = C * B(x_2)
x_3.setSum(z_3).setProduct(x_2);
// BB(x_2) = B^2
x_2.setSquare();
// E = AA(m1) - BB(x_2)
E.setValue(m1).setDifference(x_2);
// compute a24 * E using SmallValue
a24_times_E.setValue(E);
a24_times_E.setProduct(a24);
// assign results to x_3, z_3, x_2, z_2
// x_2 = AA(m1) * BB
x_2.setProduct(m1);
// z_2 = E * (AA(m1) + a24 * E)
z_2.setValue(m1).setSum(a24_times_E).setProduct(E);
// z_3 = x_1*(DA - CB(x_3))^2
z_3.setValue(DA).setDifference(x_3).setSquare().setProduct(x_1);
// x_3 = (CB(x_3) + DA)^2
x_3.setSum(DA).setSquare();
}
x_2.conditionalSwapWith(x_3, swap);
z_2.conditionalSwapWith(z_3, swap);
// return (x_2 * z_2^(p - 2))
return x_2.setProduct(z_2.multiplicativeInverse());
}
More information about the security-dev
mailing list