RFR: 8295010: EC limbs addition and subtraction limit

Xue-Lei Andrew Fan xuelei at openjdk.org
Mon Oct 10 14:57:56 UTC 2022


On Sun, 9 Oct 2022 06:53:19 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote:

> Hi,
> 
> May I have this update reviewed?  With this update, the result will be always reduced in EC limbs addition and subtraction operations in the JDK implementation.
> 
> In the current implementation, the EC limbs addition and subtraction result is not reduced before the value is returned.  This behavior is different from multiplication and square.  To get it right, a maximum addition limit is introduced for EC limbs addition and subtraction.  If the limit is reached, an exception will be thrown so as to indicate an EC implementation problem.  The exception is not recoverable from the viewpoint of an application.
> 
> The design is error prone as the EC implementation code must be carefully checked so that the limit cannot reach. If a reach is noticed, a reduce operation would have to be hard-coded additionally. In the FieldGen.java implementation, the limit can only be 1 or 2 as the implementation code is only able to handle 2.  But the FieldGen.java does not really check if 1 or 2 really works for the specific filed generation parameters.
> 
> The design impact the performance as well. Because of this limit, the maximum limb size is 28 bits for 2 max adding limit. Otherwise there are integer (long) overflow issues. For example for 256 bits curves, 10 limbs are required for 28-bit limb size; and 9 limbs for 29-bit size. By reducing 1 limbs from 10 to 9, the Signature performance could get improved by 20%.
> 
> In the IntegerPolynomial class description, it is said "All IntegerPolynomial implementations allow at most one addition before multiplication. Additions after that will result in an ArithmeticException." It's too strict to follow without exam the code very carefully. Indeed, the implementation does not really follow the spec, and 2 addition may be allowed.
> 
> It would be nice if there is no addition limit, and then we are free from these issues. It is doable by reducing the limbs in the adding and subtraction operations, as other operations as multiplication. And then the code related to the limit is not necessary any longer. The code can do any many additions as required.
> 
> The benchmark for ECDSA Signature is checked in order to see how much the performance could be impacted. Here are the benchmark numbers before the patch applied:
> 
> Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
> Signatures.sign               64  thrpt   15  1414.463 ±  9.616  ops/s
> Signatures.sign              512  thrpt   15  1409.023 ± 14.636  ops/s
> Signatures.sign             2048  thrpt   15  1414.488 ±  4.728  ops/s
> Signatures.sign            16384  thrpt   15  1385.685 ±  4.476  ops/s
> 
> 
> and here are the numbers with this patch applied:
> 
> Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
> Signatures.sign               64  thrpt   15  1293.658 ±  8.358  ops/s
> Signatures.sign              512  thrpt   15  1298.404 ±  4.344  ops/s
> Signatures.sign             2048  thrpt   15  1289.732 ± 19.748  ops/s
> Signatures.sign            16384  thrpt   15  1278.980 ± 13.483  ops/s
> 
> 
> From the numbers, it looks like the performance impact is about 10%.  However, the performance impact could get rewarded by using less limbs.  For examples for secp256r1, I updated the limbs from [10 to 9 for integer operations](https://github.com/openjdk/jdk/pull/10398), and run the benchmark together with this patch, I could see the performance improvement as follow:
> 
> Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
> Signatures.sign               64  thrpt   15  1578.568 ± 11.000  ops/s
> Signatures.sign              512  thrpt   15  1596.058 ±  4.184  ops/s
> Signatures.sign             2048  thrpt   15  1591.519 ±  6.444  ops/s
> Signatures.sign            16384  thrpt   15  1563.480 ±  6.837  ops/s
> ``` 
> 
> The following are numbers for secp256r1 key generation, without this patch:
> 
> Benchmark                      Mode  Cnt     Score   Error  Units
> KeyPairGenerators.keyPairGen  thrpt   15  1531.026 ± 9.869  ops/s
> 
> 
> With this patch:
> 
> Benchmark                      Mode  Cnt     Score    Error  Units
> KeyPairGenerators.keyPairGen  thrpt   15  1367.162 ± 50.322  ops/s
> 
> 
> with improved limbs (from 10 to 9):
> 
> Benchmark                      Mode  Cnt     Score    Error  Units
> KeyPairGenerators.keyPairGen  thrpt   15  1713.097 ± 8.003  ops/s
> 
> If considering this PR together with [PR for JDK-8294248](https://github.com/openjdk/jdk/pull/10398), performance improvement could be expected for EC crypto primitives.
> 
> Thanks,
> Xuelei

Thank you for looking into the PR.

> On the other hand, #10398 only improves performance of P256. I'm not sure that's a tradeoff we want to make.

If I get it right, all supported curves could be improved by 1 limb in implementation.

> Could we introduce just enough reduce / setReduced calls to make P256 work reliably with 9 limbs?

Yes, it is doable as well by limit the maxAdd to 1 in the implementation.  This approach needs to adjust the code accordingly, just as what we did to make maxAdd 2 works.   The performance impact is limited.  This approach has better performance (about 20%) if reducing the limbs to 9 for P256.

> IMO working with setReduced is not that bad;

Hm, I may be focus too much on the error-prone design.  Let me check what it could be by limit the maxAdd to 1 always.

-------------

PR: https://git.openjdk.org/jdk/pull/10624


More information about the security-dev mailing list