RFR: 8295010: EC limbs addition and subtraction limit
Daniel Jeliński
djelinski at openjdk.org
Mon Oct 10 12:13:57 UTC 2022
On Sun, 9 Oct 2022 06:53:19 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote:
> Hi,
>
> May I have this update reviewed? With this update, the result will be always reduced in EC limbs addition and subtraction operations in the JDK implementation.
>
> In the current implementation, the EC limbs addition and subtraction result is not reduced before the value is returned. This behavior is different from multiplication and square. To get it right, a maximum addition limit is introduced for EC limbs addition and subtraction. If the limit is reached, an exception will be thrown so as to indicate an EC implementation problem. The exception is not recoverable from the viewpoint of an application.
>
> The design is error prone as the EC implementation code must be carefully checked so that the limit cannot reach. If a reach is noticed, a reduce operation would have to be hard-coded additionally. In the FieldGen.java implementation, the limit can only be 1 or 2 as the implementation code is only able to handle 2. But the FieldGen.java does not really check if 1 or 2 really works for the specific filed generation parameters.
>
> The design impact the performance as well. Because of this limit, the maximum limb size is 28 bits for 2 max adding limit. Otherwise there are integer (long) overflow issues. For example for 256 bits curves, 10 limbs are required for 28-bit limb size; and 9 limbs for 29-bit size. By reducing 1 limbs from 10 to 9, the Signature performance could get improved by 20%.
>
> In the IntegerPolynomial class description, it is said "All IntegerPolynomial implementations allow at most one addition before multiplication. Additions after that will result in an ArithmeticException." It's too strict to follow without exam the code very carefully. Indeed, the implementation does not really follow the spec, and 2 addition may be allowed.
>
> It would be nice if there is no addition limit, and then we are free from these issues. It is doable by reducing the limbs in the adding and subtraction operations, as other operations as multiplication. And then the code related to the limit is not necessary any longer. The code can do any many additions as required.
>
> The benchmark for ECDSA Signature is checked in order to see how much the performance could be impacted. Here are the benchmark numbers before the patch applied:
>
> Benchmark (messageLength) Mode Cnt Score Error Units
> Signatures.sign 64 thrpt 15 1414.463 ± 9.616 ops/s
> Signatures.sign 512 thrpt 15 1409.023 ± 14.636 ops/s
> Signatures.sign 2048 thrpt 15 1414.488 ± 4.728 ops/s
> Signatures.sign 16384 thrpt 15 1385.685 ± 4.476 ops/s
>
>
> and here are the numbers with this patch applied:
>
> Benchmark (messageLength) Mode Cnt Score Error Units
> Signatures.sign 64 thrpt 15 1293.658 ± 8.358 ops/s
> Signatures.sign 512 thrpt 15 1298.404 ± 4.344 ops/s
> Signatures.sign 2048 thrpt 15 1289.732 ± 19.748 ops/s
> Signatures.sign 16384 thrpt 15 1278.980 ± 13.483 ops/s
>
>
> From the numbers, it looks like the performance impact is about 10%. However, the performance impact could get rewarded by using less limbs. For examples for secp256r1, I updated the limbs from [10 to 9 for integer operations](https://github.com/openjdk/jdk/pull/10398), and run the benchmark together with this patch, I could see the performance improvement as follow:
>
> Benchmark (messageLength) Mode Cnt Score Error Units
> Signatures.sign 64 thrpt 15 1578.568 ± 11.000 ops/s
> Signatures.sign 512 thrpt 15 1596.058 ± 4.184 ops/s
> Signatures.sign 2048 thrpt 15 1591.519 ± 6.444 ops/s
> Signatures.sign 16384 thrpt 15 1563.480 ± 6.837 ops/s
> ```
>
> The following are numbers for secp256r1 key generation, without this patch:
>
> Benchmark Mode Cnt Score Error Units
> KeyPairGenerators.keyPairGen thrpt 15 1531.026 ± 9.869 ops/s
>
>
> With this patch:
>
> Benchmark Mode Cnt Score Error Units
> KeyPairGenerators.keyPairGen thrpt 15 1367.162 ± 50.322 ops/s
>
>
> with improved limbs (from 10 to 9):
>
> Benchmark Mode Cnt Score Error Units
> KeyPairGenerators.keyPairGen thrpt 15 1713.097 ± 8.003 ops/s
>
> If considering this PR together with [PR for JDK-8294248](https://github.com/openjdk/jdk/pull/10398), performance improvement could be expected for EC crypto primitives.
>
> Thanks,
> Xuelei
That's a pretty significant performance degradation (8%?), and as far as I could tell, it will affect all EC and XEC implementations. On the other hand, #10398 only improves performance of P256. I'm not sure that's a tradeoff we want to make.
IMO working with setReduced is not that bad; we always invoke the same set of operations in the same sequence, so if we forget to reduce a number before multiplying, every EC operation will fail. Could we introduce just enough reduce / setReduced calls to make P256 work reliably with 9 limbs? What impact would it have on P384?
-------------
PR: https://git.openjdk.org/jdk/pull/10624
More information about the build-dev
mailing list