RFR: 8294731: Improve multiplicative inverse for secp256r1 implementation [v3]

Wed Nov 2 14:39:00 UTC 2022

On Sat, 8 Oct 2022 15:34:57 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote:

>> Hi,
>> 
>> May I have this patch reviewed?
>> 
>> This is one of a few steps to improve the EC performance. The multiplicative inverse implementation could be improved for better performance. 
>> 
>> For secp256r1 prime p, the current  multiplicative inverse impl needs 256 square and 128 multiplication.  With the path, the operation needs 256 square and 13 multiplication.
>> 
>> For secp256r1 order n, the current  multiplicative inverse impl needs 256 square and 169 multiplication. With the patch, the operation needs  256 square and 43 multiplication.
>> 
>> In EC operations, square operation is much faster than multiplication.  Decreasing multiplication numbers could speed up the multiplicative inverse significantly.
>> 
>> The benchmark for ECDSA Signature is checked in order to see if the performance improvement is visible.  Here are the benchmark numbers before the patch applied:
>> 
>> Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
>> Signatures.sign               64  thrpt   15  1412.644 ±  5.529  ops/s
>> Signatures.sign              512  thrpt   15  1407.711 ± 14.118  ops/s
>> Signatures.sign             2048  thrpt   15  1415.674 ±  6.965  ops/s
>> Signatures.sign            16384  thrpt   15  1395.582 ± 12.689  ops/s
>> 
>> 
>> And the following are the benchmarking after the patch applied.
>> 
>> Signatures.sign               64  thrpt   15  1484.404 ± 10.705  ops/s
>> Signatures.sign              512  thrpt   15  1486.563 ±  7.514  ops/s
>> Signatures.sign             2048  thrpt   15  1479.866 ± 15.028  ops/s
>> Signatures.sign            16384  thrpt   15  1469.789 ±  3.844  ops/s
>> 
>> 
>> The performance improvement of the patch is about 5% for ECDSA signature.  It looks like the improvement is no significant enough for now.  But it may be 2+ times more in numbers when the scalar multiplication implementation is improved in a follow-up enhancement in another pull request.
>> 
>> For comparing, here is the benchmarking numbers by using BigInteger.modInverse();
>> 
>> Benchmark        (messageLength)   Mode  Cnt     Score     Error  Units
>> Signatures.sign               64  thrpt   15  1395.628 ± 180.649  ops/s
>> Signatures.sign              512  thrpt   15  1510.590 ±   9.826  ops/s
>> Signatures.sign             2048  thrpt   15  1514.282 ±   3.382  ops/s
>> Signatures.sign            16384  thrpt   15  1497.325 ±   6.854  ops/s
>> 
>> and numbers for using BigInteger.modPow():
>> 
>> Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
>> Signatures.sign               64  thrpt   15  1486.764 ± 17.908  ops/s
>> Signatures.sign              512  thrpt   15  1494.801 ± 14.072  ops/s
>> Signatures.sign             2048  thrpt   15  1500.170 ±  6.998  ops/s
>> Signatures.sign            16384  thrpt   15  1434.192 ± 49.269  ops/s
>> 
>> 
>> Enhancement for other curves will be considered in separate pull requests.
>> 
>> Thanks,
>> Xuelei
>
> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   more improvement

src/java.base/share/classes/sun/security/util/math/IntegerModuloP.java line 410:

> 408:                 // as it hapeens to be 4. For bit set other than 4 bits, for
> 409:                 // example, 3 bits set (0x8), the value should be added back.
> 410:                 // d.setProduct(w[2]);

I think you can remove this comment, or at least fix your typos: "for-lopp" -> "for loop",  "hapeens" -> "happens",  "(0x8)" -> "(0x7)". 
You can say something like.: ' "if(k != -1) d.setProduct(w[k]);" is not necessary here as k is -1 at the end of the loop for this exponent'

-------------

PR: https://git.openjdk.org/jdk/pull/10544