RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2]
Sandhya Viswanathan
sviswanathan at openjdk.org
Fri Jun 14 23:48:12 UTC 2024
On Fri, 14 Jun 2024 22:01:44 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>> This fix recovers XDH performance but gives back some of the P256 gains (roughly 8-14% lower). P256 is still faster than before the Montgomery PR, just by a smaller margin.
>>
>> The fix is to undo the 'int' return type on mult()/square(), which allowed returning a partially reduced result (e.g. this avoids extra reductions when a mult() result is fed into an addition). This restores the behaviour from before the Montgomery ECC PR.
>>
>> ---
>> XDH.generateSecret performance
>> before Montgomery PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
>> KeyAgreementBench.XDH.generateSecret  XDH          255          XDH                         thrpt  3    8435.277  ± 27.230  ops/s
>>
>> after Montgomery PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
>> KeyAgreementBench.XDH.generateSecret  XDH          255          XDH                         thrpt  3    8309.028  ± 22.071  ops/s
>>
>> with this PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
>> KeyAgreementBench.XDH.generateSecret  XDH          255          XDH                         thrpt  3    8491.268  ± 32.858  ops/s
>>
>> ---
>>
>> P256 performance with/without mult intrinsic:
>>
>> Performance before Montgomery PR:
>>
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6398.727  ± 7.400   ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6129.739  ± 5.995   ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1889.928  ± 54.660  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1866.339  ± 42.438  ops/s
>>
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1350.745  ± 28.514  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1349.393  ± 32.050  ops/s
>>
>> Performance in master without mult() intrinsic
>>
>> Benchmark ...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
> Improve non-intrinsic p256 performance

src/hotspot/share/opto/runtime.cpp line 1417:

> 1415: // result type needed
> 1416: fields = TypeTuple::fields(1);
> 1417: fields[TypeFunc::Parms + 0] = NULL;

A minor nit: NULL could be nullptr here.
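
For context on the nit, here is a minimal sketch of why nullptr is generally preferred over NULL in C++11 and later. The `which` overloads and `demo` function are hypothetical names made up for illustration; none of this is HotSpot code, and a plain assignment like the one on line 1417 behaves the same with either spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical overload pair for illustration only; not HotSpot code.
constexpr int which(int)         { return 0; }  // integer overload
constexpr int which(const void*) { return 1; }  // pointer overload

// nullptr has its own type, std::nullptr_t, which converts to pointer
// types but never to int.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr is a std::nullptr_t");
static_assert(which(nullptr) == 1, "nullptr selects the pointer overload");
static_assert(which(0) == 0, "a literal 0 selects the integer overload");

// which(NULL) is left out on purpose: depending on how the platform
// defines NULL, it may pick the integer overload or be rejected as ambiguous.

// Runtime version of the same checks.
inline void demo() {
    const void* p = nullptr;  // assignment works the same as with NULL
    assert(p == nullptr);
    assert(which(p) == 1);
}
```

The difference only shows up where the type of the null constant matters (overload resolution, templates), so the change at line 1417 would be a consistency cleanup and not a behavioural fix.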
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/19728#discussion_r1640466077