RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4]
Anthony Scarpino
ascarpino at openjdk.org
Mon Mar 10 22:51:58 UTC 2025
On Wed, 5 Mar 2025 23:03:23 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:
>> Add AVX2 Montgomery multiplication intrinsic (about 60-80% gain).
>>
>> Also add reduction to the existing AVX512 multiplication (this was left over from https://github.com/openjdk/jdk/pull/19893, where a quick fix was required). This is mostly for cleanup, but there is about a 1-2% gain.
>>
>> Before (no AVX512)
>>
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  3720.589  ± 17.879  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  3605.940  ± 15.807  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  1076.502  ±  4.190  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  1069.624  ±  2.484  ops/s
>> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score     Error  Units
>> KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt   40   830.448  ±  2.285  ops/s
>>
>> After (with AVX2)
>>
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  6000.496  ± 39.923  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  5739.878  ± 34.838  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  1942.437  ± 12.179  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  1921.770  ±  8.992  ops/s
>> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score     Error  Units
>> KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt   40  1399.761  ±  6.238  ops/s
>>
>>
>> Before (with AVX512):
>>
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  9621.950  ± 27.260  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  8975.654  ± 26.707  o...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
> more comment improvements
test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java line 30:
> 28: import sun.security.util.math.intpoly.*;
> 29:
> 30: /*
It is strange that there are two copies of the `@test` block. Can you please remove one of them, unless you are seeing a difference that I do not
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1988122873
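
As a quick sanity check on the "60-80% gain" figure in the quoted PR description, the gain can be recomputed from the "Before (no AVX512)" and "After (with AVX2)" scores. The sketch below is illustrative only: the class name `GainCheck` and the helper `gainPercent` are hypothetical and not part of the PR, and the scores are copied by hand from the quoted tables.

```java
// Hypothetical helper, not part of the PR: recomputes throughput gains
// from the JMH scores quoted in the PR description (ops/s).
class GainCheck {
    // Percentage gain of `after` relative to `before`.
    static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {
            "ECDSA.sign   (dataSize 1024)",
            "ECDSA.sign   (dataSize 16384)",
            "ECDSA.verify (dataSize 1024)",
            "ECDSA.verify (dataSize 16384)",
            "ECDH generateSecret",
        };
        // Scores from "Before (no AVX512)".
        double[] before = {3720.589, 3605.940, 1076.502, 1069.624, 830.448};
        // Scores from "After (with AVX2)".
        double[] after = {6000.496, 5739.878, 1942.437, 1921.770, 1399.761};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-32s %6.1f%%%n",
                    names[i], gainPercent(before[i], after[i]));
        }
    }
}
```

With these inputs the gains fall between roughly 59% and 80%, which is consistent with the figure given in the description.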