RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64

Volodymyr Paprotski vpaprotski at openjdk.org
Fri Feb 21 20:58:27 UTC 2025


Add an AVX2 Montgomery multiplication intrinsic (about 60-80% throughput gain on the ECDSA and ECDH benchmarks below).
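For readers less familiar with the operation being intrinsified: a Montgomery multiplication computes a·b·R⁻¹ mod p for a fixed radix R, which lets a chain of modular multiplications replace division by p with shifts and masks. The Java sketch below uses `BigInteger` for clarity rather than limb arithmetic. The class and method names are hypothetical and not the JDK's, and R = 2^260 is an assumption based on a five-limb, 52-bit-per-limb layout.

```java
import java.math.BigInteger;

// Hypothetical illustration of Montgomery multiplication (REDC) modulo the P-256 prime.
public class MontgomeryP256Sketch {
    // NIST P-256 prime: p = 2^256 - 2^224 + 2^192 + 2^96 - 1
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(224))
            .add(BigInteger.ONE.shiftLeft(192))
            .add(BigInteger.ONE.shiftLeft(96))
            .subtract(BigInteger.ONE);

    // Assumption: five 52-bit limbs, so the Montgomery radix is R = 2^260.
    static final int R_BITS = 5 * 52;
    static final BigInteger R = BigInteger.ONE.shiftLeft(R_BITS);
    static final BigInteger R_MASK = R.subtract(BigInteger.ONE);

    // N_PRIME = -p^{-1} mod R, the constant REDC multiplies by.
    static final BigInteger N_PRIME = P.modInverse(R).negate().mod(R);

    // Map a (in [0, p)) into Montgomery form: a * R mod p.
    static BigInteger toMontgomery(BigInteger a) {
        return a.shiftLeft(R_BITS).mod(P);
    }

    // REDC: for 0 <= t < p * R, returns t * R^{-1} mod p.
    static BigInteger redc(BigInteger t) {
        BigInteger m = t.and(R_MASK).multiply(N_PRIME).and(R_MASK); // m = t * N_PRIME mod R
        BigInteger u = t.add(m.multiply(P)).shiftRight(R_BITS);    // t + m*p is divisible by R
        return u.compareTo(P) >= 0 ? u.subtract(P) : u;              // u < 2p, so one subtraction suffices
    }

    // (aR)(bR)R^{-1} = abR mod p: the product stays in Montgomery form.
    static BigInteger montgomeryMultiply(BigInteger aR, BigInteger bR) {
        return redc(aR.multiply(bR));
    }

    // Leave Montgomery form: (aR)R^{-1} = a mod p.
    static BigInteger fromMontgomery(BigInteger aR) {
        return redc(aR);
    }
}
```

Because every term of p above 2^96 vanishes modulo 2^52, p ≡ −1 (mod 2^52), so the per-limb constant −p⁻¹ mod 2^52 is 1. A limb-by-limb implementation can therefore take each reduction multiplier directly from the low limb of its accumulator; whether the intrinsic relies on this is not stated in the message.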

Also add the reduction to the existing AVX512 multiplication intrinsic; this was left over from https://github.com/openjdk/jdk/pull/19893, where a quick fix was required. The change is mostly cleanup, but it also yields about a 1-2% gain.
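The message does not spell out which reduction is meant. A common final step of Montgomery multiplication brings a result in [0, 2p) into [0, p) with one conditional subtraction of p. The sketch below shows a branch-free version of that step over five 52-bit limbs; treating this as the step in question, the limb layout, and all names are assumptions for illustration only.

```java
import java.math.BigInteger;

// Hypothetical sketch: branch-free final reduction of a P-256 value held in 52-bit limbs.
public class P256ConditionalSubtract {
    // Assumed layout: five 52-bit limbs, least significant limb first.
    static final int LIMBS = 5;
    static final int BITS = 52;
    static final long MASK52 = (1L << BITS) - 1;

    // NIST P-256 prime: p = 2^256 - 2^224 + 2^192 + 2^96 - 1
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(224))
            .add(BigInteger.ONE.shiftLeft(192))
            .add(BigInteger.ONE.shiftLeft(96))
            .subtract(BigInteger.ONE);
    static final long[] P_LIMBS = toLimbs(P);

    static long[] toLimbs(BigInteger x) {
        long[] r = new long[LIMBS];
        for (int i = 0; i < LIMBS; i++) {
            r[i] = x.shiftRight(i * BITS).longValue() & MASK52;
        }
        return r;
    }

    static BigInteger fromLimbs(long[] a) {
        BigInteger r = BigInteger.ZERO;
        for (int i = LIMBS - 1; i >= 0; i--) {
            r = r.shiftLeft(BITS).add(BigInteger.valueOf(a[i]));
        }
        return r;
    }

    // Input: normalized limbs (each in [0, 2^52)) of a value in [0, 2p).
    // Output: the same value reduced into [0, p), chosen with masks instead of a branch.
    static long[] conditionalSubtract(long[] a) {
        long[] t = new long[LIMBS];
        long borrow = 0; // always 0 or -1
        for (int i = 0; i < LIMBS; i++) {
            long d = a[i] - P_LIMBS[i] + borrow;
            t[i] = d & MASK52;
            borrow = d >> BITS; // arithmetic shift yields 0 or -1
        }
        // A final borrow of -1 (all ones) means a < p, so keep a; otherwise keep t = a - p.
        long keepA = borrow;
        long[] r = new long[LIMBS];
        for (int i = 0; i < LIMBS; i++) {
            r[i] = (a[i] & keepA) | (t[i] & ~keepA);
        }
        return r;
    }
}
```

Computing a - p unconditionally and selecting with a mask keeps the instruction sequence independent of the data, which is the usual reason to prefer it over an `if` in cryptographic code.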

Before (no AVX512):

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score     Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40   3720.589 ±  17.879  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40   3605.940 ±  15.807  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40   1076.502 ±   4.190  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40   1069.624 ±   2.484  ops/s

Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt   40   830.448 ± 2.285  ops/s

After (with AVX2):

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score     Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40   6000.496 ±  39.923  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40   5739.878 ±  34.838  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40   1942.437 ±  12.179  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40   1921.770 ±   8.992  ops/s

Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt   40  1399.761 ± 6.238  ops/s
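As a cross-check of the "about 60-80%" figure, the relative gains on the non-AVX512 machine can be recomputed from the Score column of the two tables above. This small program uses only the point estimates and ignores the Error column; the class name is arbitrary.

```java
// Recompute throughput gains (percent) from the "Before (no AVX512)" and "After (with AVX2)" scores.
public class Avx2Gains {
    static final String[] NAMES = {
        "ECDSA.sign 1024", "ECDSA.sign 16384",
        "ECDSA.verify 1024", "ECDSA.verify 16384", "ECDH generateSecret"
    };
    // Score column, ops/s, in the same row order as the tables.
    static final double[] BEFORE = {3720.589, 3605.940, 1076.502, 1069.624, 830.448};
    static final double[] AFTER  = {6000.496, 5739.878, 1942.437, 1921.770, 1399.761};

    // Relative gain in percent: (after / before - 1) * 100.
    static double[] gains(double[] before, double[] after) {
        double[] g = new double[before.length];
        for (int i = 0; i < before.length; i++) {
            g[i] = (after[i] / before[i] - 1.0) * 100.0;
        }
        return g;
    }

    public static void main(String[] args) {
        double[] g = gains(BEFORE, AFTER);
        for (int i = 0; i < g.length; i++) {
            System.out.printf("%-20s %+6.1f%%%n", NAMES[i], g[i]);
        }
    }
}
```

The signing rows land near the low end of the range and the verification rows near the high end, with ECDH in between.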


Before (with AVX512):

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt       Score     Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40    9621.950 ±  27.260  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40    8975.654 ±  26.707  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40    3112.945 ±  12.930  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40    3039.183 ±  12.362  ops/s

Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt   40  2248.987 ±  7.427  ops/s

After (with AVX512):

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt       Score     Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40    9815.713 ±  23.455  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40    9136.786 ±  27.747  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40    3167.702 ±  13.331  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40    3090.053 ±  12.925  ops/s

Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt   40  2278.031 ±  6.971  ops/s
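Because the AVX512 improvement is small, it is worth checking it against the Error column (by default in JMH, the half-width of a 99.9% confidence interval). The sketch below compares the point-estimate gain with a pessimistic one that pits the lowest "after" bound against the highest "before" bound; a positive pessimistic value means the two intervals do not overlap. The class and method names are arbitrary.

```java
// Compare AVX512 before/after scores, taking JMH's Error column into account.
public class Avx512Gains {
    static final String[] NAMES = {
        "ECDSA.sign 1024", "ECDSA.sign 16384",
        "ECDSA.verify 1024", "ECDSA.verify 16384", "ECDH generateSecret"
    };
    // Score and Error columns (ops/s) from "Before (with AVX512)" and "After (with AVX512)".
    static final double[] BEFORE     = {9621.950, 8975.654, 3112.945, 3039.183, 2248.987};
    static final double[] BEFORE_ERR = {  27.260,   26.707,   12.930,   12.362,    7.427};
    static final double[] AFTER      = {9815.713, 9136.786, 3167.702, 3090.053, 2278.031};
    static final double[] AFTER_ERR  = {  23.455,   27.747,   13.331,   12.925,    6.971};

    // Point-estimate gain in percent.
    static double gain(int i) {
        return (AFTER[i] / BEFORE[i] - 1.0) * 100.0;
    }

    // Pessimistic gain in percent: lower "after" bound versus upper "before" bound.
    static double pessimisticGain(int i) {
        return ((AFTER[i] - AFTER_ERR[i]) / (BEFORE[i] + BEFORE_ERR[i]) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < BEFORE.length; i++) {
            System.out.printf("%-20s gain %.2f%%, pessimistic %.2f%%%n",
                    NAMES[i], gain(i), pessimisticGain(i));
        }
    }
}
```

Every row stays positive even in the pessimistic comparison, so the 1-2% gain is larger than the reported measurement error on this machine.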

-------------

Commit messages:
 - whitespace
 - split up ASM and Math changes

Changes: https://git.openjdk.org/jdk/pull/23719/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350459
  Stats: 625 lines in 9 files changed: 525 ins; 15 del; 85 mod
  Patch: https://git.openjdk.org/jdk/pull/23719.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719

PR: https://git.openjdk.org/jdk/pull/23719


More information about the hotspot-dev mailing list