RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic
Volodymyr Paprotski
duke at openjdk.org
Tue Apr 2 16:10:55 UTC 2024
Performance, before:
Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score      Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3   6443.934  ±   6.491  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3   6152.979  ±   4.954  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3   1895.410  ±  36.979  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3   1878.955  ±  45.487  ops/s
Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt      Score      Error  Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3   1357.810  ±  26.584  ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3   1352.119  ±  23.547  ops/s
Benchmark                          (isMontBench)   Mode  Cnt      Score      Error  Units
PolynomialP256Bench.benchMultiply          false  thrpt    3   1746.126  ±  10.970  ops/s
Performance, no intrinsic:
Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score      Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3   6529.839  ±  42.420  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3   6199.747  ± 133.566  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3   1973.676  ±  54.071  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3   1932.127  ±  35.920  ops/s
Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt      Score      Error  Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3   1355.788  ±  29.858  ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3   1346.523  ±  28.722  ops/s
Benchmark                          (isMontBench)   Mode  Cnt      Score      Error  Units
PolynomialP256Bench.benchMultiply           true  thrpt    3   1919.574  ±  10.591  ops/s
Performance, **with intrinsics**:
Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score      Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  10384.591  ±  65.274  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3   9592.912  ± 236.411  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3   3479.494  ±  44.578  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3   3402.147  ±  26.772  ops/s
Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt      Score      Error  Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3   2527.678  ±  64.791  ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3   2541.258  ±  66.634  ops/s
Benchmark                          (isMontBench)   Mode  Cnt      Score      Error  Units
PolynomialP256Bench.benchMultiply           true  thrpt    3   3021.139  ±  98.289  ops/s
Summary on design (see code for 'ASCII art', references and details on math):
- Added a new `IntegerPolynomial` field (`MontgomeryIntegerPolynomialP256`) with 52-bit limbs
  - `getElement(*)/fromMontgomery()` convert numbers into/out of the field
- `ECOperations` is the primary user of the new field
  - Flattened some overly deep nested class hierarchy (also in preparation for further field optimizations)
  - `forParameters()/multiply()/setSum()` generate numbers in the new field
  - `ProjectivePoint/Montgomery{Imm|M}utable.asAffine()` converts out of the new field
- Added Fuzz Testing and KAT verified with OpenSSL
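For readers unfamiliar with Montgomery form, the conversion into and out of the field can be illustrated roughly as follows. This is only a sketch using `BigInteger`, not the limb-based code in the PR: the class `MontgomerySketch` and its methods `toMontgomery`, `redc` and `montMultiply` are hypothetical names, the `fromMontgomery` here merely mirrors the name used above, and the radix R = 2^260 is an assumption (five 52-bit limbs).

```java
import java.math.BigInteger;

// Illustrative only: textbook Montgomery arithmetic modulo the P-256 prime.
final class MontgomerySketch {
    // P-256 field prime: 2^256 - 2^224 + 2^192 + 2^96 - 1
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(224))
            .add(BigInteger.ONE.shiftLeft(192))
            .add(BigInteger.ONE.shiftLeft(96))
            .subtract(BigInteger.ONE);

    // Assumed radix R = 2^260 (5 limbs x 52 bits); any power of two > P works here.
    static final int R_BITS = 260;
    static final BigInteger R = BigInteger.ONE.shiftLeft(R_BITS);
    static final BigInteger R_MASK = R.subtract(BigInteger.ONE);

    // N' = -P^{-1} mod R; exists because P is odd and R is a power of two.
    static final BigInteger N_PRIME = P.modInverse(R).negate().mod(R);

    // Convert a into Montgomery form: a * R mod P.
    static BigInteger toMontgomery(BigInteger a) {
        return a.shiftLeft(R_BITS).mod(P);
    }

    // Montgomery reduction: returns t * R^{-1} mod P for 0 <= t < R * P.
    static BigInteger redc(BigInteger t) {
        BigInteger m = t.and(R_MASK).multiply(N_PRIME).and(R_MASK);
        BigInteger u = t.add(m.multiply(P)).shiftRight(R_BITS); // exact division by R
        return u.compareTo(P) >= 0 ? u.subtract(P) : u;         // u < 2P, one subtraction suffices
    }

    // Product of two Montgomery-form values, result also in Montgomery form.
    static BigInteger montMultiply(BigInteger aM, BigInteger bM) {
        return redc(aM.multiply(bM));
    }

    // Convert out of Montgomery form: aM * R^{-1} mod P.
    static BigInteger fromMontgomery(BigInteger aM) {
        return redc(aM);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("123456789abcdef", 16);
        BigInteger b = new BigInteger("fedcba987654321", 16);
        BigInteger viaMont = fromMontgomery(montMultiply(toMontgomery(a), toMontgomery(b)));
        System.out.println(viaMont.equals(a.multiply(b).mod(P)));
    }
}
```

The point of the representation is that `redc` needs only masks, shifts and multiplications, with no division by the prime, which is what makes a fixed-limb multiply a natural candidate for an intrinsic. A real implementation would also have to avoid the data-dependent branch in the final subtraction.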
-------------
Commit messages:
- remove trailing whitespace
- Remeasure performance
- Fix rebase typo
- Address comments from Anas and thorough cleanup
- conditionalAssign intrinsic
- rebase
Changes: https://git.openjdk.org/jdk/pull/18583/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8329538
Stats: 2335 lines in 34 files changed: 2037 ins; 162 del; 136 mod
Patch: https://git.openjdk.org/jdk/pull/18583.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583
PR: https://git.openjdk.org/jdk/pull/18583