RFR: 8355216: Accelerate P-256 arithmetic on aarch64 [v7]
Ben Perez
bperez at openjdk.org
Wed Feb 11 09:39:48 UTC 2026
> An aarch64 implementation of the `MontgomeryIntegerPolynomial256.mult()` and `IntegerPolynomial.conditionalAssign()` methods. Because Neon does not support 64-bit multiplication, and performing it manually with 32-bit limbs is slower than using general-purpose registers (GPRs), a hybrid Neon/GPR approach is used. Neon instructions compute intermediate values needed in the last two iterations of the main "loop", while the GPRs compute the first few iterations. This improves performance by ~9% at the method level and by roughly 5% at the API level.
>
> Performance without intrinsic (Apple M1):
>
> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error     Units
> PolynomialP256Bench.benchMultiply  true           thrpt  8    2427.562  ± 24.923  ops/s
> PolynomialP256Bench.benchMultiply  false          thrpt  8    1757.495  ± 41.805  ops/s
> PolynomialP256Bench.benchSquare    true           thrpt  8    2435.202  ± 20.822  ops/s
> PolynomialP256Bench.benchSquare    false          thrpt  8    2420.390  ± 33.594  ops/s
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   8439.881  ± 29.838  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   7990.614  ± 30.998  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   2677.737  ± 8.400   ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   2619.297  ± 9.737   ops/s
>
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
> KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   1905.369  ± 3.745   ops/s
>
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
> KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   1903.997  ± 4.092   ops/s
>
>
> Performance with intrinsic (Apple M1):
>
> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error     Units
> PolynomialP256Bench.benchMultiply  true           thrpt  8    2676.599  ± 24.722  ops/s
> PolynomialP256Bench.benchMultiply  false          thrpt  8    1770.589  ± 2.584   ops/s
> PolynomialP256Bench.benchSqua...
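
The quoted description names `IntegerPolynomial.conditionalAssign()` without showing what such a routine does. A minimal Java sketch of a branch-free conditional assignment over limb arrays follows. The class name, the signature, and the limb layout are assumptions for illustration; this is neither the JDK source nor the intrinsic from this PR.

```java
// Hypothetical sketch: constant-time "if (set == 1) a = b" over limb arrays.
// Assumes set is exactly 0 or 1 and that a and b have the same length.
final class ConditionalAssignSketch {

    static void conditionalAssign(int set, long[] a, long[] b) {
        // mask is 0 when set == 0 and all ones when set == 1.
        long mask = -(long) set;
        for (int i = 0; i < a.length; i++) {
            // (a ^ b) & mask is either 0 or (a ^ b); XOR-ing it into a[i]
            // leaves a[i] unchanged or turns it into b[i], with no branch
            // that depends on set.
            a[i] ^= mask & (a[i] ^ b[i]);
        }
    }

    public static void main(String[] args) {
        long[] a = {1L, 2L, 3L};
        long[] b = {7L, 8L, 9L};
        conditionalAssign(0, a, b);
        System.out.println(java.util.Arrays.toString(a));
        conditionalAssign(1, a, b);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

The same work is done for both values of `set`, which is the usual reason for writing the selection with a mask rather than an `if`.
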
Ben Perez has updated the pull request incrementally with one additional commit since the last revision:
added comments to p256 intrinsics, fixed error message in umullv instruction
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/27946/files
- new: https://git.openjdk.org/jdk/pull/27946/files/05925eaa..e70dc14e
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=27946&range=06
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=27946&range=05-06
Stats: 17 lines in 2 files changed: 11 ins; 0 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/27946.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27946/head:pull/27946
PR: https://git.openjdk.org/jdk/pull/27946
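
For readers wondering why the description treats 32-bit limbs as the slower option: a full 64x64-bit product built from 32-bit halves needs four partial products plus carry handling, where a GPR-based version needs only a low-half and a high-half multiply. The Java sketch below illustrates that schoolbook decomposition only. The class and method names are hypothetical, and it says nothing about how the intrinsic in this PR schedules its Neon and GPR instructions.

```java
// Hypothetical sketch: unsigned 64x64 -> 128-bit multiply from 32-bit halves.
final class WideMulSketch {

    private static final long LOW32 = 0xFFFFFFFFL;

    // Returns {hi, lo}: the unsigned 128-bit product of x and y,
    // with x and y interpreted as unsigned 64-bit values.
    static long[] mul64x64(long x, long y) {
        long x0 = x & LOW32, x1 = x >>> 32;
        long y0 = y & LOW32, y1 = y >>> 32;

        // Four 32x32 -> 64 partial products; each fits in 64 bits.
        long p00 = x0 * y0;
        long p01 = x0 * y1;
        long p10 = x1 * y0;
        long p11 = x1 * y1;

        // Sum of the three terms that land in bits 32..63.
        // It is below 2^34, so it cannot overflow a long.
        long mid = (p00 >>> 32) + (p01 & LOW32) + (p10 & LOW32);

        long lo = (p00 & LOW32) | (mid << 32);
        long hi = p11 + (p01 >>> 32) + (p10 >>> 32) + (mid >>> 32);
        return new long[] {hi, lo};
    }

    public static void main(String[] args) {
        long[] r = mul64x64(-1L, -1L); // (2^64 - 1)^2
        System.out.println(Long.toHexString(r[0]) + " " + Long.toHexString(r[1]));
    }
}
```
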