RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64

Sandhya Viswanathan sviswanathan at openjdk.org
Tue Mar 4 00:02:08 UTC 2025


On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:

> Add an AVX2 Montgomery multiplication intrinsic (about 60-80% gain).
> 
> Also add the reduction to the existing AVX512 multiplication (this was left over from https://github.com/openjdk/jdk/pull/19893, where a quick fix was required). This is mostly cleanup, but it also gives about a 1-2% gain.
> 
> Before (no AVX512)
> 
> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score     Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40   3720.589 ±  17.879  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40   3605.940 ±  15.807  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40   1076.502 ±   4.190  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40   1069.624 ±   2.484  ops/s
>
> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
> KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt   40   830.448 ± 2.285  ops/s
> 
> After (with AVX2)
> 
> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score     Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40   6000.496 ±  39.923  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40   5739.878 ±  34.838  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40   1942.437 ±  12.179  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40   1921.770 ±   8.992  ops/s
>
> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
> KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt   40  1399.761 ± 6.238  ops/s
> 
> 
> Before (with AVX512):
> 
> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt       Score     Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40    9621.950 ±  27.260  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40    8975.654 ±  26.707  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        102...
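
For readers following the thread: the quoted description is about Montgomery multiplication, including the final reduction step. A minimal scalar sketch of the technique follows. It assumes a single 64-bit word modulus n < 2^63 with R = 2^64, so it is not the multi-limb P-256 field arithmetic and not the AVX2/AVX512 stub under review. `Montgomery64` and its members are hypothetical names for illustration only, and the code relies on the GCC/Clang `unsigned __int128` extension.

```cpp
#include <cstdint>

// Hypothetical helper, not part of HotSpot or the JDK: single-word Montgomery
// arithmetic for an odd modulus n < 2^63, with R = 2^64.
// Requires a compiler that supports unsigned __int128 (GCC or Clang).
struct Montgomery64 {
    using u128 = unsigned __int128;

    uint64_t n;       // odd modulus; n < 2^63 keeps T + m*n below 2^128
    uint64_t nprime;  // -n^{-1} mod 2^64
    uint64_t r2;      // R^2 mod n, used to convert into Montgomery form

    explicit Montgomery64(uint64_t modulus) : n(modulus) {
        // Newton iteration for n^{-1} mod 2^64: starting from 3 correct
        // low bits, each step doubles them (3 -> 6 -> 12 -> 24 -> 48 -> 96).
        uint64_t inv = n;
        for (int i = 0; i < 5; ++i) inv *= 2 - n * inv;
        nprime = 0 - inv;
        uint64_t r1 = (0 - n) % n;  // 2^64 mod n
        r2 = static_cast<uint64_t>(static_cast<u128>(r1) * r1 % n);
    }

    // Montgomery reduction (REDC): returns T * R^{-1} mod n for T < n * R.
    uint64_t redc(u128 T) const {
        uint64_t m = static_cast<uint64_t>(T) * nprime;  // makes T + m*n divisible by R
        u128 t = (T + static_cast<u128>(m) * n) >> 64;   // exact division by R
        uint64_t r = static_cast<uint64_t>(t);           // r < 2n
        return r >= n ? r - n : r;                       // final conditional subtraction
    }

    uint64_t to_mont(uint64_t a) const { return redc(static_cast<u128>(a % n) * r2); }
    uint64_t from_mont(uint64_t a) const { return redc(a); }
    // The product of two Montgomery-form values stays in Montgomery form.
    uint64_t mul(uint64_t a, uint64_t b) const { return redc(static_cast<u128>(a) * b); }
};
```

Usage: `M.from_mont(M.mul(M.to_mont(a), M.to_mont(b)))` equals `a * b mod n`. The design choice that matters for an intrinsic is that `mul` never divides by n; it only multiplies, adds, shifts, and does one conditional subtraction at the end.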

src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 2:

> 1: /*
> 2:  * Copyright (c) 2025, Intel Corporation. All rights reserved.

This should be:
Copyright (c) 2024, 2025, Intel Corporation. All rights reserved.

Please also check that the copyright years are updated appropriately in all the files.

src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 259:

> 257:     }
> 258:     __ vpaddq(Acc1, Acc1, Carry, Assembler::AVX_256bit);
> 259:   }

A comment here on what this block is doing would help.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1978309062
PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1978398909


More information about the hotspot-dev mailing list