RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

Jatin Bhateja jbhateja at openjdk.org
Fri Apr 5 09:20:10 UTC 2024


On Tue, 2 Apr 2024 19:19:59 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> Performance. Before:
>> 
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6443.934 ±  6.491  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6152.979 ±  4.954  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1895.410 ± 36.979  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1878.955 ± 45.487  ops/s
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1357.810 ± 26.584  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1352.119 ± 23.547  ops/s
>> Benchmark                          (isMontBench)   Mode  Cnt     Score    Error  Units
>> PolynomialP256Bench.benchMultiply          false  thrpt    3  1746.126 ± 10.970  ops/s
>> 
>> Performance, no intrinsic:
>> 
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6529.839 ±  42.420  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6199.747 ± 133.566  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1973.676 ±  54.071  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1932.127 ±  35.920  ops/s
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1355.788 ± 29.858  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1346.523 ± 28.722  ops/s
>> Benchmark                          (isMontBench)   Mode  Cnt     Score    Error  Units
>> PolynomialP256Bench.benchMultiply           true  thrpt    3  1919.57...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove use of jdk.crypto.ec

A few early comments.

Please update the copyright year of all the modified files.

You could also consider splitting this into two patches: the Java-side changes in one and the x86-optimized intrinsic in the next.

src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 39:

> 37: };
> 38: static address modulus_p256() {
> 39:   return (address)MODULUS_P256;

Long constants should have a UL suffix.
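
To illustrate the point about suffixes, here is a minimal standalone sketch. It is not taken from the patch: `kLimbMask` and the 52-bit width are hypothetical, chosen only for illustration, and the sketch assumes an LP64 platform where `unsigned long` is 64 bits wide.

```cpp
#include <cstdint>
#include <type_traits>

// An unsuffixed small literal has type int; the UL suffix makes it
// unsigned long, so the width and signedness are explicit at the use site.
static_assert(std::is_same<decltype(1), int>::value,
              "unsuffixed small literal is int");
static_assert(std::is_same<decltype(1UL), unsigned long>::value,
              "UL-suffixed literal is unsigned long");

// Hypothetical limb mask (name and width are assumptions for illustration).
// Writing (1 << 52) here would shift a 32-bit int by more than its width,
// which is undefined behavior; 1UL keeps the shift in 64 bits on LP64.
constexpr uint64_t kLimbMask = (1UL << 52) - 1;

static_assert(kLimbMask == 0x000FFFFFFFFFFFFFUL,
              "mask covers the low 52 bits");
```

The same reasoning applies to entries of a constant table: an explicit suffix documents the intended type instead of leaving it to the literal's magnitude.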

src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 386:

> 384:   __ jcc(Assembler::equal, L_Length19);
> 385: 
> 386:   // Default copy loop

Please add appropriate loop entry alignment.

src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 394:

> 392:   __ lea(aLimbs, Address(aLimbs,8));
> 393:   __ lea(bLimbs, Address(bLimbs,8));
> 394:   __ jmp(L_DefaultLoop);

Both sub and cmp set flags and are macro-fusible with a following conditional jump.
By rotating the loop, i.e. moving the length <= 0 check outside the loop and placing the loop exit check at the bottom, you can save the additional compare on each iteration.
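
The shape of the rotation can be sketched in plain C++ as follows. This is only an illustration of the transformation, not the stub code: the function names and the simple copy body are hypothetical, and an optimizing compiler would normally perform this rotation on its own. In hand-written assembly it has to be done manually.

```cpp
#include <cstdint>

// Top-tested form: the exit check sits at the loop head, so each iteration
// ends with an unconditional jump back to the check (the pattern of a
// trailing jmp to the loop label).
void copy_top_tested(const uint64_t* src, uint64_t* dst, long length) {
  while (length > 0) {
    *dst++ = *src++;
    length--;
  }
}

// Rotated form: a single guard before entry handles length <= 0, and the
// exit check moves to the bottom. The decrement already produces the flags
// needed by the conditional back-edge, so no separate compare is required.
void copy_rotated(const uint64_t* src, uint64_t* dst, long length) {
  if (length <= 0) {
    return;
  }
  do {
    *dst++ = *src++;
  } while (--length > 0);
}
```

Both functions copy the same elements for any length; the rotated one executes one conditional branch per iteration instead of a conditional branch plus an unconditional jump.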

-------------

PR Review: https://git.openjdk.org/jdk/pull/18583#pullrequestreview-1981555803
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1553056633
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1552710600
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1553110376

