RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]
Sandhya Viswanathan
sviswanathan at openjdk.org
Thu May 16 23:38:08 UTC 2024
On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>> Performance. Before:
>>
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error      Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6443.934  ± 6.491    ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6152.979  ± 4.954    ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1895.410  ± 36.979   ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1878.955  ± 45.487   ops/s
>>
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error      Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1357.810  ± 26.584   ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1352.119  ± 23.547   ops/s
>>
>> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error      Units
>> PolynomialP256Bench.benchMultiply  false          thrpt  3    1746.126  ± 10.970   ops/s
>>
>> Performance, no intrinsic:
>>
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error      Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6529.839  ± 42.420   ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6199.747  ± 133.566  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1973.676  ± 54.071   ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1932.127  ± 35.920   ops/s
>>
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error      Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1355.788  ± 29.858   ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1346.523  ± 28.722   ops/s
>>
>> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error      Units
>> PolynomialP256Bench.benchMultiply  true           thrpt  3    1919.57...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
> whitespace
src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 168:
> 166: XMMRegister broadcast5 = xmm24;
> 167: KRegister limb0 = k1;
> 168: KRegister limb5 = k2;
limb5 and select are no longer used.
src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 185:
> 183: __ evmovdquq(modulus, allLimbs, ExternalAddress(modulus_p256()), false, Assembler::AVX_512bit, rscratch);
> 184:
> 185: // A = load(*aLimbs)
A bit more description in the comments of what the load step involves would be helpful, e.g.: load the upper 4 limbs, shift left by 1 limb using perm, then OR in the lowest limb.
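As a rough illustration of the load sequence suggested in this comment, here is a scalar C++ model that treats one 512-bit register as eight 64-bit lanes. It is only a sketch under stated assumptions: the names Lanes and load_limbs are hypothetical, the element is assumed to be stored as five consecutive 64-bit limbs, and the permute is assumed to leave lane 0 zero. The actual stub uses AVX-512 instructions and may differ in detail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar stand-in for one 512-bit register: eight 64-bit lanes.
using Lanes = std::array<std::uint64_t, 8>;

// Hypothetical helper (not part of the patch): models the three-step load
// for an element assumed to be stored as five consecutive 64-bit limbs.
Lanes load_limbs(const std::uint64_t* limbs) {
    assert(limbs != nullptr);

    // Step 1: load the upper four limbs (limbs[1..4]) into lanes 0..3.
    Lanes upper{};
    for (std::size_t i = 0; i < 4; ++i) {
        upper[i] = limbs[i + 1];
    }

    // Step 2: permute so that each lane moves up by one position
    // ("shift left by 1 limb"). Assumption: lane 0 ends up zero.
    Lanes shifted{};
    for (std::size_t i = 1; i < 8; ++i) {
        shifted[i] = upper[i - 1];
    }

    // Step 3: OR the lowest limb (limbs[0]) into lane 0.
    shifted[0] |= limbs[0];
    return shifted;
}
```

With this model, lanes 0..4 hold limbs 0..4 in order and lanes 5..7 stay zero, which is the layout a short comment above the load could state.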
src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 270:
> 268: __ push(r14);
> 269: __ push(r15);
> 270:
No need to save/restore rbx, r12, r14, r15. Only r13 is used as a temp in montgomeryMultiply(aLimbs, bLimbs, rLimbs), and that, too, could easily be changed to r8.
src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 286:
> 284: __ mov(aLimbs, c_rarg0);
> 285: __ mov(bLimbs, c_rarg1);
> 286: __ mov(rLimbs, c_rarg2);
We could call montgomeryMultiply(c_rarg0, c_rarg1, c_rarg2) directly; then these moves would not be necessary.
src/hotspot/cpu/x86/vm_version_x86.cpp line 1370:
> 1368:
> 1369: #ifdef _LP64
> 1370: if (supports_avx512ifma() && supports_avx512vlbw() && MaxVectorSize >= 64) {
No need to tie the intrinsic to the MaxVectorSize setting.
src/hotspot/share/opto/library_call.cpp line 7564:
> 7562:
> 7563: if (!stubAddr) return false;
> 7564: if (stopped()) return true;
Line 7564 seems redundant here, as there is no range check or anything similar before it.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604169603
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604141586
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604174141
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604175443
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1603792252
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1603865712