RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]
Volodymyr Paprotski
duke at openjdk.org
Fri May 17 17:10:09 UTC 2024
On Thu, 16 May 2024 23:21:36 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>>
>> whitespace
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 168:
>
>> 166: XMMRegister broadcast5 = xmm24;
>> 167: KRegister limb0 = k1;
>> 168: KRegister limb5 = k2;
>
> limb5 and select are not being used anymore.
Thanks, fixed (and also broadcast5)
> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 185:
>
>> 183: __ evmovdquq(modulus, allLimbs, ExternalAddress(modulus_p256()), false, Assembler::AVX_512bit, rscratch);
>> 184:
>> 185: // A = load(*aLimbs)
>
> A little bit more description in comments on what the load step involves would be helpful. e.g. Load upper 4 limbs, shift left by 1 limb using perm, or in the lowest limb.
Done
> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 270:
>
>> 268: __ push(r14);
>> 269: __ push(r15);
>> 270:
>
> No need to save/restore rbx, r12, r14, r15. Only r13 is used as temp in montgomeryMultiply(aLimbs, bLimbs, rLimbs). That too could be easily changed to r8.
Seems I forgot to completely cleanup, thanks! (Originally copied from poly1305 stub)
> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 286:
>
>> 284: __ mov(aLimbs, c_rarg0);
>> 285: __ mov(bLimbs, c_rarg1);
>> 286: __ mov(rLimbs, c_rarg2);
>
> We could directly call montgomeryMultiply(c_rarg0, c_rarg1, c_rarg2) then these moves are not necessary.
Gave them symbolic names and passed the gpr temp and parameter. vector register map still in the montgomeryMultiply function, but gprs explicitly passed in. 'close enough'?
> src/hotspot/cpu/x86/vm_version_x86.cpp line 1370:
>
>> 1368:
>> 1369: #ifdef _LP64
>> 1370: if (supports_avx512ifma() && supports_avx512vlbw() && MaxVectorSize >= 64) {
>
> No need to tie the intrinsic to MaxVectorSize setting.
Done
> src/hotspot/share/opto/library_call.cpp line 7564:
>
>> 7562:
>> 7563: if (!stubAddr) return false;
>> 7564: if (stopped()) return true;
>
> Line 7564 seems redundant here as there is no range check or anything like that before this.
Oh. That is what that is for... I thought it was some soft of 'VM quitting' short-circuit. Removed.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328906
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328960
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328859
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328829
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605329040
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328995
More information about the security-dev
mailing list