RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]
Jatin Bhateja
jbhateja at openjdk.org
Tue Apr 16 02:31:02 UTC 2024
On Mon, 15 Apr 2024 22:04:14 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 394:
>>
>>> 392: __ lea(aLimbs, Address(aLimbs,8));
>>> 393: __ lea(bLimbs, Address(bLimbs,8));
>>> 394: __ jmp(L_DefaultLoop);
>>
>> Both sub and cmp are flag-setting instructions and can be macro-fused with a following conditional jump.
>> With a loop rotation, i.e. moving the length <= 0 check outside the loop and pushing the loop exit check to the bottom, you can save the additional compare checks.
>
> Per the above, this is a switch statement (`UNLIKELY`) fallback. I can still add alignment and loop rotation, but since it is a fallback, I figured it's more important to keep it small and readable...
It's all part of the intrinsic, so there is no harm in polishing it.
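
To illustrate the loop rotation suggested above, here is a minimal C++ sketch of the control-flow shape only. It is not the stub's actual code: the function names and the summing loop body are hypothetical stand-ins, and the real loop works on aLimbs/bLimbs through the macro assembler.

```cpp
#include <cassert>
#include <cstdint>

// Top-tested form: each iteration runs the exit check at the top and
// takes an unconditional jump back from the bottom of the loop.
// The summing body is a hypothetical stand-in for the real per-limb work.
uint64_t sum_top_tested(const uint64_t* limbs, int length) {
    assert(limbs != nullptr || length <= 0);
    uint64_t acc = 0;
    while (length > 0) {          // compare + conditional exit every iteration
        acc += *limbs;
        ++limbs;                  // advance to the next limb
        --length;
    }                             // implicit jump back to the top
    return acc;
}

// Rotated form: the length <= 0 guard runs once before the loop, and the
// decrement doubles as the exit test at the bottom, so the back edge is a
// single flag-setting instruction followed by a conditional jump.
uint64_t sum_rotated(const uint64_t* limbs, int length) {
    assert(limbs != nullptr || length <= 0);
    uint64_t acc = 0;
    if (length <= 0) {            // guard hoisted out of the loop
        return acc;
    }
    do {
        acc += *limbs;
        ++limbs;
    } while (--length > 0);       // decrement and loop-back test together
    return acc;
}
```

An optimizing C++ compiler will usually perform this transformation on its own. In hand-written stub code it has to be done manually, which is why it comes up in this review.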
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1566630667