RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

Jatin Bhateja jbhateja at openjdk.org
Tue Apr 16 02:31:02 UTC 2024


On Mon, 15 Apr 2024 22:04:14 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 394:
>> 
>>> 392:   __ lea(aLimbs, Address(aLimbs,8));
>>> 393:   __ lea(bLimbs, Address(bLimbs,8));
>>> 394:   __ jmp(L_DefaultLoop);
>> 
>> Both sub and cmp are flag-affecting instructions and are macro-fusible.
>> By doing a loop rotation, i.e. moving the length <= 0 check outside the loop and pushing the loop exit check to the bottom, you can save the additional compare checks.
>
> Per the above, this is a switch-statement (`UNLIKELY`) fallback. I can still add alignment and loop rotation, but since it is a fallback, I figured it's more important to keep it small and readable...

It's all part of the intrinsic, so there is no harm in polishing it.
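
For illustration only, the loop rotation suggested in the quoted comment can be sketched in plain C++ rather than in the stub's macro-assembler code. This is not the code from the PR: `process_limb`, `loop_top_tested` and `loop_rotated` are hypothetical names, and the XOR body is just a stand-in, since the real per-limb work of the stub is not shown in this thread. The first function mirrors the shape of the current fallback (exit check at the top, unconditional jump back); the second hoists the `length <= 0` check out of the loop and tests at the bottom.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-limb work; the real stub's body is not shown here.
static inline void process_limb(uint64_t* a, const uint64_t* b) { *a ^= *b; }

// Top-tested shape: every iteration pays for the exit check at the top
// plus an unconditional jump back to it.
void loop_top_tested(uint64_t* a, const uint64_t* b, long length) {
top:
  if (length <= 0) goto done;   // exit check, executed on every iteration
  process_limb(a, b);
  ++a;                          // advance both pointers by one 8-byte limb
  ++b;
  --length;
  goto top;                     // unconditional back edge
done:
  return;
}

// Rotated shape: the length <= 0 check runs once as a guard, and the
// decrement feeds the single conditional back edge directly.
void loop_rotated(uint64_t* a, const uint64_t* b, long length) {
  if (length <= 0) return;      // guard, executed once
  do {
    process_limb(a, b);
    ++a;
    ++b;
  } while (--length > 0);       // decrement and conditional branch at the bottom
}
```

Both functions produce the same result for any `length`; the rotated form simply has one branch per iteration instead of two. Whether a compiler or the hand-written stub turns the bottom test into a fused decrement-and-branch pair depends on the exact instructions emitted, so treat this as a sketch of the control-flow shape only.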

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1566630667

