RFR: 8355216: Accelerate P-256 arithmetic on aarch64
Andrew Haley
aph at openjdk.org
Mon Dec 1 21:30:28 UTC 2025
On Tue, 18 Nov 2025 19:00:21 GMT, Ben Perez <bperez at openjdk.org> wrote:
> Is there a reason hotspot doesn't leave `r9` open for use as a caller-saved local variable, as described in the ARM docs (https://developer.arm.com/documentation/102374/0103/Procedure-Call-Standard)? Either way, will fix.
Yes, because a ton of convenience macros in `MacroAssembler` use it. HotSpot internally doesn't use the APCS.
>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7169:
>>
>>> 7167: Register mul_tmp = r14;
>>> 7168: Register n = r15;
>>> 7169:
>>
>> Here, you could do something like
>>
>>
>> RegSet scratch = RegSet::range(r3, r28) - rscratch1 - rscratch2;
>>
>> {
>>   auto r_it = scratch.begin();
>>   Register
>>     c_ptr = *r_it++,
>>     a_i = *r_it++,
>>     c_idx = *r_it++, // c_idx is not used at the same time as a_i
>>     limb_mask_scalar = *r_it++,
>>     b_j = *r_it++,
>>     mod_j = *r_it++,
>>     mod_ptr = *r_it++,
>>     mul_tmp = *r_it++,
>>     n = *r_it++;
>>   ...
>> }
>>
>>
>>
>> Note that a RegSet iterator doesn't affect the RegSet it was created from, so once this block has ended you can allocate again from the set of scratch registers.
>
> Is there by any chance documentation for `RegSet` that I can reference while making these changes?
I could talk you through it, or give you a few more examples. It's easy to use once you get used to it.
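As a rough illustration of the iterator behaviour described above, here is a minimal standalone sketch. It is not HotSpot's `RegSet`: `MiniRegSet` and `allocation_is_repeatable` are hypothetical names, registers are modelled as plain integers, and `rscratch1`/`rscratch2` are assumed to be r8 and r9. It only models `range()`, subtraction, and the `*it++` idiom, to show that iterating consumes a copy of the set rather than the set itself.

```cpp
#include <cstdint>

// Hypothetical stand-in for a register set: one bit per register number.
struct MiniRegSet {
  uint32_t bits;

  // Set containing registers first..last, inclusive.
  static MiniRegSet range(int first, int last) {
    uint32_t b = 0;
    for (int i = first; i <= last; i++) b |= (uint32_t{1} << i);
    return MiniRegSet{b};
  }

  // Set difference with a single register.
  MiniRegSet operator-(int reg) const {
    return MiniRegSet{bits & ~(uint32_t{1} << reg)};
  }

  // The iterator works on its own copy of the bitmask.
  struct Iterator {
    uint32_t remaining;

    // Lowest-numbered register still available, or -1 if none is left.
    int operator*() const {
      for (int i = 0; i < 32; i++)
        if (remaining & (uint32_t{1} << i)) return i;
      return -1;
    }

    // Post-increment: return the old position, then drop the lowest bit.
    Iterator operator++(int) {
      Iterator old = *this;
      remaining &= remaining - 1;
      return old;
    }
  };

  Iterator begin() const { return Iterator{bits}; }
};

// Allocate from the same set in two separate scopes and check that the
// second allocation starts again from the first available register.
inline bool allocation_is_repeatable() {
  const int rscratch1 = 8, rscratch2 = 9;  // assumed register numbers
  MiniRegSet scratch = MiniRegSet::range(3, 28) - rscratch1 - rscratch2;

  int c_ptr, a_i;
  {
    auto r_it = scratch.begin();
    c_ptr = *r_it++;
    a_i = *r_it++;
  }

  // The block above did not modify 'scratch', so a new iterator starts over.
  auto r_it = scratch.begin();
  int tmp0 = *r_it++, tmp1 = *r_it++;
  return tmp0 == c_ptr && tmp1 == a_i;
}
```

In this sketch the iterator holds a copy of the bitmask, which is why the second `begin()` hands out the same registers as the first.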
> thanks for the suggestion! Does using `T16B` improve performance? Similarly, should this be applied to `EOR` as well?
Yes. According to the Arm Architecture Reference Manual (the Arm ARM), only the 8B and 16B forms are the official names.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2541919496
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2477261216
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2476966812