RFR: 8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs
Srinivas Vamsi Parasa
sparasa at openjdk.org
Wed Jul 2 23:32:41 UTC 2025
On Wed, 2 Jul 2025 17:44:34 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>>> For a cleaner interface, I think we can also maintain a RAII style APXPushPopPairTracker ...
>>
>> Using the suggested code as a base, Vamsi and I tinkered with the idea some more! Here is what we came up with. This also tracks the correct order of registers being pushed/poped.. (haven't compiled it, so might have some syntax bugs).
>>
>> @dholmes-ora would you mind sharing your opinion? We seem to be making things more complicated, but hopefully in a good way?
>>
>> Also included a sample usage in a stub.
>>
>>
>> #define __ _masm->
>>
>> class PushPopTracker {
>> private:
>> int _counter;
>> MacroAssembler *_masm;
>> const int REGS = 32; // Increase as needed
>> int regs[REGS];
>> public:
>> PushPopTracker(MacroAssembler *_masm) : _counter(0), _masm(_masm) {}
>> ~PushPopTracker() {
>> assert(_counter == 0, "Push/pop pair mismatch");
>> }
>>
>> void push(Register reg) {
>> assert(_counter<REGS, "Push/pop overflow");
>> regs[_counter++] = reg.encoding();
>> if (VM_Version::supports_apx_f()) {
>> __ pushp(reg);
>> } else {
>> __ push(reg);
>> }
>> }
>> void pop(Register reg) {
>> assert(_counter>0, "Push/pop underflow");
>> assert(regs[_counter] == reg.encoding(), "Push/pop pair mismatch: %d != %d", regs[_counter], reg.encoding());
>> _counter--;
>> if (VM_Version::supports_apx_f()) {
>> __ popp(reg);
>> } else {
>> __ pop(reg);
>> }
>> }
>> }
>>
>> address StubGenerator::generate_intpoly_montgomeryMult_P256() {
>> __ align(CodeEntryAlignment);
>> /*...*/
>> address start = __ pc();
>> __ enter();
>> PushPopTracker s(_masm);
>> s.push(r12); //1
>> s.push(r13); //2
>> s.push(r14); //3
>> #ifdef _WIN64
>> s.push(rsi); //4
>> s.push(rdi); //5
>> #endif
>> s.push(rbp); //6
>> __ movq(rbp, rsp);
>> __ andq(rsp, -32);
>> __ subptr(rsp, 32);
>> // Register Map
>> const Register aLimbs = c_rarg0; // c_rarg0: rdi | rcx
>> const Register bLimbs = rsi; // c_rarg1: rsi | rdx
>> const Register rLimbs = r8; // c_rarg2: rdx | r8
>> const Register tmp1 = r9;
>> const Register tmp2 = r10;
>> /*...*/
>> __ movq(rsp, rbp);
>> s.pop(rbp); //5
>> #ifdef _WIN64
>> s.pop(rdi); //4
>> s.pop(rsi); //3
>> #endif
>> s.pop(r14); //2
>> s.pop(r13); //1
>> s.pop(r12); //0
>> __ leave();
>> __ ret(0);
>> return start;
>> }
>
> @vamsi-parasa, It's better to make this as a subclass of MacroAssembler in src/hotspot/cpu/x86/macroAssembler_x86.hpp and pass Tracker as an argument to push / pop for a cleaner interface.
Hi Jatin (@jatin-bhateja) and Vlad (@vpaprotsk),
There's one more issue to be considered. The C++ PushPopTracker code will be run during the stub generation time. There are code bocks which do a single push onto the stack but due to multiple exit paths, there will be multiple pops as illustrated below. Will this reference counting approach not fail in such a scenario as the stub code is generated all at once during the stub generation phase?
#begin stack frame
push(r21)
#exit condition 1
pop(r21)
# exit condition 2
pop(r21)
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25889#discussion_r2181146890
More information about the hotspot-compiler-dev
mailing list