RFR: 8373697: AArch64: Remove deoptimization stub code for nmethods with small stack frames

Tue Jan 13 11:20:29 UTC 2026

On Sat, 20 Dec 2025 21:53:48 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Removal of deoptimization stub codes improves density of compiled code.
>> It is possible at least for methods with relatively small stack frames.
>
> That's a good idea.  But instead of storing the nmethod* into r8, how about storing the original PC?  Then we don't need to reserve that stack slot in the frame anymore.  However, I'm not sure this trick will work for all compiled frames.  Do we always save r8 on compiled --> compiled calls, or only for compiled --> runtime calls?

Hi @dean-long, @theRealAph,

> Here's another possible idea: if the call return is a PostCallNop, we can change the return address so we return into the middle of the NOP. On RISC that might be enough to cause an unaligned address trap. On x64 or other architectures, we might need an actual embedded trap instruction.

I initially chose not to take this path because of latency concerns.
For example, in a scenario where a set of hot methods is being executed by many threads/cores and gets deoptimized, many nearly simultaneous traps might occur, potentially resulting in delays due to contention inside the kernel.
If we accept a higher cost of deoptimization events, this approach seems to be the simplest option.

> Surely we don't need any of this complication. When we patch the CodeBlob frame's return address with the PC of the deopt handler, we could also store nmethod* into the slot for r8 in the same CodeBlob frame. When that CodeBlob returns, the nmethod* is in r8, and the deopt handler can read it there.

As far as I understand, there is no dedicated slot for the x8 register at a fixed offset within compiled frames.
There can be a few adjacent compiled frames - all of which can be patched for deoptimization - so we don't necessarily have a non-compiled frame available for storing the nmethod*.

The original PC slot currently contains a similar value: the original return address within the nmethod.
However, the offset to the original PC slot depends on the nmethod - so we need to identify the nmethod before locating the slot.
In my understanding, any spill slot would have an offset depending on the nmethod, because the first N slots in each frame are reserved for arguments.
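
The dependency can be illustrated with a toy model. This is a sketch, not HotSpot code: `ToyNMethod`, `find_nmethod`, and `original_pc_slot` are hypothetical names, and the address ranges and offsets are made up for illustration. It only shows that the slot address can be computed once some PC has been mapped back to its nmethod.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an nmethod: a code range plus the
// per-method offset of its original PC slot from sp.
struct ToyNMethod {
    std::uint64_t code_begin;      // inclusive
    std::uint64_t code_end;        // exclusive
    std::uint64_t orig_pc_offset;  // differs from method to method
};

// Maps a PC to the toy nmethod containing it, or nullptr if none does.
const ToyNMethod* find_nmethod(const std::vector<ToyNMethod>& cache,
                               std::uint64_t pc) {
    for (const ToyNMethod& nm : cache) {
        assert(nm.code_begin < nm.code_end);
        if (pc >= nm.code_begin && pc < nm.code_end) {
            return &nm;
        }
    }
    return nullptr;
}

// Computes the address of the original PC slot for the frame at `sp`.
// Fails (returns false) when `pc` does not identify an nmethod, because
// the offset is unknown in that case.
bool original_pc_slot(const std::vector<ToyNMethod>& cache,
                      std::uint64_t pc, std::uint64_t sp,
                      std::uint64_t& slot_out) {
    const ToyNMethod* nm = find_nmethod(cache, pc);
    if (nm == nullptr) {
        return false;
    }
    slot_out = sp + nm->orig_pc_offset;
    return true;
}
```

In this model, a PC that lies outside every nmethod's code range (for instance, one pointing into a shared handler) gives no offset at all, so the nmethod has to be conveyed by some other means.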

Example compiled stack frame layout:

#r020 c_rarg1:c_rarg1   : parm 0: rawptr:BotPTR
# -- Old r31_sp -- Framesize: 112 --
#r223 r31_sp+108: in_preserve
#r222 r31_sp+104: return address (to the caller)
#r221 r31_sp+100: in_preserve
#r220 r31_sp+96: saved fp register
#r219 r31_sp+92: in_preserve
#r218 r31_sp+88: in_preserve
#r217 r31_sp+84: Fixed slot 1
#r216 r31_sp+80: Fixed slot 0 (original PC slot)
#r243 r31_sp+76: spill
#r242 r31_sp+72: spill
…
#r226 r31_sp+ 8: spill
#r225 r31_sp+ 4: spill
#r224 r31_sp+ 0: outgoing argument
----------------------------
      r31_sp- 8: return address (to the current method) - the patched item, with the original value stored in the fixed slot above
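
The patching described in the last line of the layout can be modelled as follows. This is a toy sketch, not HotSpot code: `ToyStack` and `patch_return_address` are hypothetical, and the offsets 80 and -8 are taken from the example frame above, so they hold only for that particular nmethod.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model of a thread stack; addresses are byte offsets into `bytes`.
struct ToyStack {
    std::vector<unsigned char> bytes;

    explicit ToyStack(std::size_t size) : bytes(size, 0) {}

    // Reads a 64-bit value stored at `addr`.
    std::uint64_t load(std::size_t addr) const {
        assert(addr + sizeof(std::uint64_t) <= bytes.size());
        std::uint64_t value;
        std::memcpy(&value, bytes.data() + addr, sizeof value);
        return value;
    }

    // Writes a 64-bit value at `addr`.
    void store(std::size_t addr, std::uint64_t value) {
        assert(addr + sizeof value <= bytes.size());
        std::memcpy(bytes.data() + addr, &value, sizeof value);
    }
};

// Saves the return address found at sp - 8 into the frame's original PC
// slot (sp + orig_pc_offset), then overwrites sp - 8 with `handler_pc`.
// Returns the saved original PC.
std::uint64_t patch_return_address(ToyStack& stack, std::size_t sp,
                                   std::size_t orig_pc_offset,
                                   std::uint64_t handler_pc) {
    assert(sp >= 8);
    const std::uint64_t original_pc = stack.load(sp - 8);
    stack.store(sp + orig_pc_offset, original_pc);
    stack.store(sp - 8, handler_pc);
    return original_pc;
}
```

For the frame shown above, the call would use `orig_pc_offset = 80`; a different nmethod would generally need a different value, which is the difficulty discussed earlier.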

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28857#issuecomment-3743781438