RFR: 8373697: AArch64: Remove deoptimization stub code for nmethods with small stack frames
Ruben
duke at openjdk.org
Wed Feb 25 22:05:11 UTC 2026
On Tue, 13 Jan 2026 17:45:19 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Removal of deoptimization stub codes improves density of compiled code.
>> It is possible at least for methods with relatively small stack frames.
>
> Try this:
>
>
> diff --git a/src/hotspot/share/runtime/frame.cpp b/src/hotspot/share/runtime/frame.cpp
> index 9fdffa053c4..3c816ee8aad 100644
> --- a/src/hotspot/share/runtime/frame.cpp
> +++ b/src/hotspot/share/runtime/frame.cpp
> @@ -359,6 +359,25 @@ void frame::deoptimize(JavaThread* thread) {
>
> NativePostCallNop* inst = nativePostCallNop_at(pc());
>
> + const ImmutableOopMap* blob_map = thread->last_frame().get_oop_map();
> + blob_map->print();
> +
> + VMReg r7_VMReg = r7->as_VMReg();
> + intptr_t* r8_location = nullptr;
> + for (OopMapStream oms(blob_map); !oms.is_done(); oms.next()) {
> + OopMapValue omv = oms.current();
> + if (omv.type() == OopMapValue::callee_saved_value) {
> + VMReg reg = omv.content_reg();
> + if (reg == r7_VMReg) {
> + r8_location = thread->last_frame().sp()
> + + omv.stack_offset() / VMRegImpl::slots_per_word
> + + 1;
> + tty->print_cr("offset = %d\n", omv.stack_offset() * VMRegImpl::stack_slot_size);
> + break;
> + }
> + }
> + }
> + *r8_location = 0xcafebeefcafebabe;
> // Save the original pc before we patch in the new one
> nm->set_original_pc(this, pc());
> patch_pc(thread, deopt);
>
>
>
> You will find that when you arrive at the deopt handler entry, r8 contains 0xcafebeefcafebabe. You can pass anything you want in the slots for r8 and r9.
>
> This is a bit of a kludge in that we don't have an oopmap entry for r8 so I assume it's in the next entry after r7, but all registers are saved on the stack, and they're saved in their natural order. If needs be we can add an oop map entry for r8. That may be a good idea.
Hi @theRealAph,
Thank you for sharing the example.
My understanding is that this would patch the top frame and provide the original PC information for the top deoptimized frame only.
Because a thread can have many deoptimized frames at the same time, one slot of this type would be required per compiled frame. Wouldn't this be effectively equivalent to the original PC slot that is currently reserved within each compiled stack frame?
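
To illustrate the concern, here is a minimal toy model in standalone C++17. It is only a sketch: `ToyFrame`, `ToyThread`, `kDeoptHandler` and the helper functions are hypothetical names invented for this example and are not HotSpot classes or APIs. It compares a single per-thread slot with one slot per frame when two frames of the same thread are deoptimized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only: none of these types exist in HotSpot.
using Address = std::uintptr_t;

// Hypothetical stand-in for the deopt handler entry address.
constexpr Address kDeoptHandler = 0xdead0000;

struct ToyFrame {
    Address return_pc;                   // pc the frame would resume at
    std::optional<Address> original_pc;  // per-frame slot, filled on deoptimization
};

struct ToyThread {
    std::vector<ToyFrame> frames;
    Address shared_original_pc = 0;      // single slot shared by all frames
};

// Per-frame scheme: each frame records its own original pc before patching.
void deoptimize_per_frame(ToyFrame& f) {
    f.original_pc = f.return_pc;
    f.return_pc = kDeoptHandler;
}

// Single-slot scheme: the shared slot is overwritten by every deoptimization.
void deoptimize_shared(ToyThread& t, std::size_t i) {
    t.shared_original_pc = t.frames[i].return_pc;
    t.frames[i].return_pc = kDeoptHandler;
}

// Builds a thread with two compiled frames at made-up pcs.
ToyThread make_thread() {
    ToyThread t;
    t.frames.push_back(ToyFrame{0x1000, std::nullopt});
    t.frames.push_back(ToyFrame{0x2000, std::nullopt});
    return t;
}

// True if both original pcs are still recoverable after two deoptimizations.
bool per_frame_slots_keep_both() {
    ToyThread t = make_thread();
    for (ToyFrame& f : t.frames) deoptimize_per_frame(f);
    return t.frames[0].original_pc == Address{0x1000} &&
           t.frames[1].original_pc == Address{0x2000};
}

// True if the shared slot only remembers the most recently deoptimized frame.
bool shared_slot_keeps_only_last() {
    ToyThread t = make_thread();
    deoptimize_shared(t, 0);
    deoptimize_shared(t, 1);
    return t.shared_original_pc == Address{0x2000};
}
```

In this model the shared slot ends up holding only the second frame's pc, so the first frame's original pc is lost, while the per-frame slots keep both values. That is the sense in which a register save slot used this way seems to need one instance per compiled frame.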
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28857#issuecomment-3962360501