RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2]
Andrew Haley
aph at openjdk.org
Tue Oct 3 09:36:38 UTC 2023
On Tue, 3 Oct 2023 09:16:46 GMT, Andrew Haley <aph at openjdk.org> wrote:
>>> OK, I think I get it now. The sender_sp may or may not really be the actual SP in a native frame (although with current GCC it is) but we do know that there is a chain of {frame pointer, PC} pairs. If we find a PC that is in a code buffer somewhere in that chain, we can find the size of the corresponding Java stack frame, and by subtraction the "real" SP of that Java frame.
>>>
>> Exactly.
>>
>>> So why is MacOS different?
>>>
>> Clang saves the frame records at the bottom of the frame (highest address), so using fr->sender_sp() works fine there. I can change it to have the same fix as gcc if we don't want to rely on that assumption. The only reason why I went with that simpler fix is that I think knowing that the sender sp is always two words above the current rfp would allow to walk the stack in some cases whereas with the other fix we would crash. Like if a frame passes the os::is_first_C_frame() check but fr->link() is not really the sender's rfp, doing rfp + 2 would still give a valid sender sp, whereas with the other calculation it would set a wrong value. But maybe that's very unlikely. I still added a new test that will fail if the location of the frame records change. What do you think?
>
>> > So why is MacOS different?
>>
>> Clang saves the frame records at the bottom of the frame (highest address), so using fr->sender_sp() works fine there.
>
> Huh, so it does. I never knew that.
>
>> I can change it to have the same fix as gcc if we don't want to rely on that assumption. The only reason why I went with that simpler fix is that I think knowing that the sender sp is always two words above the current rfp would allow to walk the stack in some cases whereas with the other fix we would crash.
>
> I guess so, but clang might change that tomorrow. But OK, from what you say it makes sense.
>
>> Like if a frame passes the os::is_first_C_frame() check but fr->link() is not really the sender's rfp,
>
> We're confident of two things: the frame pointers are a continuous chain through foreign code, and we can unwind frames we create ourselves in the VM. As to where exactly the frame pointer chain is in the stack frame, there are no guarantees: it might be in the middle, and indeed it is in the middle if a function has any local variables that are variable-sized arrays.
>
>> doing rfp + 2 would still give a valid sender sp, whereas with the other calculation it would set a wrong value. But maybe that's very unlikely. I still added a new test that will fail if the location of the frame records change. What do you think?
>
> That's a good idea.
>
> Depending on the internal details of some other open source project is pathological coupling. At the best this is a code smell. Having said that, to fix it properly we'd have to use an unwinder library, and that's a much bigger project. So, OK, I guess this patch is an improvement.
>
> One final thing, though. I'm looking at `jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java` and I see `AARCH64Frame::sender`
>
>
> if (cb != null) {
> return senderForCompiledFrame(map, cb);
> }
>
> // Must be native-compiled frame, i.e. the marshaling code for native
> // methods that exists in the core system.
> return new AARCH64Frame(getSenderSP(), getLink(), getSenderPC());
>
>
> We try to keep the agent code and the HotSpot frame code in step.
NB, I'm not suggesting you should fix AARCH64Frame.java in this, just noting that this looks like the same bug.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1343809003
More information about the hotspot-dev
mailing list