RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2]

Patricio Chilano Mateo pchilanomate at openjdk.org
Tue Oct 3 19:27:40 UTC 2023


On Tue, 3 Oct 2023 09:34:12 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> > So why is MacOS different?
>>> 
>>> Clang saves the frame records at the bottom of the frame (highest address), so using fr->sender_sp() works fine there.
>> 
>> Huh, so it does. I never knew that.
>> 
>>> I can change it to have the same fix as gcc if we don't want to rely on that assumption. The only reason why I went with that simpler fix is that I think knowing that the sender sp is always two words above the current rfp would allow to walk the stack in some cases whereas with the other fix we would crash.
>> 
>> I guess so, but clang might change that tomorrow. But OK, from what you say it makes sense.
>> 
>>> Like if a frame passes the os::is_first_C_frame() check but fr->link() is not really the sender's rfp,
>> 
>> We're confident of two things: the frame pointers are a continuous chain through foreign code, and we can unwind frames we create ourselves in the VM. As to where exactly the frame pointer chain is in the stack frame, there are no guarantees: it might be in the middle, and indeed it is in the middle if a function has any local variables that are variable-sized arrays.
>> 
>>> doing rfp + 2 would still give a valid sender sp, whereas with the other calculation it would set a wrong value. But maybe that's very unlikely. I still added a new test that will fail if the location of the frame records change. What do you think?
>> 
>> That's a good idea.
>> 
>> Depending on the internal details of some other open source project is pathological coupling. At the best this is a code smell. Having said that, to fix it properly we'd have to use an unwinder library, and that's a much bigger project. So, OK, I guess this patch is an improvement.
>> 
>> One final thing, though. I'm looking at `jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java` and I see `AARCH64Frame::sender` 
>> 
>> 
>>     if (cb != null) {
>>       return senderForCompiledFrame(map, cb);
>>     }
>> 
>>     // Must be native-compiled frame, i.e. the marshaling code for native
>>     // methods that exists in the core system.
>>     return new AARCH64Frame(getSenderSP(), getLink(), getSenderPC());
>> 
>> 
>> We try to keep the agent code and the HotSpot frame code in step.
>
> NB, I'm not suggesting you should fix AARCH64Frame.java in this, just noting that this looks like the same bug.

Yes, it's also the same code we have in frame::sender_raw(). I'm not sure how we can get to that native frame case when we use frame::sender() though. So if we start from the last Java frame we shouldn't find a native frame in the chain. And when we start from os::current_frame() we use VMError::print_native_stack() which uses os::get_sender_for_C_frame() for native frames. The only case I see is the debug utility ps(), which in case there is no last Java frame, creates a frame with os::current_frame() and then uses frame::sender(). So seems either we should change that last line in sender_raw() to be os::get_sender_for_C_frame() or replace it with an assert(false, "") and fix ps().

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1344619813


More information about the hotspot-dev mailing list