RFR: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash [v4]
Andrew Dinn
adinn at openjdk.java.net
Thu Jan 27 12:25:29 UTC 2022
On Sat, 22 Jan 2022 02:40:41 GMT, Denghui Dong <ddong at openjdk.org> wrote:
>> Hi,
>>
>> I found that the native stack frames in the hs log are not accurate sometimes on AArch64, not sure if this is a known issue or an issue worth fixing.
>>
>> The following steps can quick reproduce the problem:
>>
>> 1. apply the diff(comment the dtrace_object_alloc call in interpreter and make a crash on SharedRuntime::dtrace_object_alloc)
>>
>> index 39e99bdd5ed..4fc768e94aa 100644
>> --- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
>> +++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
>> @@ -3558,6 +3558,7 @@ void TemplateTable::_new() {
>> __ store_klass_gap(r0, zr); // zero klass gap for compressed oops
>> __ store_klass(r0, r4); // store klass last
>>
>> +/**
>> {
>> SkipIfEqual skip(_masm, &DTraceAllocProbes, false);
>> // Trigger dtrace event for fastpath
>> @@ -3567,6 +3568,7 @@ void TemplateTable::_new() {
>> __ pop(atos); // restore the return value
>>
>> }
>> +*/
>> __ b(done);
>> }
>>
>> diff --git a/src/hotspot/cpu/x86/templateTable_x86.cpp b/src/hotspot/cpu/x86/templateTable_x86.cpp
>> index 19530b7c57c..15b0509da4c 100644
>> --- a/src/hotspot/cpu/x86/templateTable_x86.cpp
>> +++ b/src/hotspot/cpu/x86/templateTable_x86.cpp
>> @@ -4033,6 +4033,7 @@ void TemplateTable::_new() {
>> Register tmp_store_klass = LP64_ONLY(rscratch1) NOT_LP64(noreg);
>> __ store_klass(rax, rcx, tmp_store_klass); // klass
>>
>> +/**
>> {
>> SkipIfEqual skip_if(_masm, &DTraceAllocProbes, 0);
>> // Trigger dtrace event for fastpath
>> @@ -4041,6 +4042,7 @@ void TemplateTable::_new() {
>> CAST_FROM_FN_PTR(address, static_cast<int (*)(oopDesc*)>(SharedRuntime::dtrace_object_alloc)), rax);
>> __ pop(atos);
>> }
>> +*/
>>
>> __ jmp(done);
>> }
>> diff --git a/src/hotspot/share/runtime/sharedRuntime.cpp b/src/hotspot/share/runtime/sharedRuntime.cpp
>> index a5de65ea5ab..60b4bd3bcc8 100644
>> --- a/src/hotspot/share/runtime/sharedRuntime.cpp
>> +++ b/src/hotspot/share/runtime/sharedRuntime.cpp
>> @@ -1002,6 +1002,7 @@ jlong SharedRuntime::get_java_tid(Thread* thread) {
>> * 6254741. Once that is fixed we can remove the dummy return value.
>> */
>> int SharedRuntime::dtrace_object_alloc(oopDesc* o) {
>> + *(int*)0 = 1;
>> return dtrace_object_alloc(Thread::current(), o, o->size());
>> }
>>
>>
>> 2. `java -XX:+DTraceAllocProbes -Xcomp -XX:-PreserveFramePointer -version`
>>
>> On x86_64, the native stack in hs log is complete, but on AArch64, the native stack is incorrect.
>>
>> In the beginning, I thought it might be the influence of PreserveFramePointer. Later, I found that no matter whether PreserveFramePointer is enabled or not, in the hs log of x86_64, the native stack is always correct, and aarch64 is wrong.
>>
>> After some investigation, I found that this problem is related to the layout of the stack.
>>
>> On x86_64, whether it is C/C++, interpreter, or JIT, `callee` will always put the `return address` and `fp` of the `caller` at the bottom of the stack.
>> Hence, `callee` can always get the `caller sp`(aka `sender sp`) by `fp + 2`, and if `caller` is a compiled method, `caller sp` is the key to getting the `caller`'s `caller` since `caller fp` may be invalid.(see frame::sender_for_compiled_frame).
>>
>>
>> push %rbp
>> mov %rsp,%rbp
>>
>> _ _ _ _ _ _
>> | |
>> | | |
>> |_ _ _ _ _ _| |
>> | | |
>> caller | | <- caller sp |
>> _ _ _ |_ _ _ _ _ _| | expand
>> | | |
>> | ret addr | | direction
>> callee |_ _ _ _ _ _| |
>> | | V
>> | caller fp | <- fp
>> |_ _ _ _ _ _|
>>
>>
>>
>> But for AArch64, the C/C++ code doesn't put the `return address` and `fp` of the `caller` at the bottom of the stack.
>> Hence, we cannot use `fp + 2` to calculate the proper `caller sp`(although it is still implemented this way).
>>
>> When `caller` is a C1/C2 method A, and `callee` is a C/C++ method B, we cannot get the `caller` of A since we cannot get the proper sp value of it.
>>
>>
>> stp x29, x30, [sp, #-N]!
>> mov x29, sp
>>
>> _ _ _ _ _ _
>> | |
>> | | |
>> |_ _ _ _ _ _| |
>> | | |
>> caller | | <- caller sp |
>> _ _ _ |_ _ _ _ _ _| - | expand
>> | |
>> . . . . . | | direction
>> _ _ _ _ _ _ | |
>> | | | N |
>> | ret addr | | |
>> callee |_ _ _ _ _ _| | |
>> | | - V
>> | caller fp | <- fp
>> |_ _ _ _ _ _|
>>
>>
>>
>> I am not very familiar with AArch64 and have no idea how to fix this issue perfectly at current.
>>
>> Based on my understanding of the implementation, we can get the correct stack trace when PreserveFramePointer is enabled.
>>
>> Although PreserveFramePointer is disabled by default, I found that some real applications will enable it in the production environment.
>> Therefore, in my opinion, this fix can help troubleshoot crash issues in applications that enable PreserveFramePointer on AArch64 platform.
>>
>> This patch changes the logic of l_sender_sp calculation, uses sender_sp() as the value of l_sender_sp when PreserveFramePointer is enabled.
>>
>> Any input is appreciated.
>>
>> Thanks,
>> Denghui
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
> update copyright year
This looks ok apart from the comment in sender_for_compiled_frame needing updating.
src/hotspot/cpu/aarch64/frame_aarch64.cpp line 470:
> 468: // in C2 code but it will have been pushed onto the stack. so we
> 469: // have to find it relative to the unextended sp
> 470:
The comment above this change needs to be updated to explain when and why it is correct to use 1) the unextended sp and frame size or 2) the sender sp.
-------------
Changes requested by adinn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6597
More information about the hotspot-dev
mailing list