RFR (S) 8022335 - (round 2) Native stack walk on Windows x64
Ioi Lam
ioi.lam at oracle.com
Tue Sep 3 12:15:30 PDT 2013
Thanks Zhengyu for the review.
On 08/30/2013 05:28 AM, Zhengyu Gu wrote:
> Hi Ioi,
>
> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>
> 493 if (fp == NULL) {
> 494 return frame();
> 495 }
>
I looked at this code again -- it's actually all wrong on Win/x64 for
the reasons I mentioned in my other reply to Volker:
frame os::current_frame() {
#ifdef AMD64
// apparently _asm not supported on windows amd64
typedef intptr_t* get_fp_func ();
get_fp_func* func = CAST_TO_FN_PTR(get_fp_func*,
StubRoutines::x86::get_previous_fp_entry());
if (func == NULL) return frame();
intptr_t* fp = (*func)();
#else
...
The code in get_previous_fp_entry() assumes the we have a frame pointer
register
(which is not always true), and RBP is that register (which is not
always true).
In some cases, especially when the C stack is shallow (e.g., when using
-XX:ErrorHandlerTest=9), RBP can actually be NULL, because is has never
been used by
any of the C functions.
The (fp == NULL) check above was added specifically to avoid a secondary
crash while handling -XX:ErrorHandlerTest=9.
It turns out we also have the same problem in here:
ExtendedPC os::fetch_frame_from_context(void* ucVoid,
intptr_t** ret_sp, intptr_t** ret_fp) {
ExtendedPC epc;
CONTEXT* uc = (CONTEXT*)ucVoid;
if (uc != NULL) {
epc = ExtendedPC((address)uc->REG_PC);
if (ret_sp) *ret_sp = (intptr_t*)uc->REG_SP;
if (ret_fp) *ret_fp = (intptr_t*)uc->REG_FP;
} else {
// construct empty ExtendedPC for return value checking
epc = ExtendedPC(NULL);
if (ret_sp) *ret_sp = (intptr_t *)NULL;
if (ret_fp) *ret_fp = (intptr_t *)NULL;
}
return epc;
}
frame os::fetch_frame_from_context(void* ucVoid) {
intptr_t* sp;
intptr_t* fp;
ExtendedPC epc = fetch_frame_from_context(ucVoid, &sp, &fp);
return frame(sp, fp, epc.pc());
}
There is no guarantee that "fp" would be related to "sp" in any way.
It could be NULL, or could be just garbage.
So the use of os::current_frame() and os::fetch_frame_from_context(_context)
is simply wrong on Win/x64. We haven't got much problem because most of the
use is in debug code:
ps() in debug.cpp
oop::register_oop()
etc
I don't want to fix these in 8022335 <- here all I want is to print a C
stack
trace during a VM crash, and I will be diligent to avoid problems by adding
checks like the (fp == NULL) mentioned above.
I'll file a separate bug for fixing the os::current_frame() and
os::fetch_frame_from_context() issues on Won/x64.
Thanks
- Ioi
> probably you want to move it outside #ifdef AMD64. It is not platform
> specific, right?
>
> Other than that, looks good to me.
>
> FYI:
> You can use "CrashGCForDumpingJavaThread" flag to cause JVM to crash.
>
> Thanks,
>
> -Zhengyu
>
>
>
>
> On 8/29/2013 6:37 PM, Ioi Lam wrote:
>> Please review this fix:
>>
>> http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_002/
>>
>> Bug: Native stack walk while generating hs_err does not work on
>> Windows x64
>>
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8022335
>> https://bugs.openjdk.java.net/browse/JDK-8022335
>>
>> What's new:
>>
>> The code is much more simplified than my last version. All
>> interesting
>> code is in a single function -- os::platform_print_native_stack
>> in os_windows_x86.cpp. The rest is just busy work.
>>
>> Summary of fix:
>>
>> Windows x64 binaries are built (unconditionally) with the
>> equivalent of
>> -fomit-frame-pointer, so HotSpot's build-in native stack walking
>> code
>> will fail to find the sender frame. As a result, hs_err on
>> Windows/x64
>> will always list a single frame.
>>
>> I have added the os::platform_print_native_stack() function for
>> Windows/x64 only. It uses the StackWalk64 API to walk the stace.
>>
>> Because the Win/x64 frame layout is very different than what
>> HotSpot expects,
>> I decided to implement os::platform_print_native_stack() as a
>> completely
>> stand-alone function, and do not interact with the existing
>> "frame" C++ class.
>> See comments in os_windows_x86.cpp for details.
>>
>> Deficiency of fix:
>>
>> StackWalk64 knows nothing about the Java frames. So hs_err will
>> display only
>> the native frames, and stop as soon as the first Java frame is
>> reached. It will
>> also NOT display any native frames below Java frames.
>>
>> Printing the Java frames *may* be possible. However, at this
>> point, I want
>> to keep the code simple and crash proof. I will file a different
>> bug for
>> printing the Java frames.
>>
>> Bonus:
>>
>> As a side-fix, I refactored a bunch of duplicated code in
>> decoder.cpp into
>> the DecoderLocker class.
>>
>> Tests:
>>
>> JPRT
>> UTE (vm.runtime.testlist, vm.quick.testlist,
>> vm.parallel_class_loading.testlist)
>>
>> I also manually inserted some crashes into jvm.dll and verified
>> that the
>> native stack trace is printed as expected on Win/x64.
>>
>>
>> Thanks
>> - Ioi
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130903/2f67157e/attachment.html
More information about the hotspot-runtime-dev
mailing list