RFR (S) 8022335 - (round 3) Native stack walk on Windows x64
Ioi Lam
ioi.lam at oracle.com
Thu Sep 5 13:25:14 PDT 2013
Hi,
I have updated the patch:
http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_003/
The only difference from the last round is in
src/share/vm/utilities/decoder.cpp -- I no longer change the unrelated
functions like Decoder::decode() to use the new DecoderLocker API.
Thanks
- Ioi
On 09/03/2013 12:15 PM, Ioi Lam wrote:
> Thanks Zhengyu for the review.
>
> On 08/30/2013 05:28 AM, Zhengyu Gu wrote:
>> Hi Ioi,
>>
>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>>
>> 493 if (fp == NULL) {
>> 494 return frame();
>> 495 }
>>
> I looked at this code again -- it's actually all wrong on Win/x64 for
> the reasons I mentioned in my other reply to Volker:
>
> frame os::current_frame() {
>
> #ifdef AMD64
> // apparently _asm not supported on windows amd64
> typedef intptr_t* get_fp_func ();
> get_fp_func* func = CAST_TO_FN_PTR(get_fp_func*,
> StubRoutines::x86::get_previous_fp_entry());
> if (func == NULL) return frame();
> intptr_t* fp = (*func)();
> #else
> ...
>
> The code in get_previous_fp_entry() assumes the we have a frame
> pointer register
> (which is not always true), and RBP is that register (which is not
> always true).
> In some cases, especially when the C stack is shallow (e.g., when using
> -XX:ErrorHandlerTest=9), RBP can actually be NULL, because is has
> never been used by
> any of the C functions.
>
> The (fp == NULL) check above was added specifically to avoid a secondary
> crash while handling -XX:ErrorHandlerTest=9.
>
> It turns out we also have the same problem in here:
>
> ExtendedPC os::fetch_frame_from_context(void* ucVoid,
> intptr_t** ret_sp, intptr_t** ret_fp) {
>
> ExtendedPC epc;
> CONTEXT* uc = (CONTEXT*)ucVoid;
>
> if (uc != NULL) {
> epc = ExtendedPC((address)uc->REG_PC);
> if (ret_sp) *ret_sp = (intptr_t*)uc->REG_SP;
> if (ret_fp) *ret_fp = (intptr_t*)uc->REG_FP;
> } else {
> // construct empty ExtendedPC for return value checking
> epc = ExtendedPC(NULL);
> if (ret_sp) *ret_sp = (intptr_t *)NULL;
> if (ret_fp) *ret_fp = (intptr_t *)NULL;
> }
>
> return epc;
> }
>
> frame os::fetch_frame_from_context(void* ucVoid) {
> intptr_t* sp;
> intptr_t* fp;
> ExtendedPC epc = fetch_frame_from_context(ucVoid, &sp, &fp);
> return frame(sp, fp, epc.pc());
> }
>
> There is no guarantee that "fp" would be related to "sp" in any way.
> It could be NULL, or could be just garbage.
>
> So the use of os::current_frame() and
> os::fetch_frame_from_context(_context)
> is simply wrong on Win/x64. We haven't got much problem because most
> of the
> use is in debug code:
>
> ps() in debug.cpp
> oop::register_oop()
> etc
>
> I don't want to fix these in 8022335 <- here all I want is to print a
> C stack
> trace during a VM crash, and I will be diligent to avoid problems by
> adding
> checks like the (fp == NULL) mentioned above.
>
> I'll file a separate bug for fixing the os::current_frame() and
> os::fetch_frame_from_context() issues on Won/x64.
>
> Thanks
> - Ioi
>
>> probably you want to move it outside #ifdef AMD64. It is not platform
>> specific, right?
>>
>> Other than that, looks good to me.
>>
>> FYI:
>> You can use "CrashGCForDumpingJavaThread" flag to cause JVM to crash.
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>
>>
>> On 8/29/2013 6:37 PM, Ioi Lam wrote:
>>> Please review this fix:
>>>
>>> http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_002/
>>>
>>> Bug: Native stack walk while generating hs_err does not work on
>>> Windows x64
>>>
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8022335
>>> https://bugs.openjdk.java.net/browse/JDK-8022335
>>>
>>> What's new:
>>>
>>> The code is much more simplified than my last version. All
>>> interesting
>>> code is in a single function -- os::platform_print_native_stack
>>> in os_windows_x86.cpp. The rest is just busy work.
>>>
>>> Summary of fix:
>>>
>>> Windows x64 binaries are built (unconditionally) with the
>>> equivalent of
>>> -fomit-frame-pointer, so HotSpot's build-in native stack walking
>>> code
>>> will fail to find the sender frame. As a result, hs_err on
>>> Windows/x64
>>> will always list a single frame.
>>>
>>> I have added the os::platform_print_native_stack() function for
>>> Windows/x64 only. It uses the StackWalk64 API to walk the stace.
>>>
>>> Because the Win/x64 frame layout is very different than what
>>> HotSpot expects,
>>> I decided to implement os::platform_print_native_stack() as a
>>> completely
>>> stand-alone function, and do not interact with the existing
>>> "frame" C++ class.
>>> See comments in os_windows_x86.cpp for details.
>>>
>>> Deficiency of fix:
>>>
>>> StackWalk64 knows nothing about the Java frames. So hs_err will
>>> display only
>>> the native frames, and stop as soon as the first Java frame is
>>> reached. It will
>>> also NOT display any native frames below Java frames.
>>>
>>> Printing the Java frames *may* be possible. However, at this
>>> point, I want
>>> to keep the code simple and crash proof. I will file a different
>>> bug for
>>> printing the Java frames.
>>>
>>> Bonus:
>>>
>>> As a side-fix, I refactored a bunch of duplicated code in
>>> decoder.cpp into
>>> the DecoderLocker class.
>>>
>>> Tests:
>>>
>>> JPRT
>>> UTE (vm.runtime.testlist, vm.quick.testlist,
>>> vm.parallel_class_loading.testlist)
>>>
>>> I also manually inserted some crashes into jvm.dll and verified
>>> that the
>>> native stack trace is printed as expected on Win/x64.
>>>
>>>
>>> Thanks
>>> - Ioi
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130905/3ec61b38/attachment.html
More information about the hotspot-runtime-dev
mailing list