RFR (S) 8022335 - (round 3) Native stack walk on Windows x64

Ioi Lam ioi.lam at oracle.com
Thu Sep 5 13:25:14 PDT 2013


Hi,

I have updated the patch:

http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_003/

The only difference from the last round is in 
src/share/vm/utilities/decoder.cpp -- I no longer change the unrelated 
functions like Decoder::decode() to use the new DecoderLocker API.

Thanks
- Ioi

On 09/03/2013 12:15 PM, Ioi Lam wrote:
> Thanks Zhengyu for the review.
>
> On 08/30/2013 05:28 AM, Zhengyu Gu wrote:
>> Hi Ioi,
>>
>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>>
>> 493   if (fp == NULL) {
>> 494     return frame();
>> 495   }
>>
> I looked at this code again -- it's actually all wrong on Win/x64 for
> the reasons I mentioned in my other reply to Volker:
>
>     frame os::current_frame() {
>
>     #ifdef AMD64
>       // apparently _asm not supported on windows amd64
>       typedef intptr_t*      get_fp_func           ();
>       get_fp_func* func = CAST_TO_FN_PTR(get_fp_func*,
> StubRoutines::x86::get_previous_fp_entry());
>       if (func == NULL) return frame();
>       intptr_t* fp = (*func)();
>     #else
>     ...
>
> The code in get_previous_fp_entry() assumes the we have a frame 
> pointer register
> (which is not always true), and RBP is that register (which is not 
> always true).
> In some cases, especially when the C stack is shallow (e.g., when using
> -XX:ErrorHandlerTest=9), RBP can actually be NULL, because is has 
> never been used by
> any of the C functions.
>
> The (fp == NULL) check above was added specifically to avoid a secondary
> crash while handling -XX:ErrorHandlerTest=9.
>
> It turns out we also have the same problem in here:
>
>     ExtendedPC os::fetch_frame_from_context(void* ucVoid,
>                         intptr_t** ret_sp, intptr_t** ret_fp) {
>
>       ExtendedPC  epc;
>       CONTEXT* uc = (CONTEXT*)ucVoid;
>
>       if (uc != NULL) {
>         epc = ExtendedPC((address)uc->REG_PC);
>         if (ret_sp) *ret_sp = (intptr_t*)uc->REG_SP;
>         if (ret_fp) *ret_fp = (intptr_t*)uc->REG_FP;
>       } else {
>         // construct empty ExtendedPC for return value checking
>         epc = ExtendedPC(NULL);
>         if (ret_sp) *ret_sp = (intptr_t *)NULL;
>         if (ret_fp) *ret_fp = (intptr_t *)NULL;
>       }
>
>       return epc;
>     }
>
>     frame os::fetch_frame_from_context(void* ucVoid) {
>       intptr_t* sp;
>       intptr_t* fp;
>       ExtendedPC epc = fetch_frame_from_context(ucVoid, &sp, &fp);
>       return frame(sp, fp, epc.pc());
>     }
>
> There is no guarantee that "fp" would be related to "sp" in any way.
> It could be NULL, or could be just garbage.
>
> So the use of os::current_frame() and 
> os::fetch_frame_from_context(_context)
> is simply wrong on Win/x64. We haven't got much problem because most 
> of the
> use is in debug code:
>
>     ps() in debug.cpp
>     oop::register_oop()
>     etc
>
> I don't want to fix these in 8022335 <- here all I want is to print a 
> C stack
> trace during a VM crash, and I will be diligent to avoid problems by 
> adding
> checks like the (fp == NULL) mentioned above.
>
> I'll file a separate bug for fixing the os::current_frame() and
> os::fetch_frame_from_context() issues on Won/x64.
>
> Thanks
> - Ioi
>
>> probably you want to move it outside #ifdef AMD64. It is not platform 
>> specific, right?
>>
>> Other than that, looks good to me.
>>
>> FYI:
>>   You can use "CrashGCForDumpingJavaThread" flag to cause JVM to crash.
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>
>>
>> On 8/29/2013 6:37 PM, Ioi Lam wrote:
>>> Please review this fix:
>>>
>>> http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_002/
>>>
>>> Bug: Native stack walk while generating hs_err does not work on 
>>> Windows x64
>>>
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8022335
>>> https://bugs.openjdk.java.net/browse/JDK-8022335
>>>
>>> What's new:
>>>
>>>     The code is much more simplified than my last version. All 
>>> interesting
>>>     code is in a single function -- os::platform_print_native_stack
>>>     in os_windows_x86.cpp. The rest is just busy work.
>>>
>>> Summary of fix:
>>>
>>>     Windows x64 binaries are built (unconditionally) with the 
>>> equivalent of
>>>     -fomit-frame-pointer, so HotSpot's build-in native stack walking 
>>> code
>>>     will fail to find the sender frame. As a result, hs_err on 
>>> Windows/x64
>>>     will always list a single frame.
>>>
>>>     I have added the os::platform_print_native_stack() function for
>>>     Windows/x64 only. It uses the StackWalk64 API to walk the stace.
>>>
>>>     Because the Win/x64 frame layout is very different than what 
>>> HotSpot expects,
>>>     I decided to implement os::platform_print_native_stack() as a 
>>> completely
>>>     stand-alone function, and do not interact with the existing 
>>> "frame" C++ class.
>>>     See comments in os_windows_x86.cpp for details.
>>>
>>> Deficiency of fix:
>>>
>>>     StackWalk64 knows nothing about the Java frames. So hs_err will 
>>> display only
>>>     the native frames, and stop as soon as the first Java frame is 
>>> reached. It will
>>>     also NOT display any native frames below Java frames.
>>>
>>>     Printing the Java frames *may* be possible. However, at this 
>>> point, I want
>>>     to keep the code simple and crash proof. I will file a different 
>>> bug for
>>>     printing the Java frames.
>>>
>>> Bonus:
>>>
>>>     As a side-fix, I refactored a bunch of duplicated code in 
>>> decoder.cpp into
>>>     the DecoderLocker class.
>>>
>>> Tests:
>>>
>>>     JPRT
>>>     UTE (vm.runtime.testlist, vm.quick.testlist, 
>>> vm.parallel_class_loading.testlist)
>>>
>>>     I also manually inserted some crashes into jvm.dll and verified 
>>> that the
>>>     native stack trace is printed as expected on Win/x64.
>>>
>>>
>>> Thanks
>>> - Ioi
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130905/3ec61b38/attachment.html 


More information about the hotspot-runtime-dev mailing list