RFR (S) 8022335 - (round 2) Native stack walk on Windows x64

Ioi Lam ioi.lam at oracle.com
Tue Sep 3 12:15:30 PDT 2013


Thanks Zhengyu for the review.

On 08/30/2013 05:28 AM, Zhengyu Gu wrote:
> Hi Ioi,
>
> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>
> 493   if (fp == NULL) {
> 494     return frame();
> 495   }
>
I looked at this code again -- it's actually all wrong on Win/x64 for
the reasons I mentioned in my other reply to Volker:

     frame os::current_frame() {

     #ifdef AMD64
       // apparently _asm not supported on windows amd64
       typedef intptr_t*      get_fp_func           ();
       get_fp_func* func = CAST_TO_FN_PTR(get_fp_func*,
StubRoutines::x86::get_previous_fp_entry());
       if (func == NULL) return frame();
       intptr_t* fp = (*func)();
     #else
     ...

The code in get_previous_fp_entry() assumes the we have a frame pointer 
register
(which is not always true), and RBP is that register (which is not 
always true).
In some cases, especially when the C stack is shallow (e.g., when using
-XX:ErrorHandlerTest=9), RBP can actually be NULL, because is has never 
been used by
any of the C functions.

The (fp == NULL) check above was added specifically to avoid a secondary
crash while handling -XX:ErrorHandlerTest=9.

It turns out we also have the same problem in here:

    ExtendedPC os::fetch_frame_from_context(void* ucVoid,
                         intptr_t** ret_sp, intptr_t** ret_fp) {

       ExtendedPC  epc;
       CONTEXT* uc = (CONTEXT*)ucVoid;

       if (uc != NULL) {
         epc = ExtendedPC((address)uc->REG_PC);
         if (ret_sp) *ret_sp = (intptr_t*)uc->REG_SP;
         if (ret_fp) *ret_fp = (intptr_t*)uc->REG_FP;
       } else {
         // construct empty ExtendedPC for return value checking
         epc = ExtendedPC(NULL);
         if (ret_sp) *ret_sp = (intptr_t *)NULL;
         if (ret_fp) *ret_fp = (intptr_t *)NULL;
       }

       return epc;
    }

    frame os::fetch_frame_from_context(void* ucVoid) {
       intptr_t* sp;
       intptr_t* fp;
       ExtendedPC epc = fetch_frame_from_context(ucVoid, &sp, &fp);
       return frame(sp, fp, epc.pc());
    }

There is no guarantee that "fp" would be related to "sp" in any way.
It could be NULL, or could be just garbage.

So the use of os::current_frame() and os::fetch_frame_from_context(_context)
is simply wrong on Win/x64. We haven't got much problem because most of the
use is in debug code:

     ps() in debug.cpp
     oop::register_oop()
     etc

I don't want to fix these in 8022335 <- here all I want is to print a C 
stack
trace during a VM crash, and I will be diligent to avoid problems by adding
checks like the (fp == NULL) mentioned above.

I'll file a separate bug for fixing the os::current_frame() and
os::fetch_frame_from_context() issues on Won/x64.

Thanks
- Ioi

> probably you want to move it outside #ifdef AMD64. It is not platform 
> specific, right?
>
> Other than that, looks good to me.
>
> FYI:
>   You can use "CrashGCForDumpingJavaThread" flag to cause JVM to crash.
>
> Thanks,
>
> -Zhengyu
>
>
>
>
> On 8/29/2013 6:37 PM, Ioi Lam wrote:
>> Please review this fix:
>>
>> http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_002/
>>
>> Bug: Native stack walk while generating hs_err does not work on 
>> Windows x64
>>
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8022335
>> https://bugs.openjdk.java.net/browse/JDK-8022335
>>
>> What's new:
>>
>>     The code is much more simplified than my last version. All 
>> interesting
>>     code is in a single function -- os::platform_print_native_stack
>>     in os_windows_x86.cpp. The rest is just busy work.
>>
>> Summary of fix:
>>
>>     Windows x64 binaries are built (unconditionally) with the 
>> equivalent of
>>     -fomit-frame-pointer, so HotSpot's build-in native stack walking 
>> code
>>     will fail to find the sender frame. As a result, hs_err on 
>> Windows/x64
>>     will always list a single frame.
>>
>>     I have added the os::platform_print_native_stack() function for
>>     Windows/x64 only. It uses the StackWalk64 API to walk the stace.
>>
>>     Because the Win/x64 frame layout is very different than what 
>> HotSpot expects,
>>     I decided to implement os::platform_print_native_stack() as a 
>> completely
>>     stand-alone function, and do not interact with the existing 
>> "frame" C++ class.
>>     See comments in os_windows_x86.cpp for details.
>>
>> Deficiency of fix:
>>
>>     StackWalk64 knows nothing about the Java frames. So hs_err will 
>> display only
>>     the native frames, and stop as soon as the first Java frame is 
>> reached. It will
>>     also NOT display any native frames below Java frames.
>>
>>     Printing the Java frames *may* be possible. However, at this 
>> point, I want
>>     to keep the code simple and crash proof. I will file a different 
>> bug for
>>     printing the Java frames.
>>
>> Bonus:
>>
>>     As a side-fix, I refactored a bunch of duplicated code in 
>> decoder.cpp into
>>     the DecoderLocker class.
>>
>> Tests:
>>
>>     JPRT
>>     UTE (vm.runtime.testlist, vm.quick.testlist, 
>> vm.parallel_class_loading.testlist)
>>
>>     I also manually inserted some crashes into jvm.dll and verified 
>> that the
>>     native stack trace is printed as expected on Win/x64.
>>
>>
>> Thanks
>> - Ioi
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130903/2f67157e/attachment.html 


More information about the hotspot-runtime-dev mailing list