RFR (S) 8022335 - (round 3) Native stack walk on Windows x64

David Holmes david.holmes at oracle.com
Thu Sep 5 21:59:05 PDT 2013


I can't say I understand the lowest details here but the overall 
approach seem sound. I just hope we don't lose more information than we 
gain, even if presently we are just getting lucky with the Java frames.

Reviewed.

Thanks,
David

On 6/09/2013 6:25 AM, Ioi Lam wrote:
> Hi,
>
> I have updated the patch:
>
> http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_003/
>
> The only difference from the last round is in
> src/share/vm/utilities/decoder.cpp -- I no longer change the unrelated
> functions like Decoder::decode() to use the new DecoderLocker API.
>
> Thanks
> - Ioi
>
> On 09/03/2013 12:15 PM, Ioi Lam wrote:
>> Thanks Zhengyu for the review.
>>
>> On 08/30/2013 05:28 AM, Zhengyu Gu wrote:
>>> Hi Ioi,
>>>
>>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>>>
>>> 493   if (fp == NULL) {
>>> 494     return frame();
>>> 495   }
>>>
>> I looked at this code again -- it's actually all wrong on Win/x64 for
>> the reasons I mentioned in my other reply to Volker:
>>
>>     frame os::current_frame() {
>>
>>     #ifdef AMD64
>>       // apparently _asm not supported on windows amd64
>>       typedef intptr_t*      get_fp_func           ();
>>       get_fp_func* func = CAST_TO_FN_PTR(get_fp_func*,
>> StubRoutines::x86::get_previous_fp_entry());
>>       if (func == NULL) return frame();
>>       intptr_t* fp = (*func)();
>>     #else
>>     ...
>>
>> The code in get_previous_fp_entry() assumes the we have a frame
>> pointer register
>> (which is not always true), and RBP is that register (which is not
>> always true).
>> In some cases, especially when the C stack is shallow (e.g., when using
>> -XX:ErrorHandlerTest=9), RBP can actually be NULL, because is has
>> never been used by
>> any of the C functions.
>>
>> The (fp == NULL) check above was added specifically to avoid a secondary
>> crash while handling -XX:ErrorHandlerTest=9.
>>
>> It turns out we also have the same problem in here:
>>
>>     ExtendedPC os::fetch_frame_from_context(void* ucVoid,
>>                         intptr_t** ret_sp, intptr_t** ret_fp) {
>>
>>       ExtendedPC  epc;
>>       CONTEXT* uc = (CONTEXT*)ucVoid;
>>
>>       if (uc != NULL) {
>>         epc = ExtendedPC((address)uc->REG_PC);
>>         if (ret_sp) *ret_sp = (intptr_t*)uc->REG_SP;
>>         if (ret_fp) *ret_fp = (intptr_t*)uc->REG_FP;
>>       } else {
>>         // construct empty ExtendedPC for return value checking
>>         epc = ExtendedPC(NULL);
>>         if (ret_sp) *ret_sp = (intptr_t *)NULL;
>>         if (ret_fp) *ret_fp = (intptr_t *)NULL;
>>       }
>>
>>       return epc;
>>     }
>>
>>     frame os::fetch_frame_from_context(void* ucVoid) {
>>       intptr_t* sp;
>>       intptr_t* fp;
>>       ExtendedPC epc = fetch_frame_from_context(ucVoid, &sp, &fp);
>>       return frame(sp, fp, epc.pc());
>>     }
>>
>> There is no guarantee that "fp" would be related to "sp" in any way.
>> It could be NULL, or could be just garbage.
>>
>> So the use of os::current_frame() and
>> os::fetch_frame_from_context(_context)
>> is simply wrong on Win/x64. We haven't got much problem because most
>> of the
>> use is in debug code:
>>
>>     ps() in debug.cpp
>>     oop::register_oop()
>>     etc
>>
>> I don't want to fix these in 8022335 <- here all I want is to print a
>> C stack
>> trace during a VM crash, and I will be diligent to avoid problems by
>> adding
>> checks like the (fp == NULL) mentioned above.
>>
>> I'll file a separate bug for fixing the os::current_frame() and
>> os::fetch_frame_from_context() issues on Won/x64.
>>
>> Thanks
>> - Ioi
>>
>>> probably you want to move it outside #ifdef AMD64. It is not platform
>>> specific, right?
>>>
>>> Other than that, looks good to me.
>>>
>>> FYI:
>>>   You can use "CrashGCForDumpingJavaThread" flag to cause JVM to crash.
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>>
>>>
>>> On 8/29/2013 6:37 PM, Ioi Lam wrote:
>>>> Please review this fix:
>>>>
>>>> http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_002/
>>>>
>>>> Bug: Native stack walk while generating hs_err does not work on
>>>> Windows x64
>>>>
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8022335
>>>> https://bugs.openjdk.java.net/browse/JDK-8022335
>>>>
>>>> What's new:
>>>>
>>>>     The code is much more simplified than my last version. All
>>>> interesting
>>>>     code is in a single function -- os::platform_print_native_stack
>>>>     in os_windows_x86.cpp. The rest is just busy work.
>>>>
>>>> Summary of fix:
>>>>
>>>>     Windows x64 binaries are built (unconditionally) with the
>>>> equivalent of
>>>>     -fomit-frame-pointer, so HotSpot's build-in native stack walking
>>>> code
>>>>     will fail to find the sender frame. As a result, hs_err on
>>>> Windows/x64
>>>>     will always list a single frame.
>>>>
>>>>     I have added the os::platform_print_native_stack() function for
>>>>     Windows/x64 only. It uses the StackWalk64 API to walk the stace.
>>>>
>>>>     Because the Win/x64 frame layout is very different than what
>>>> HotSpot expects,
>>>>     I decided to implement os::platform_print_native_stack() as a
>>>> completely
>>>>     stand-alone function, and do not interact with the existing
>>>> "frame" C++ class.
>>>>     See comments in os_windows_x86.cpp for details.
>>>>
>>>> Deficiency of fix:
>>>>
>>>>     StackWalk64 knows nothing about the Java frames. So hs_err will
>>>> display only
>>>>     the native frames, and stop as soon as the first Java frame is
>>>> reached. It will
>>>>     also NOT display any native frames below Java frames.
>>>>
>>>>     Printing the Java frames *may* be possible. However, at this
>>>> point, I want
>>>>     to keep the code simple and crash proof. I will file a different
>>>> bug for
>>>>     printing the Java frames.
>>>>
>>>> Bonus:
>>>>
>>>>     As a side-fix, I refactored a bunch of duplicated code in
>>>> decoder.cpp into
>>>>     the DecoderLocker class.
>>>>
>>>> Tests:
>>>>
>>>>     JPRT
>>>>     UTE (vm.runtime.testlist, vm.quick.testlist,
>>>> vm.parallel_class_loading.testlist)
>>>>
>>>>     I also manually inserted some crashes into jvm.dll and verified
>>>> that the
>>>>     native stack trace is printed as expected on Win/x64.
>>>>
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>
>>
>


More information about the hotspot-runtime-dev mailing list