RFR (S) 8022335 - (round 2) Native stack walk on Windows x64

Ioi Lam ioi.lam at oracle.com
Fri Aug 30 18:15:21 PDT 2013


On 08/30/2013 07:32 AM, Volker Simonis wrote:
>
>
> On Fri, Aug 30, 2013 at 12:37 AM, Ioi Lam <ioi.lam at oracle.com> wrote:
> > Please review this fix:
> >
> > http://cr.openjdk.java.net/~iklam/8022335/win64_stack_walk_002/ 
> >
> > Bug: Native stack walk while generating hs_err does not work on 
> Windows x64
> >
> > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8022335
> > https://bugs.openjdk.java.net/browse/JDK-8022335
> >
> > What's new:
> >
> >     The code is much more simplified than my last version. All 
> interesting
> >     code is in a single function -- os::platform_print_native_stack
> >     in os_windows_x86.cpp. The rest is just busy work.
> >
> > Summary of fix:
> >
> >     Windows x64 binaries are built (unconditionally) with the 
> equivalent of
> >     -fomit-frame-pointer,
>
> Why do you think so? If I'm looking at the build logs and 
> make/windows/makefiles/compile.make I see that we always compile with 
> '/O2 /Oy-' which according to 
> http://msdn.microsoft.com/en-us/library/2kxx5t2c%28v=vs.100%29.aspx 
> means 'disable frame-pointer omission'. So according to that 
> documentation, we should have frame pointers in all functions.
>
> Actually '/Oy-' was introduced in order to get better native stack 
> traces with http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6655385
> Doesn't it help, or are there other problems?
>

/Oy is for x86 only. It has no effect on x64. As mentioned in the MSDN 
link above

     "/Oy enables frame-pointer omission and /Oy- disables omission. /Oy 
is available only in x86 compilers."

^ This was also discussed on the 
https://bugs.openjdk.java.net/browse/JDK-8022335 page. Unfortunately it's 
not (yet?) available to the public :-(

More info about the x64 stack can be found here
http://msdn.microsoft.com/en-us/library/ew5tede7.aspx
http://www.codejury.com/a-walk-in-x64-land/

Essentially:

    Fixed frames are always addressed via RSP only.

    Dynamic frames (when alloca() is used) use a frame pointer register,
    but this register is not necessarily RBP.

> Currently I get the following in the hs_err file on Windows for a Java 
> program which crashes in Unsafe:
>
> # JRE version: OpenJDK Runtime Environment (8.0) (build 
> 1.8.0-internal-fastdebug-_2013_08_29_21_12-b00)
> # Java VM: OpenJDK 64-Bit Server VM (25.0-b45-fastdebug mixed mode 
> windows-amd64 compressed oops)
> # Problematic frame:
> # V  [jvm.dll+0x28c7a7]
> ...
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, 
> C=native code)
> V  [jvm.dll+0x28c7a7]
> J  Crash.crashIt(Lsun/misc/Unsafe;I)V @ 0x0000000008e9b19c 
> [0x0000000008e9b140+92]
> v  ~StubRoutines::call_stub
> V  [jvm.dll+0x2c961a]
>
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  sun.misc.Unsafe.putAddress(JJ)V+0
> J  Crash.crashIt(Lsun/misc/Unsafe;I)V @ 0x0000000008e9b19c 
> [0x0000000008e9b140+92]
> j  Crash.doIt()V+45
> v  ~StubRoutines::call_stub
> j  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
> j  sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+87
> j  sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6
> j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
> j  Crash.main([Ljava/lang/String;)V+32
> v  ~StubRoutines::call_stub
>
>
> On Linux, the same crashing program produces the following output in 
> the hs_err file:
>
> # JRE version: OpenJDK Runtime Environment (8.0) (build 
> 1.8.0-internal-jvmtests_2013_08_29_20_14-b00)
> # Java VM: OpenJDK 64-Bit Server VM (25.0-b45 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x9b4ba7]  Unsafe_SetNativeAddress+0xa7
> ...
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, 
> C=native code)
> V  [libjvm.so+0x9b4ba7]  Unsafe_SetNativeAddress+0xa7
> j  sun.misc.Unsafe.putAddress(JJ)V+0
> J  Crash.crashIt(Lsun/misc/Unsafe;I)V @ 0x00007f0b4d0f2c9c 
> [0x00007f0b4d0f2b80+284]
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x6174ee]  JavaCalls::call_helper(JavaValue*, 
> methodHandle*, JavaCallArguments*, Thread*)+0x104e
> V  [libjvm.so+0x8c2e12]  Reflection::invoke(instanceKlassHandle, 
> methodHandle, Handle, bool, objArrayHandle, BasicType, objArrayHandle, 
> bool, Thread*)+0x5e2
> V  [libjvm.so+0x8c63e7]  Reflection::invoke_method(oopDesc*, Handle, 
> objArrayHandle, Thread*)+0x147
> V  [libjvm.so+0x66dd8e]  JVM_InvokeMethod+0x25e
> j  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
> j  sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+87
> j  sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6
> j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
> j  Crash.main([Ljava/lang/String;)V+32
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x6174ee]  JavaCalls::call_helper(JavaValue*, 
> methodHandle*, JavaCallArguments*, Thread*)+0x104e
> V  [libjvm.so+0x631226]  jni_invoke_static(JNIEnv_*, JavaValue*, 
> _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x346
> V  [libjvm.so+0x63b98a]  jni_CallStaticVoidMethod+0x17a
> C  [libjli.so+0x75ed]  JavaMain+0x83d
>
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  sun.misc.Unsafe.putAddress(JJ)V+0
> J  Crash.crashIt(Lsun/misc/Unsafe;I)V @ 0x00007f0b4d0f2c9c 
> [0x00007f0b4d0f2b80+284]
> j  Crash.doIt()V+45
> v  ~StubRoutines::call_stub
> j  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
> j  sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+87
> j  sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6
> j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
> j  Crash.main([Ljava/lang/String;)V+32
> v  ~StubRoutines::call_stub
>
> So while the native (i.e. "mixed") stack trace is still much more 
> accurate on Linux, it isn't just a single line as you wrote. Maybe we 
> could work on improving this situation on Windows without a separate 
> Windows stack tracing routine or are there any fundamental problems 
> which prevent this?
>

Thank you for the test case. It makes my testing a lot easier!

Unfortunately you can walk the stack in this case only because you're 
lucky. I have disassembled Unsafe_SetNativeAddress, up to the point of 
the crash:

    UNSAFE_ENTRY(void, Unsafe_SetNativeAddress(JNIEnv *env, jobject
    unsafe, jlong addr, jlong x))
    000000006CD106D0  mov         qword ptr [rsp+8],rbx
    000000006CD106D5  mov         qword ptr [rsp+10h],rsi
    000000006CD106DA  push        rdi
    000000006CD106DB  sub         rsp,20h
    000000006CD106DF  mov         eax,dword ptr [rcx+90h]
    000000006CD106E5  lea         rbx,[rcx-1E0h]
    000000006CD106EC  mov         rdi,r9
    000000006CD106EF  mov         rsi,r8
    000000006CD106F2  cmp         eax,0DEABh
    000000006CD106F7  je          Unsafe_SetNativeAddress+40h (06CD10710h)
    000000006CD106F9  mov         eax,dword ptr [rbx+270h]
    000000006CD106FF  cmp         eax,0DEACh
    000000006CD10704  je          Unsafe_SetNativeAddress+40h (06CD10710h)
    000000006CD10706  mov         rcx,rbx
    000000006CD10709  call        JavaThread::block_if_vm_exited
    (06CD53AF0h)
    000000006CD1070E  xor         ebx,ebx
    000000006CD10710  mov         dword ptr [rbx+258h],5
    000000006CD1071A  cmp         dword ptr [os::_processor_count
    (06CEB287Ch)],1
    000000006CD10721  jg          Unsafe_SetNativeAddress+5Ch (06CD1072Ch)
    000000006CD10723  cmp         byte ptr [AssumeMP (06CEB1FDCh)],0
    000000006CD1072A  je          Unsafe_SetNativeAddress+74h (06CD10744h)
    000000006CD1072C  cmp         byte ptr [UseMembar (06CEB1FDDh)],0
    000000006CD10733  je          Unsafe_SetNativeAddress+6Ch (06CD1073Ch)
    000000006CD10735  call        OrderAccess::StubRoutines_fence
    (06CD36BC0h)
    000000006CD1073A  jmp         Unsafe_SetNativeAddress+74h (06CD10744h)
    000000006CD1073C  mov         rcx,rbx
    000000006CD1073F  call InterfaceSupport::serialize_memory (06CA59A10h)
    000000006CD10744  mov         eax,dword ptr
    [SafepointSynchronize::_state (06CEB2988h)]
    000000006CD1074A  test        eax,eax
    000000006CD1074C  jne         Unsafe_SetNativeAddress+88h (06CD10758h)
    000000006CD1074E  mov         eax,dword ptr [rbx+30h]
    000000006CD10751  test        eax,30000000h
    000000006CD10756  je          Unsafe_SetNativeAddress+90h (06CD10760h)
    000000006CD10758  mov         rcx,rbx
    000000006CD1075B  call
    JavaThread::check_safepoint_and_suspend_for_native_trans (06CD53F20h)
    000000006CD10760  mov         dword ptr [rbx+258h],6
       UnsafeWrapper("Unsafe_SetNativeAddress");
       void* p = addr_from_java(addr);
       *(void**)p = addr_from_java(x);
    000000006CD1076A  mov         qword ptr [rsi],rdi <<<<<<<<<<<<<<<< CRASH
    UNSAFE_END

Notice that RBP is never used, so it keeps the old value of the caller 
(Java frame: sun.misc.Unsafe.putAddress). All Java frames in x64 still 
use the "push RBP; mov RBP, RSP" prolog, so they can be walked by the 
existing code.

I have generated my own stack traces (with jvm.dll symbols for Windows). 
If you compare the Windows and Linux versions:

*Linux*
     V  [libjvm.so+0x663d77]  Unsafe_SetNativeAddress+0x183
     j  sun.misc.Unsafe.putAddress(JJ)V+0  <<<<---- missing from Windows
     j  sun.misc.Crash.main([Ljava/lang/String;)V+17
     v  ~StubRoutines::call_stub

*Windows*
     V  [jvm.dll+0x5d1420]  Unsafe_SetNativeAddress+0x140
     j  sun.misc.Crash.main([Ljava/lang/String;)V+17
     v  ~StubRoutines::call_stub

Note that the Windows version is missing sun.misc.Unsafe.putAddress. 
That's because the topmost frame was actually printed using the RBP of 
the (Java) frame of sun.misc.Unsafe.putAddress, combined with the native 
PC of Unsafe_SetNativeAddress.

If the scenario is more complicated (you have a few nested native calls, 
or the native code overwrites RBP), then you will end up with a 
single native frame. You can see that by inserting crashes inside 
classLoader.cpp, etc.

So the existing code gives at most one native frame, plus a 
few Java frames (if you're lucky). But we already know the Java frames, 
so it really doesn't add any information.
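
The effect can be reproduced with a toy model. The sketch below is not 
HotSpot code: Call, Machine, simulate and walk_rbp_chain are names 
invented for this illustration, and the "stack" is just a vector of 
slots. It only assumes what is described above: Java frames save and set 
RBP in their prolog, the native frame does not, and the walker follows 
saved-RBP links.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only; every name here is hypothetical.
struct Call {
    std::string name;   // function being called
    bool sets_up_rbp;   // true if the prolog does "push rbp; mov rbp, rsp"
};

struct Machine {
    std::vector<std::size_t> stack;  // one slot per 8-byte push; index = address
    std::vector<std::string> code;   // code[pc] = name of the function owning pc
    std::size_t rbp = 0;             // slot holding the saved RBP; 0 = no frame
    std::size_t pc = 0;              // current "program counter"
};

// Execute a chain of nested calls, outermost first.
Machine simulate(const std::vector<Call>& calls) {
    Machine m;
    m.stack.push_back(0);            // slot 0 is a sentinel, never a real frame
    m.code.push_back("<entry>");
    for (const Call& c : calls) {
        m.stack.push_back(m.pc);     // "call" pushes the return address
        m.code.push_back(c.name);
        m.pc = m.code.size() - 1;
        if (c.sets_up_rbp) {
            m.stack.push_back(m.rbp);        // push rbp
            m.rbp = m.stack.size() - 1;      // mov rbp, rsp
        }
    }
    return m;
}

// Walk the way an RBP-based walker does: the top frame comes from the PC,
// every sender comes from the return address stored next to a saved RBP.
// (On real x64 that is [rbp+8]; here the return address was pushed one slot
// earlier, so it lives at index fp - 1.)
std::vector<std::string> walk_rbp_chain(const Machine& m) {
    std::vector<std::string> frames;
    frames.push_back(m.code[m.pc]);
    for (std::size_t fp = m.rbp; fp != 0; fp = m.stack[fp]) {
        frames.push_back(m.code[m.stack[fp - 1]]);
    }
    return frames;
}
```

With the call chain call_stub, Crash.main, Unsafe.putAddress, 
Unsafe_SetNativeAddress, where only the last call skips the RBP setup, 
walk_rbp_chain reports Unsafe_SetNativeAddress followed directly by 
Crash.main: the putAddress frame is skipped, as in the Windows trace 
above.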

With my patch, you will see the full native stack, up to the first Java 
frame. I think this has a lot of value for debugging (when MDMP files 
are not available).

What is still missing are the frames below the first bunch of Java 
frames. To do this in all cases (e.g., when alloca() is used), I need to 
be able to recover all the non-volatile registers that are touched by 
the Java frames (because any non-volatile register could be a Win/x64 
frame pointer!). This is probably doable, but I am not feeling brave 
enough to do this so late in JDK8.

BTW, StackWalk64() actually restores all the non-volatile registers 
(into the CONTEXT structure) as it unwinds the stack. It can do this 
because the locations of all the non-volatile registers are well 
specified in the PE32+ file format. See
http://www.codejury.com/a-walk-in-x64-land/
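
As a rough illustration of that idea, the sketch below undoes the prolog 
of Unsafe_SetNativeAddress from the disassembly above (mov [rsp+8],rbx; 
mov [rsp+10h],rsi; push rdi; sub rsp,20h) on simulated memory. 
Everything in it is an assumption made for illustration: OpKind, 
UnwindOp, Context and unwind_one_frame are simplified stand-ins, not the 
real UNWIND_INFO/UNWIND_CODE encoding or the StackWalk64 implementation, 
and the save offsets (30h and 38h from the post-prolog RSP) are derived 
by hand from the listing.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

enum class Reg { RBX, RSI, RDI };

// Simplified stand-ins for unwind operations; the real encoding differs.
enum class OpKind { PushNonvol, AllocStack, SaveNonvol };

struct UnwindOp {
    OpKind kind;
    Reg reg;         // used by PushNonvol and SaveNonvol, ignored by AllocStack
    uint64_t value;  // AllocStack: bytes; SaveNonvol: offset from post-prolog RSP
};

struct Context {
    uint64_t rsp;
    uint64_t rip;
    std::map<Reg, uint64_t> nonvol;  // recovered non-volatile registers
};

using Memory = std::map<uint64_t, uint64_t>;  // address -> 8-byte value

// Ops for the prolog in the listing, in reverse order of execution
// (the last prolog instruction is undone first).
std::vector<UnwindOp> example_prolog_ops() {
    return {
        {OpKind::AllocStack, Reg::RBX, 0x20},  // sub rsp,20h (reg unused)
        {OpKind::PushNonvol, Reg::RDI, 0},     // push rdi
        {OpKind::SaveNonvol, Reg::RSI, 0x38},  // mov [rsp+10h],rsi at entry
        {OpKind::SaveNonvol, Reg::RBX, 0x30},  // mov [rsp+8],rbx at entry
    };
}

// Recover the caller's RSP, RIP and non-volatile registers for one frame.
// Note that RBP is never consulted.
Context unwind_one_frame(Context ctx, const std::vector<UnwindOp>& ops,
                         const Memory& mem) {
    const uint64_t frame_base = ctx.rsp;  // RSP as it is after the prolog
    for (const UnwindOp& op : ops) {
        switch (op.kind) {
        case OpKind::AllocStack:
            ctx.rsp += op.value;
            break;
        case OpKind::PushNonvol:
            ctx.nonvol[op.reg] = mem.at(ctx.rsp);
            ctx.rsp += 8;
            break;
        case OpKind::SaveNonvol:
            ctx.nonvol[op.reg] = mem.at(frame_base + op.value);
            break;
        }
    }
    ctx.rip = mem.at(ctx.rsp);  // return address pushed by the caller's "call"
    ctx.rsp += 8;               // caller's RSP after the return
    return ctx;
}
```

Given the RSP at the crash site, the sketch yields the caller's RSP 
(crash RSP + 30h), the return address, and the caller's RBX, RSI and 
RDI, which is the information a walker needs to continue with the next 
frame.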

> >     so HotSpot's built-in native stack walking code
> >     will fail to find the sender frame. As a result, hs_err on 
> Windows/x64
> >     will always list a single frame.
> >
> >     I have added the os::platform_print_native_stack() function for
> >     Windows/x64 only. It uses the StackWalk64 API to walk the stack.
> >
> >     Because the Win/x64 frame layout is very different than what HotSpot
> > expects,
> >     I decided to implement os::platform_print_native_stack() as a 
> completely
> >     stand-alone function, and do not interact with the existing 
> "frame" C++
> > class.
> >     See comments in os_windows_x86.cpp for details.
> >
> > Deficiency of fix:
> >
> >     StackWalk64 knows nothing about the Java frames. So hs_err will 
> display
> > only
> >     the native frames, and stop as soon as the first Java frame is 
> reached.
> > It will
> >     also NOT display any native frames below Java frames.
> >
> >     Printing the Java frames *may* be possible. However, at this 
> point, I
> > want
> >     to keep the code simple and crash proof. I will file a different 
> bug for
> >     printing the Java frames.
> >
> > Bonus:
> >
> >     As a side-fix, I refactored a bunch of duplicated code in 
> decoder.cpp
> > into
> >     the DecoderLocker class.
> >
>
> I would really appreciate it if you could do this in a separate change. I 
> think it is commonly agreed upon that "small unrelated fixes" (like 
> comments, white-space changes or single line fixes) may go into 
> another change, but these changes are IMHO big enough to justify a 
> different bug ID and change set.
>
Agreed. I will use the DecoderLocker only for my new code (to avoid 
cut-and-paste errors from duplicated code blocks), but will not touch the 
existing code. I will do the cleanup in a separate RFE.

> > Tests:
> >
> >     JPRT
> >     UTE (vm.runtime.testlist, vm.quick.testlist,
> > vm.parallel_class_loading.testlist)
> >
> >     I also manually inserted some crashes into jvm.dll and verified 
> that the
> >     native stack trace is printed as expected on Win/x64.
> >
>
> Maybe you can use/extend the WhiteBox API to write some good tests?
>
I'll try to do this.

Thanks!

- Ioi



More information about the hotspot-runtime-dev mailing list