RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup

Chris Plummer chris.plummer at oracle.com
Tue Aug 2 20:31:28 UTC 2016


Hello,

Please review the following:

webrev: 
http://cr.openjdk.java.net/~cjplummer/8133749-8133747-8133740/webrev-01/webrev.hotspot/

Bugs fixed:

JDK-8133749: os::current_frame() is not returning the proper frame on 
ARM and solaris-x64
https://bugs.openjdk.java.net/browse/JDK-8133749

JDK-8133747: NMT includes an extra stack frame due to assumption NMT is 
making on tail calls being used
https://bugs.openjdk.java.net/browse/JDK-8133747

JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds includes 
NativeCallStack::NativeCallStack() frame in backtrace
https://bugs.openjdk.java.net/browse/JDK-8133740

The above bugs all result in the NMT detail stack traces including extra 
frames in the stack traces. Certain frames are suppose to be skipped, 
but sometimes are not. The frames that show up are:

   NativeCallStack::NativeCallStack
   os::get_native_stack

These are both methods used to generate the stack trace, and therefore 
should not be included it.  However, under some (most) circumstances, 
they were.

Also, there was no test to make sure that any NMT detail output is 
generated, or that it is correct. I've added one with this webrev. Of 
the 27 possible builds (9 platforms * 3 build flavors), only 9 of the 27 
initially passed this new test. They were the product and fastdebug 
builds for solaris-sparc, bsd-x64, and linux-x64; and the slowdebug 
builds for solaris-x64, windows-x86, and windows-x64. All the rest 
failed. They now all pass with my fixes in place.

Here's a summary of the changes:

src/os/posix/vm/os_posix.cpp
src/os/windows/vm/os_windows.cpp

JDK-8133747 fixes: There was some frame skipping logic here which was 
sort of correct, but was misplace. There are no extra frames being added 
in os::get_native_stack() due to lack of inlining or lack of a tail 
call, so no need for toSkip++ here. The logic has been moved to 
NativeCallStack::NativeCallStack, which is where the tail call is 
(sometimes) made, and also corrected (see nativeCallStack.cpp below).

src/share/vm/utilities/nativeCallStack.cpp

JDK-8133747 fixes: The frame skipping logic that was moved here assumed 
that NativeCallStack::NativeCallStack would not appear in the call stack 
(due to a tail call be using to call os::get_native_stack) except in 
slow debug builds. However, some platforms also don't use a tail call 
even when optimized. From what I can tell that is the case for 32-bit 
platforms and for windows.

src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
src/os_cpu/windows_x86/vm/os_windows_x86.cpp
src/os_cpu/linux_x86/vm/os_linux_x86.cpp

JDK-8133740 fixes: When _get_previous_fp is not inlined, we need to skip 
one extra frame

src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp

JDK-8133749 fixes: os:current_frame() was not consistent with other 
platforms and needs to skip one more frame. This means it returns the 
frame for the caller's caller. So when called by os:get_native_stack(), 
it returns the frame for whoever called os::get_native_stack(). Although 
not intuitive, this is what os:get_native_stack() expects. Probably a 
method rename and/or a behavior change is justified here, but I would 
prefer to do that with a followup CR if anyone has a good suggestion on 
what to do.

test/runtime/NMT/CheckForProperDetailStackTrace.java

This is the new NTM detail test. It checks for frames that shouldn't be 
present and validates at least one stack trace is what is expected.

I verified that the above test now passes on all supported platforms, 
and also did a full jprt "-testset hotpot" run. I plan on doing some RBT 
testing with NMT detail enabled before committing.

Regarding the community contributed ports that Oracle does not support, 
I didn't make any changes there, but it looks like some of these bugs do 
exist. Notably:

-linux-aarch64: Looks like it suffers from JDK-8133740. The changes done 
to the
  os_linux_x86.cp should also be applied here.
-linux-ppc: Hard to say for sure since the implementation of 
os::current_frame is
  different than others, but it looks to me like it suffers from both 
JDK-8133749
  and JDK-8133740.
-aix-ppc: Looks to be the same implementation as linux-ppc, so would 
need the
  same changes.

These ports may also be suffering from JDK-8133747, but that fix is in 
shared code (nativeCallStack.cpp). My changes there will need some 
tweaking for these ports they don't use a tail call to call 
os::get_native_stack().

If the maintainers of these ports could send me some NMT detail output, 
I can advise better on what changes are needed. Then you can implement 
and test them, and then send them back to me and I'll include them with 
my changes. What I need is the following command run on product and 
slowdebug builds. Initially run without any of my changes applied. If 
needed I may followup with a request that they be run with the changes 
applied:

bin/java -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=detail 
-XX:+PrintNMTStatistics -version

thanks,

Chris



More information about the hotspot-runtime-dev mailing list