RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup

Zhengyu Gu zgu at redhat.com
Fri Aug 5 12:47:06 UTC 2016


Hi Chris,

The changes look good in general.

However, I am not sure that, by skipping the two stack capturing frames 
(NativeCallStack::NativeCallStack and os::get_native_stack), you are not 
skipping actual callers in some places. If I recall correctly, I saw 
inconsistent inlining behaviors in the same binaries. For example, you 
don't see these two frames in all NMT call stacks.  I also wonder if the 
changes can survive compiler upgrades.

Thanks,

-Zhengyu


On 08/04/2016 05:53 PM, Chris Plummer wrote:
> Ping!
>
> On 8/2/16 1:31 PM, Chris Plummer wrote:
>> Hello,
>>
>> Please review the following:
>>
>> webrev: 
>> http://cr.openjdk.java.net/~cjplummer/8133749-8133747-8133740/webrev-01/webrev.hotspot/
>>
>> Bugs fixed:
>>
>> JDK-8133749: os::current_frame() is not returning the proper frame on 
>> ARM and solaris-x64
>> https://bugs.openjdk.java.net/browse/JDK-8133749
>>
>> JDK-8133747: NMT includes an extra stack frame due to assumption NMT 
>> is making on tail calls being used
>> https://bugs.openjdk.java.net/browse/JDK-8133747
>>
>> JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds 
>> includes NativeCallStack::NativeCallStack() frame in backtrace
>> https://bugs.openjdk.java.net/browse/JDK-8133740
>>
>> The above bugs all result in the NMT detail stack traces including 
>> extra frames in the stack traces. Certain frames are suppose to be 
>> skipped, but sometimes are not. The frames that show up are:
>>
>>   NativeCallStack::NativeCallStack
>>   os::get_native_stack
>>
>> These are both methods used to generate the stack trace, and 
>> therefore should not be included it.  However, under some (most) 
>> circumstances, they were.
>>
>> Also, there was no test to make sure that any NMT detail output is 
>> generated, or that it is correct. I've added one with this webrev. Of 
>> the 27 possible builds (9 platforms * 3 build flavors), only 9 of the 
>> 27 initially passed this new test. They were the product and 
>> fastdebug builds for solaris-sparc, bsd-x64, and linux-x64; and the 
>> slowdebug builds for solaris-x64, windows-x86, and windows-x64. All 
>> the rest failed. They now all pass with my fixes in place.
>>
>> Here's a summary of the changes:
>>
>> src/os/posix/vm/os_posix.cpp
>> src/os/windows/vm/os_windows.cpp
>>
>> JDK-8133747 fixes: There was some frame skipping logic here which was 
>> sort of correct, but was misplace. There are no extra frames being 
>> added in os::get_native_stack() due to lack of inlining or lack of a 
>> tail call, so no need for toSkip++ here. The logic has been moved to 
>> NativeCallStack::NativeCallStack, which is where the tail call is 
>> (sometimes) made, and also corrected (see nativeCallStack.cpp below).
>>
>> src/share/vm/utilities/nativeCallStack.cpp
>>
>> JDK-8133747 fixes: The frame skipping logic that was moved here 
>> assumed that NativeCallStack::NativeCallStack would not appear in the 
>> call stack (due to a tail call be using to call os::get_native_stack) 
>> except in slow debug builds. However, some platforms also don't use a 
>> tail call even when optimized. From what I can tell that is the case 
>> for 32-bit platforms and for windows.
>>
>> src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>
>> JDK-8133740 fixes: When _get_previous_fp is not inlined, we need to 
>> skip one extra frame
>>
>> src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
>>
>> JDK-8133749 fixes: os:current_frame() was not consistent with other 
>> platforms and needs to skip one more frame. This means it returns the 
>> frame for the caller's caller. So when called by 
>> os:get_native_stack(), it returns the frame for whoever called 
>> os::get_native_stack(). Although not intuitive, this is what 
>> os:get_native_stack() expects. Probably a method rename and/or a 
>> behavior change is justified here, but I would prefer to do that with 
>> a followup CR if anyone has a good suggestion on what to do.
>>
>> test/runtime/NMT/CheckForProperDetailStackTrace.java
>>
>> This is the new NTM detail test. It checks for frames that shouldn't 
>> be present and validates at least one stack trace is what is expected.
>>
>> I verified that the above test now passes on all supported platforms, 
>> and also did a full jprt "-testset hotpot" run. I plan on doing some 
>> RBT testing with NMT detail enabled before committing.
>>
>> Regarding the community contributed ports that Oracle does not 
>> support, I didn't make any changes there, but it looks like some of 
>> these bugs do exist. Notably:
>>
>> -linux-aarch64: Looks like it suffers from JDK-8133740. The changes 
>> done to the
>>  os_linux_x86.cp should also be applied here.
>> -linux-ppc: Hard to say for sure since the implementation of 
>> os::current_frame is
>>  different than others, but it looks to me like it suffers from both 
>> JDK-8133749
>>  and JDK-8133740.
>> -aix-ppc: Looks to be the same implementation as linux-ppc, so would 
>> need the
>>  same changes.
>>
>> These ports may also be suffering from JDK-8133747, but that fix is 
>> in shared code (nativeCallStack.cpp). My changes there will need some 
>> tweaking for these ports they don't use a tail call to call 
>> os::get_native_stack().
>>
>> If the maintainers of these ports could send me some NMT detail 
>> output, I can advise better on what changes are needed. Then you can 
>> implement and test them, and then send them back to me and I'll 
>> include them with my changes. What I need is the following command 
>> run on product and slowdebug builds. Initially run without any of my 
>> changes applied. If needed I may followup with a request that they be 
>> run with the changes applied:
>>
>> bin/java -XX:+UnlockDiagnosticVMOptions 
>> -XX:NativeMemoryTracking=detail -XX:+PrintNMTStatistics -version
>>
>> thanks,
>>
>> Chris
>>
>



More information about the hotspot-runtime-dev mailing list