RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup

Chris Plummer chris.plummer at oracle.com
Fri Aug 5 18:10:47 UTC 2016


Hi Zhengyu,

I don't think skipping too many frames is a risk here. I only skip 
NativeCallStack::NativeCallStack from within 
NativeCallStack::NativeCallStack, and only do so for platforms where I 
have verified that it does not use a tail call to call 
os::get_native_stack. os::get_native_stack() needs to always skip 
itself, as does _get_previous_fp() when it is not being inlined. I did 
do a lot of manual verification of NMT detail output, comparing the 
original output with the new output. I did this on all platforms and for 
all 3 build flavors. The results looked to be correct.

There is a risk of needing to do further adjustments when there are 
compiler upgrades. The test I wrote should capture this need. In any 
case, I believe "fixed but fragile" is better than "broken and fragile".

thanks,

Chris

On 8/5/16 5:47 AM, Zhengyu Gu wrote:
> Hi Chris,
>
> The changes look good in general.
>
> However, I am not sure that, by skipping the two stack capturing 
> frames (NativeCallStack::NativeCallStack and os::get_native_stack), 
> you are not skipping actual callers in some places. If I recall 
> correctly, I saw inconsistent inlining behaviors in the same binaries. 
> For example, you don't see these two frames in all NMT call stacks.  I 
> also wonder if the changes can survive compiler upgrades.
>
> Thanks,
>
> -Zhengyu
>
>
> On 08/04/2016 05:53 PM, Chris Plummer wrote:
>> Ping!
>>
>> On 8/2/16 1:31 PM, Chris Plummer wrote:
>>> Hello,
>>>
>>> Please review the following:
>>>
>>> webrev: 
>>> http://cr.openjdk.java.net/~cjplummer/8133749-8133747-8133740/webrev-01/webrev.hotspot/
>>>
>>> Bugs fixed:
>>>
>>> JDK-8133749: os::current_frame() is not returning the proper frame 
>>> on ARM and solaris-x64
>>> https://bugs.openjdk.java.net/browse/JDK-8133749
>>>
>>> JDK-8133747: NMT includes an extra stack frame due to assumption NMT 
>>> is making on tail calls being used
>>> https://bugs.openjdk.java.net/browse/JDK-8133747
>>>
>>> JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds 
>>> includes NativeCallStack::NativeCallStack() frame in backtrace
>>> https://bugs.openjdk.java.net/browse/JDK-8133740
>>>
>>> The above bugs all result in the NMT detail stack traces including 
>>> extra frames in the stack traces. Certain frames are suppose to be 
>>> skipped, but sometimes are not. The frames that show up are:
>>>
>>>   NativeCallStack::NativeCallStack
>>>   os::get_native_stack
>>>
>>> These are both methods used to generate the stack trace, and 
>>> therefore should not be included it.  However, under some (most) 
>>> circumstances, they were.
>>>
>>> Also, there was no test to make sure that any NMT detail output is 
>>> generated, or that it is correct. I've added one with this webrev. 
>>> Of the 27 possible builds (9 platforms * 3 build flavors), only 9 of 
>>> the 27 initially passed this new test. They were the product and 
>>> fastdebug builds for solaris-sparc, bsd-x64, and linux-x64; and the 
>>> slowdebug builds for solaris-x64, windows-x86, and windows-x64. All 
>>> the rest failed. They now all pass with my fixes in place.
>>>
>>> Here's a summary of the changes:
>>>
>>> src/os/posix/vm/os_posix.cpp
>>> src/os/windows/vm/os_windows.cpp
>>>
>>> JDK-8133747 fixes: There was some frame skipping logic here which 
>>> was sort of correct, but was misplace. There are no extra frames 
>>> being added in os::get_native_stack() due to lack of inlining or 
>>> lack of a tail call, so no need for toSkip++ here. The logic has 
>>> been moved to NativeCallStack::NativeCallStack, which is where the 
>>> tail call is (sometimes) made, and also corrected (see 
>>> nativeCallStack.cpp below).
>>>
>>> src/share/vm/utilities/nativeCallStack.cpp
>>>
>>> JDK-8133747 fixes: The frame skipping logic that was moved here 
>>> assumed that NativeCallStack::NativeCallStack would not appear in 
>>> the call stack (due to a tail call be using to call 
>>> os::get_native_stack) except in slow debug builds. However, some 
>>> platforms also don't use a tail call even when optimized. From what 
>>> I can tell that is the case for 32-bit platforms and for windows.
>>>
>>> src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
>>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>
>>> JDK-8133740 fixes: When _get_previous_fp is not inlined, we need to 
>>> skip one extra frame
>>>
>>> src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
>>>
>>> JDK-8133749 fixes: os:current_frame() was not consistent with other 
>>> platforms and needs to skip one more frame. This means it returns 
>>> the frame for the caller's caller. So when called by 
>>> os:get_native_stack(), it returns the frame for whoever called 
>>> os::get_native_stack(). Although not intuitive, this is what 
>>> os:get_native_stack() expects. Probably a method rename and/or a 
>>> behavior change is justified here, but I would prefer to do that 
>>> with a followup CR if anyone has a good suggestion on what to do.
>>>
>>> test/runtime/NMT/CheckForProperDetailStackTrace.java
>>>
>>> This is the new NTM detail test. It checks for frames that shouldn't 
>>> be present and validates at least one stack trace is what is expected.
>>>
>>> I verified that the above test now passes on all supported 
>>> platforms, and also did a full jprt "-testset hotpot" run. I plan on 
>>> doing some RBT testing with NMT detail enabled before committing.
>>>
>>> Regarding the community contributed ports that Oracle does not 
>>> support, I didn't make any changes there, but it looks like some of 
>>> these bugs do exist. Notably:
>>>
>>> -linux-aarch64: Looks like it suffers from JDK-8133740. The changes 
>>> done to the
>>>  os_linux_x86.cp should also be applied here.
>>> -linux-ppc: Hard to say for sure since the implementation of 
>>> os::current_frame is
>>>  different than others, but it looks to me like it suffers from both 
>>> JDK-8133749
>>>  and JDK-8133740.
>>> -aix-ppc: Looks to be the same implementation as linux-ppc, so would 
>>> need the
>>>  same changes.
>>>
>>> These ports may also be suffering from JDK-8133747, but that fix is 
>>> in shared code (nativeCallStack.cpp). My changes there will need 
>>> some tweaking for these ports they don't use a tail call to call 
>>> os::get_native_stack().
>>>
>>> If the maintainers of these ports could send me some NMT detail 
>>> output, I can advise better on what changes are needed. Then you can 
>>> implement and test them, and then send them back to me and I'll 
>>> include them with my changes. What I need is the following command 
>>> run on product and slowdebug builds. Initially run without any of my 
>>> changes applied. If needed I may followup with a request that they 
>>> be run with the changes applied:
>>>
>>> bin/java -XX:+UnlockDiagnosticVMOptions 
>>> -XX:NativeMemoryTracking=detail -XX:+PrintNMTStatistics -version
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>
>



More information about the hotspot-runtime-dev mailing list