RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup
dean.long at oracle.com
dean.long at oracle.com
Thu Aug 18 18:26:41 UTC 2016
For gcc, would it be useful to use __builtin_frame_address()?
On 8/18/16 9:18 AM, Dmitry Samersoff wrote:
> 1. (general) intptr_t* _get_previous_fp()
> I'm not sure we should rely on the fact that _get_previous_fp is
> inlined. AFAIK, gcc doesn't inline function if it has inline assembly.
> So it might be better to mark it explicitly by
> __attribute__ ((noinline))
> 2. os_solaris_x86.cpp:299
> Should we check os::is_first_C_frame(&myframe) before asking for caller
> frame ?
> On 2016-08-02 23:31, Chris Plummer wrote:
>> Please review the following:
>> Bugs fixed:
>> JDK-8133749: os::current_frame() is not returning the proper frame on
>> ARM and solaris-x64
>> JDK-8133747: NMT includes an extra stack frame due to assumption NMT is
>> making on tail calls being used
>> JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds includes
>> NativeCallStack::NativeCallStack() frame in backtrace
>> The above bugs all result in the NMT detail stack traces including extra
>> frames in the stack traces. Certain frames are suppose to be skipped,
>> but sometimes are not. The frames that show up are:
>> These are both methods used to generate the stack trace, and therefore
>> should not be included it. However, under some (most) circumstances,
>> they were.
>> Also, there was no test to make sure that any NMT detail output is
>> generated, or that it is correct. I've added one with this webrev. Of
>> the 27 possible builds (9 platforms * 3 build flavors), only 9 of the 27
>> initially passed this new test. They were the product and fastdebug
>> builds for solaris-sparc, bsd-x64, and linux-x64; and the slowdebug
>> builds for solaris-x64, windows-x86, and windows-x64. All the rest
>> failed. They now all pass with my fixes in place.
>> Here's a summary of the changes:
>> JDK-8133747 fixes: There was some frame skipping logic here which was
>> sort of correct, but was misplace. There are no extra frames being added
>> in os::get_native_stack() due to lack of inlining or lack of a tail
>> call, so no need for toSkip++ here. The logic has been moved to
>> NativeCallStack::NativeCallStack, which is where the tail call is
>> (sometimes) made, and also corrected (see nativeCallStack.cpp below).
>> JDK-8133747 fixes: The frame skipping logic that was moved here assumed
>> that NativeCallStack::NativeCallStack would not appear in the call stack
>> (due to a tail call be using to call os::get_native_stack) except in
>> slow debug builds. However, some platforms also don't use a tail call
>> even when optimized. From what I can tell that is the case for 32-bit
>> platforms and for windows.
>> JDK-8133740 fixes: When _get_previous_fp is not inlined, we need to skip
>> one extra frame
>> JDK-8133749 fixes: os:current_frame() was not consistent with other
>> platforms and needs to skip one more frame. This means it returns the
>> frame for the caller's caller. So when called by os:get_native_stack(),
>> it returns the frame for whoever called os::get_native_stack(). Although
>> not intuitive, this is what os:get_native_stack() expects. Probably a
>> method rename and/or a behavior change is justified here, but I would
>> prefer to do that with a followup CR if anyone has a good suggestion on
>> what to do.
>> This is the new NTM detail test. It checks for frames that shouldn't be
>> present and validates at least one stack trace is what is expected.
>> I verified that the above test now passes on all supported platforms,
>> and also did a full jprt "-testset hotpot" run. I plan on doing some RBT
>> testing with NMT detail enabled before committing.
>> Regarding the community contributed ports that Oracle does not support,
>> I didn't make any changes there, but it looks like some of these bugs do
>> exist. Notably:
>> -linux-aarch64: Looks like it suffers from JDK-8133740. The changes done
>> to the
>> os_linux_x86.cp should also be applied here.
>> -linux-ppc: Hard to say for sure since the implementation of
>> os::current_frame is
>> different than others, but it looks to me like it suffers from both
>> and JDK-8133740.
>> -aix-ppc: Looks to be the same implementation as linux-ppc, so would
>> need the
>> same changes.
>> These ports may also be suffering from JDK-8133747, but that fix is in
>> shared code (nativeCallStack.cpp). My changes there will need some
>> tweaking for these ports they don't use a tail call to call
>> If the maintainers of these ports could send me some NMT detail output,
>> I can advise better on what changes are needed. Then you can implement
>> and test them, and then send them back to me and I'll include them with
>> my changes. What I need is the following command run on product and
>> slowdebug builds. Initially run without any of my changes applied. If
>> needed I may followup with a request that they be run with the changes
>> bin/java -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=detail
>> -XX:+PrintNMTStatistics -version
More information about the hotspot-runtime-dev