RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup

Chris Plummer chris.plummer at oracle.com
Thu Aug 18 19:42:55 UTC 2016


Hi Dmitry,

These changes were already pushed last week. Chris was just responding 
to a request I had to see if any of the open ports also needed any of 
the fixes that were done for other platforms.

On 8/18/16 9:18 AM, Dmitry Samersoff wrote:
> Chris,
>
> 1. (general) intptr_t* _get_previous_fp()
>
> I'm not sure we should rely on the fact that _get_previous_fp is
> inlined.
It would be nice to not rely on gcc for inlining (or not inlining), but 
unfortunately the main theme of these bugs is fixing cases were we are 
assuming, and the assumption is wrong. The following CR was filed to 
address in general the fragility of relying on what compilers are doing 
for inlining:

JDK-8163899 NMT frame skipping code is fragile
> AFAIK, gcc doesn't inline function if it has inline assembly.

That wasn't my observation. If it were true, the test wouldn't be 
passing on both debug and non-debug builds.

>
> So it might be better to mark it explicitly by
> __attribute__ ((noinline))
I think what actually would be best is to get rid of _get_previous_fp() 
and move it's logic into os:current_frame(). That's pointed out in 
another related CR I filed:

JDK-8163900 os::current_frame has a misleading name
>
> 2. os_solaris_x86.cpp:299
>
> Should we check os::is_first_C_frame(&myframe) before asking for caller
> frame ?
The other platforms don't do this. At best maybe they all should do so 
as an assert to make sure the logic isn't flawed. If the logic is 
flawed, it will crash, so the assert would just catch the reason earlier.

thanks,

Chris
>
> -Dmitry
>
>
> On 2016-08-02 23:31, Chris Plummer wrote:
>> Hello,
>>
>> Please review the following:
>>
>> webrev:
>> http://cr.openjdk.java.net/~cjplummer/8133749-8133747-8133740/webrev-01/webrev.hotspot/
>>
>>
>> Bugs fixed:
>>
>> JDK-8133749: os::current_frame() is not returning the proper frame on
>> ARM and solaris-x64
>> https://bugs.openjdk.java.net/browse/JDK-8133749
>>
>> JDK-8133747: NMT includes an extra stack frame due to assumption NMT is
>> making on tail calls being used
>> https://bugs.openjdk.java.net/browse/JDK-8133747
>>
>> JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds includes
>> NativeCallStack::NativeCallStack() frame in backtrace
>> https://bugs.openjdk.java.net/browse/JDK-8133740
>>
>> The above bugs all result in the NMT detail stack traces including extra
>> frames in the stack traces. Certain frames are suppose to be skipped,
>> but sometimes are not. The frames that show up are:
>>
>>    NativeCallStack::NativeCallStack
>>    os::get_native_stack
>>
>> These are both methods used to generate the stack trace, and therefore
>> should not be included it.  However, under some (most) circumstances,
>> they were.
>>
>> Also, there was no test to make sure that any NMT detail output is
>> generated, or that it is correct. I've added one with this webrev. Of
>> the 27 possible builds (9 platforms * 3 build flavors), only 9 of the 27
>> initially passed this new test. They were the product and fastdebug
>> builds for solaris-sparc, bsd-x64, and linux-x64; and the slowdebug
>> builds for solaris-x64, windows-x86, and windows-x64. All the rest
>> failed. They now all pass with my fixes in place.
>>
>> Here's a summary of the changes:
>>
>> src/os/posix/vm/os_posix.cpp
>> src/os/windows/vm/os_windows.cpp
>>
>> JDK-8133747 fixes: There was some frame skipping logic here which was
>> sort of correct, but was misplace. There are no extra frames being added
>> in os::get_native_stack() due to lack of inlining or lack of a tail
>> call, so no need for toSkip++ here. The logic has been moved to
>> NativeCallStack::NativeCallStack, which is where the tail call is
>> (sometimes) made, and also corrected (see nativeCallStack.cpp below).
>>
>> src/share/vm/utilities/nativeCallStack.cpp
>>
>> JDK-8133747 fixes: The frame skipping logic that was moved here assumed
>> that NativeCallStack::NativeCallStack would not appear in the call stack
>> (due to a tail call be using to call os::get_native_stack) except in
>> slow debug builds. However, some platforms also don't use a tail call
>> even when optimized. From what I can tell that is the case for 32-bit
>> platforms and for windows.
>>
>> src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>
>> JDK-8133740 fixes: When _get_previous_fp is not inlined, we need to skip
>> one extra frame
>>
>> src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
>>
>> JDK-8133749 fixes: os:current_frame() was not consistent with other
>> platforms and needs to skip one more frame. This means it returns the
>> frame for the caller's caller. So when called by os:get_native_stack(),
>> it returns the frame for whoever called os::get_native_stack(). Although
>> not intuitive, this is what os:get_native_stack() expects. Probably a
>> method rename and/or a behavior change is justified here, but I would
>> prefer to do that with a followup CR if anyone has a good suggestion on
>> what to do.
>>
>> test/runtime/NMT/CheckForProperDetailStackTrace.java
>>
>> This is the new NTM detail test. It checks for frames that shouldn't be
>> present and validates at least one stack trace is what is expected.
>>
>> I verified that the above test now passes on all supported platforms,
>> and also did a full jprt "-testset hotpot" run. I plan on doing some RBT
>> testing with NMT detail enabled before committing.
>>
>> Regarding the community contributed ports that Oracle does not support,
>> I didn't make any changes there, but it looks like some of these bugs do
>> exist. Notably:
>>
>> -linux-aarch64: Looks like it suffers from JDK-8133740. The changes done
>> to the
>>   os_linux_x86.cp should also be applied here.
>> -linux-ppc: Hard to say for sure since the implementation of
>> os::current_frame is
>>   different than others, but it looks to me like it suffers from both
>> JDK-8133749
>>   and JDK-8133740.
>> -aix-ppc: Looks to be the same implementation as linux-ppc, so would
>> need the
>>   same changes.
>>
>> These ports may also be suffering from JDK-8133747, but that fix is in
>> shared code (nativeCallStack.cpp). My changes there will need some
>> tweaking for these ports they don't use a tail call to call
>> os::get_native_stack().
>>
>> If the maintainers of these ports could send me some NMT detail output,
>> I can advise better on what changes are needed. Then you can implement
>> and test them, and then send them back to me and I'll include them with
>> my changes. What I need is the following command run on product and
>> slowdebug builds. Initially run without any of my changes applied. If
>> needed I may followup with a request that they be run with the changes
>> applied:
>>
>> bin/java -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=detail
>> -XX:+PrintNMTStatistics -version
>>
>> thanks,
>>
>> Chris
>>
>



More information about the hotspot-runtime-dev mailing list