RFR (S): 8218458: [TESTBUG] runtime/NMT/CheckForProperDetailStackTrace.java fails with Expected stack trace missing from output
Chris Plummer
chris.plummer at oracle.com
Fri Apr 5 05:53:20 UTC 2019
Hi David,
For the callsite that this test is checking for, right now there appear
to be 3 possible stacktraces: the "normal" one, the one that includes
AllocateHeap() on solaris and windows slowdebug builds, and the one
Zhengyu is now seeing on linux-x64. You would need to check for all 3,
limiting the AllocateHeap() one to just being allowed on solaris and
windows slowdebug as it is now. So basically this test needs to cover
all (allowable) stacktraces that we've seen for this callsite, and be
updated in the future as needed. Not ideal, but I don't see a better
solution. It's similar to the situation described in JDK-8163899 which
covered the fragility of the NMT frame skipping code. In the end it was
decided it would be easier to just deal fix issues as they came up
rather then engineer a solution that wasn't as fragile. I think this
test falls in the same category.
thanks,
Chris
On 4/4/19 10:11 PM, David Holmes wrote:
> Hi Chris,
>
> Thanks for the explanation about the frame counting from os::malloc -
> now I get it. But I don't understand your final comment:
>
> > Looking at this code also reminds me of a reason to have the test
> > continue to check for all 4 specific frames. If the frame skipping code
> > skips an extra frame, then the callsite will be missing a needed frame
> > at the top. The way the test was written it would detect this. With
> your
> > changes it will not. It would just revert to always matching on 3
> frames
> > instead of 4, and the frame skipping bug would go unnoticed.
>
> How can I fix this bug if I have to check for 4 specific frames but
> one (or more) may be missing - i.e how can I tell the different
> between "Frame A was inlined" and "Frame A was skipped by mistake" ??
>
> Thanks,
> David
>
>
> On 5/04/2019 2:17 pm, Chris Plummer wrote:
>> Hi David,
>>
>> On 4/4/19 6:28 PM, David Holmes wrote:
>>> Hi Chris,
>>>
>>> On 5/04/2019 1:48 am, Chris Plummer wrote:
>>>> Hi David,
>>>>
>>>> On 4/4/19 12:14 AM, David Holmes wrote:
>>>>> On 4/04/2019 4:35 pm, Chris Plummer wrote:
>>>>>> On 4/3/19 11:23 PM, David Holmes wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> On 4/04/2019 4:12 pm, Chris Plummer wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> I have concerns that this will hide some of the other bugs I've
>>>>>>>> mentioned: JDK-8133749, JDK-8133747, and JDK-8133740. These
>>>>>>>> bugs result in 1 or two frames appearing in the stacktrace that
>>>>>>>> should be skipped. Notably NativeCallStack::NativeCallStack()
>>>>>>>> and os::get_native_stack().
>>>>>>>
>>>>>>> The test still checks those are not present first:
>>>>>>>
>>>>>>> 73 // We should never see either of these frames because
>>>>>>> they are supposed to be skipped. */
>>>>>>> 74 output.shouldNotContain("NativeCallStack::NativeCallStack");
>>>>>>> 75 output.shouldNotContain("os::get_native_stack");
>>>>>> Ah yes. I skimmed over the test looking for it but missed it.
>>>>>>>
>>>>>>>> Also, AllocateHeap() should normally not be in the stack trace,
>>>>>>>> but the test has specifically allowed for it for windows and
>>>>>>>> solaris slowdebug builds. Although these builds should have
>>>>>>>> honored the ALWAYSINLINE directive, it was deemed acceptable
>>>>>>>> that it was not in slowdebug builds. However, I would not want
>>>>>>>> to allow AllocateHeap() to appear in a product build, and best
>>>>>>>> not to see it in fastdebug either.
>>>>>>>
>>>>>>> This is a test of NMT detail not a test of whether a given
>>>>>>> compiler chooses to inline something like AllocateHeap. I don't
>>>>>>> think it is the job of this test to be checking for something
>>>>>>> specific to the native compiler. The previous handling of
>>>>>>> AllocateHeap seemed to be there simply because it was the only
>>>>>>> way to deal with an optional frame - but now that's handled
>>>>>>> generically.
>>>>>> It's appearance means you effectively only have 3 frames to
>>>>>> identity callsites instead of 4.
>>>>>
>>>>> Both stacktraces in the old test had 4 elements and expected 4
>>>>> matches. The current bug is that one of those (new_entry) could
>>>>> actually be inlined as well, resulting in only 3 matches. So that
>>>>> is what the revised test checks for: at least 3 matches. Often
>>>>> there will be 4 matches.
>>>> I think you misunderstood my "3 frames" comment. I was referring to
>>>> how many frames NMT uses to identify the callsite. It wants to use
>>>> 4, but if AllocateHeap() doesn't get inlined, it effectively is
>>>> using 3. The test should detect when this happens so the NMT
>>>> implementation can address the issue.
>>>
>>> You're right I don't understand this part as I don't know how/what
>>> NMT detail is doing in this regard.
>>
>> An NMT callsite is simply the 4 most recent frames (afters some
>> pruning) that led to the os:malloc() call. "4" is somewhat arbitrary
>> as Thomas pointed out, and is controlled by NMT_TrackingStackDepth.
>> Making NMT_TrackingStackDepth bigger means more refinement of the
>> callsites (thus more callsites), but a clearer picture of what
>> actually led to the os:malloc().
>>
>> For example, with NMT_TrackingStackDepth == 4, if you have a() calls
>> b() calls c() calls d() calls os:malloc(), and foo() and bar() both
>> call a(), the NMT detail output will not distinguish between these
>> two calls paths to os:mallco(), and will consider both paths to be
>> the same callsite. The 4 frames in the NMT detail output would always
>> be a, b, c, and d. However, bump up NMT_TrackingStackDepth to 5 and
>> now NMT will treat them as two separate callsites, one with foo() as
>> the bottom frame and one with bar() as the bottom frame, and both
>> with a, b, c, and d as the other 4 frames.
>>
>> So my point is if AllocateHeap() is not inlined, then every
>> allocation that is the result of doing a "new" of any CHeapObj
>> subtype will have AllocateHeap() in its callsite, which effectively
>> lowers they callsite refinement by 1.
>>
>>>
>>>>>
>>>>> Hmmm but now I'm wondering why this trace:
>>>>>
>>>>> 50 public static String stackTraceAllocateHeap =
>>>>> 51 ".*AllocateHeap.*\n" +
>>>>> 52 ".*ModuleEntryTable.*new_entry.*\n" +
>>>>>
>>>>> doesn't include ".*Hashtable.*allocate_new_entry.*"? Was it
>>>>> getting inlined already when AllocateHeap was not? Even so we
>>>>> still end up with 4 frames matching normally.
>>>> I noticed that last night also and scratch my head over it for a
>>>> while and then went to bed. The only explanation I could come up
>>>> with is that allocate_new_entry() is getting inlined, and as a
>>>> result (due to being a slowdebug build and doing minimal inlining)
>>>> AllocateHeap() was not inlined.
>>>>>
>>>>>> If it does appear in a product build, a solution should be looked
>>>>>> into to get rid of it. If the port owner decides it can't get rid
>>>>>> of it (or is unwilling to), then an exception should be added to
>>>>>> the test like was done for solaris and windows slowdebug builds.
>>>>>
>>>>> Are we specifically trying to test the compiler's ability to
>>>>> inline that function and just happen to be using this test to
>>>>> verify that? Doesn't seem like a suitable place to do this - and
>>>>> why do we need to do it? The Visual Studio docs state:
>>>>>
>>>>> "You cannot force the compiler to inline a particular function,
>>>>> even with the __forceinline keyword."
>>>>>
>>>>> so ALWAYSINLINE is just a hint even in product builds and could
>>>>> change with any update to the compiler.
>>>>>
>>>>> For Solaris Studio it is again not guaranteed to inline -
>>>>> specifically -xinline only has an effect at –xO3 or higher. Which
>>>>> likely explains why it is ignored in slowdebug. And there are
>>>>> other cases where it won't honour the ALWAYSINLINE.
>>>>>
>>>>> Even with gcc we seem to be misusing the attribute if we want to
>>>>> ensure inlining when not optimising:
>>>>>
>>>>> "GCC does not inline any functions when not optimizing unless you
>>>>> specify the ‘always_inline’ attribute for the function, like this:
>>>>>
>>>>> /* Prototype. */
>>>>> inline void foo (const char) __attribute__((always_inline));"
>>>>>
>>>>> and we don't write it that way.
>>>>>
>>>>> So if we're that concerned about release builds guaranteeing to
>>>>> inline AllocateHeap then I think we need something a bit more
>>>>> explicit than this test to determine that.
>>>> With respect to the 3 methods/functions we don't want to see in the
>>>> callsite stacktrace, NMT has made a number of assumptions on
>>>> inlining. One of the things the test is doing is making sure those
>>>> assumptions are correct. If incorrect, then you run into issues
>>>> like I mentioned above where callsite backtraces effectively only
>>>> have 3 unique frames rather than 4 (actually before some bug fixes
>>>> it was often just 2 unique frames). So I think it's appropriate to
>>>> have a test to make sure we are not seeing any of these 3
>>>> methods/functions.
>>>
>>> Okay I get the gist of that. Is there somewhere I can clearly see
>>> what this inlining assumptions are that NMT makes? Are they clearly
>>> documented?
>>
>> Not that I know of. I discovered them while looking at the various
>> bugs that led to NativeCallStack::NativeCallStack() and
>> os::get_native_stack() (and sometimes both) being in the callsite.
>> Reviewing the bugs I referred to will give you an idea of where to
>> look. One good place to look at NativeCallStack::NativeCallStack().
>> Lots of special case code there that controls how many frames to skip
>> based on on the platform and whether optimized or not. Also some
>> comments there to help you out. I did a lot of bug fixing in this
>> method.
>>
>> Looking at this code also reminds me of a reason to have the test
>> continue to check for all 4 specific frames. If the frame skipping
>> code skips an extra frame, then the callsite will be missing a needed
>> frame at the top. The way the test was written it would detect this.
>> With your changes it will not. It would just revert to always
>> matching on 3 frames instead of 4, and the frame skipping bug would
>> go unnoticed.
>>
>> thanks,
>>
>> Chris
>>
>>>
>>>> Now the test also has made inlining assumptions beyond what NMT has
>>>> made, and that is really what this bug is about. In general I think
>>>> your fix is fine in the way it relaxes which frames are actually
>>>> found, but as Thomas points out, it suffers from not actually
>>>> looking at a single stacktrace, but just looking for the specified
>>>> frames somewhere in the output (and in the order specified.) You
>>>> should probably address this.
>>>
>>> Right that was an error on my part. I thought the existing MULTILINE
>>> pattern matching with .* would also find non-sequential lines and so
>>> I was acting similarly. I will re-think this.
>>>
>>> Thanks,
>>> David
>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> Given the changes you made to allow more flexibly in which
>>>>>>>> frames appear, I think you need to now also make sure the above
>>>>>>>> 3 mentioned frames are not present, except for allowing
>>>>>>>> AllocateHeap() in slowdebug builds.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 4/3/19 10:53 PM, David Holmes wrote:
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218458
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8218458/webrev/
>>>>>>>>>
>>>>>>>>> The actual stack trace reported by NMT detail is affected by
>>>>>>>>> the inlining decisions of the native compiler, and on the type
>>>>>>>>> of build. So we define an "ideal" stacktrace and then allow
>>>>>>>>> for some frames to be missing based on empirical observations.
>>>>>>>>> So to date we have seen two frames that may or may not be
>>>>>>>>> inlined and so we allow for 2 non-matching entries.
>>>>>>>>>
>>>>>>>>> The special-casing of AllocateHeap is removed as now it is
>>>>>>>>> just an optional frame.
>>>>>>>>>
>>>>>>>>> Chris: does this maintain the "spirit" of the test as you
>>>>>>>>> intended?
>>>>>>>>>
>>>>>>>>> Zhengyu: can you test this on your system(s) please.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
More information about the hotspot-runtime-dev
mailing list