RFR (S): 8218458: [TESTBUG] runtime/NMT/CheckForProperDetailStackTrace.java fails with Expected stack trace missing from output
Zhengyu Gu
zgu at redhat.com
Fri Apr 5 13:05:28 UTC 2019
Hi David,
On 4/5/19 3:04 AM, David Holmes wrote:
> Hi Chris,
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~dholmes/8218458/webrev.v2/
>
> Checks for alternate stack now. Added lots of comments and misc fixups.
>
> Zhengyu: please re-test (I can't test any slowdebug except linux-x64).
It passed all three configurations (release, fastdebug and slowdebug)
Thanks,
-Zhengyu
>
> Thanks,
> David
>
> On 5/04/2019 4:01 pm, Chris Plummer wrote:
>> Thinking about this a bit more, there is still the potential for some
>> confusion if this test fails again in the future due to the top frame
>> missing. Is it missing because it got inlined or is it missing because
>> the frame skipping code skipped an extra frame? Hopefully whoever
>> deals with it doesn't just hastily add another valid stacktrace to the
>> test but instead investigates to make sure the issue is indeed that
>> the method got inlined.
>>
>> Chris
>>
>> On 4/4/19 10:56 PM, David Holmes wrote:
>>> Hi Chris,
>>>
>>> Okay I will simply check for the third alternative.
>>>
>>> Thanks,
>>> David
>>>
>>> On 5/04/2019 3:53 pm, Chris Plummer wrote:
>>>> Hi David,
>>>>
>>>> For the callsite that this test is checking for, right now there
>>>> appear to be 3 possible stacktraces: the "normal" one, the one that
>>>> includes AllocateHeap() on solaris and windows slowdebug builds, and
>>>> the one Zhengyu is now seeing on linux-x64. You would need to check
>>>> for all 3, limiting the AllocateHeap() one to just being allowed on
>>>> solaris and windows slowdebug as it is now. So basically this test
>>>> needs to cover all (allowable) stacktraces that we've seen for this
>>>> callsite, and be updated in the future as needed. Not ideal, but I
>>>> don't see a better solution. It's similar to the situation described
>>>> in JDK-8163899 which covered the fragility of the NMT frame skipping
>>>> code. In the end it was decided it would be easier to just deal fix
>>>> issues as they came up rather then engineer a solution that wasn't
>>>> as fragile. I think this test falls in the same category.
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>> On 4/4/19 10:11 PM, David Holmes wrote:
>>>>> Hi Chris,
>>>>>
>>>>> Thanks for the explanation about the frame counting from os::malloc
>>>>> - now I get it. But I don't understand your final comment:
>>>>>
>>>>> > Looking at this code also reminds me of a reason to have the test
>>>>> > continue to check for all 4 specific frames. If the frame
>>>>> skipping code
>>>>> > skips an extra frame, then the callsite will be missing a needed
>>>>> frame
>>>>> > at the top. The way the test was written it would detect this.
>>>>> With your
>>>>> > changes it will not. It would just revert to always matching on 3
>>>>> frames
>>>>> > instead of 4, and the frame skipping bug would go unnoticed.
>>>>>
>>>>> How can I fix this bug if I have to check for 4 specific frames but
>>>>> one (or more) may be missing - i.e how can I tell the different
>>>>> between "Frame A was inlined" and "Frame A was skipped by mistake" ??
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On 5/04/2019 2:17 pm, Chris Plummer wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 4/4/19 6:28 PM, David Holmes wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> On 5/04/2019 1:48 am, Chris Plummer wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 4/4/19 12:14 AM, David Holmes wrote:
>>>>>>>>> On 4/04/2019 4:35 pm, Chris Plummer wrote:
>>>>>>>>>> On 4/3/19 11:23 PM, David Holmes wrote:
>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>
>>>>>>>>>>> On 4/04/2019 4:12 pm, Chris Plummer wrote:
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> I have concerns that this will hide some of the other bugs
>>>>>>>>>>>> I've mentioned: JDK-8133749, JDK-8133747, and JDK-8133740.
>>>>>>>>>>>> These bugs result in 1 or two frames appearing in the
>>>>>>>>>>>> stacktrace that should be skipped. Notably
>>>>>>>>>>>> NativeCallStack::NativeCallStack() and os::get_native_stack().
>>>>>>>>>>>
>>>>>>>>>>> The test still checks those are not present first:
>>>>>>>>>>>
>>>>>>>>>>> 73 // We should never see either of these frames
>>>>>>>>>>> because they are supposed to be skipped. */
>>>>>>>>>>> 74 output.shouldNotContain("NativeCallStack::NativeCallStack");
>>>>>>>>>>> 75 output.shouldNotContain("os::get_native_stack");
>>>>>>>>>> Ah yes. I skimmed over the test looking for it but missed it.
>>>>>>>>>>>
>>>>>>>>>>>> Also, AllocateHeap() should normally not be in the stack
>>>>>>>>>>>> trace, but the test has specifically allowed for it for
>>>>>>>>>>>> windows and solaris slowdebug builds. Although these builds
>>>>>>>>>>>> should have honored the ALWAYSINLINE directive, it was
>>>>>>>>>>>> deemed acceptable that it was not in slowdebug builds.
>>>>>>>>>>>> However, I would not want to allow AllocateHeap() to appear
>>>>>>>>>>>> in a product build, and best not to see it in fastdebug either.
>>>>>>>>>>>
>>>>>>>>>>> This is a test of NMT detail not a test of whether a given
>>>>>>>>>>> compiler chooses to inline something like AllocateHeap. I
>>>>>>>>>>> don't think it is the job of this test to be checking for
>>>>>>>>>>> something specific to the native compiler. The previous
>>>>>>>>>>> handling of AllocateHeap seemed to be there simply because it
>>>>>>>>>>> was the only way to deal with an optional frame - but now
>>>>>>>>>>> that's handled generically.
>>>>>>>>>> It's appearance means you effectively only have 3 frames to
>>>>>>>>>> identity callsites instead of 4.
>>>>>>>>>
>>>>>>>>> Both stacktraces in the old test had 4 elements and expected 4
>>>>>>>>> matches. The current bug is that one of those (new_entry) could
>>>>>>>>> actually be inlined as well, resulting in only 3 matches. So
>>>>>>>>> that is what the revised test checks for: at least 3 matches.
>>>>>>>>> Often there will be 4 matches.
>>>>>>>> I think you misunderstood my "3 frames" comment. I was referring
>>>>>>>> to how many frames NMT uses to identify the callsite. It wants
>>>>>>>> to use 4, but if AllocateHeap() doesn't get inlined, it
>>>>>>>> effectively is using 3. The test should detect when this happens
>>>>>>>> so the NMT implementation can address the issue.
>>>>>>>
>>>>>>> You're right I don't understand this part as I don't know
>>>>>>> how/what NMT detail is doing in this regard.
>>>>>>
>>>>>> An NMT callsite is simply the 4 most recent frames (afters some
>>>>>> pruning) that led to the os:malloc() call. "4" is somewhat
>>>>>> arbitrary as Thomas pointed out, and is controlled by
>>>>>> NMT_TrackingStackDepth. Making NMT_TrackingStackDepth bigger means
>>>>>> more refinement of the callsites (thus more callsites), but a
>>>>>> clearer picture of what actually led to the os:malloc().
>>>>>>
>>>>>> For example, with NMT_TrackingStackDepth == 4, if you have a()
>>>>>> calls b() calls c() calls d() calls os:malloc(), and foo() and
>>>>>> bar() both call a(), the NMT detail output will not distinguish
>>>>>> between these two calls paths to os:mallco(), and will consider
>>>>>> both paths to be the same callsite. The 4 frames in the NMT detail
>>>>>> output would always be a, b, c, and d. However, bump up
>>>>>> NMT_TrackingStackDepth to 5 and now NMT will treat them as two
>>>>>> separate callsites, one with foo() as the bottom frame and one
>>>>>> with bar() as the bottom frame, and both with a, b, c, and d as
>>>>>> the other 4 frames.
>>>>>>
>>>>>> So my point is if AllocateHeap() is not inlined, then every
>>>>>> allocation that is the result of doing a "new" of any CHeapObj
>>>>>> subtype will have AllocateHeap() in its callsite, which
>>>>>> effectively lowers they callsite refinement by 1.
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Hmmm but now I'm wondering why this trace:
>>>>>>>>>
>>>>>>>>> 50 public static String stackTraceAllocateHeap =
>>>>>>>>> 51 ".*AllocateHeap.*\n" +
>>>>>>>>> 52 ".*ModuleEntryTable.*new_entry.*\n" +
>>>>>>>>>
>>>>>>>>> doesn't include ".*Hashtable.*allocate_new_entry.*"? Was it
>>>>>>>>> getting inlined already when AllocateHeap was not? Even so we
>>>>>>>>> still end up with 4 frames matching normally.
>>>>>>>> I noticed that last night also and scratch my head over it for a
>>>>>>>> while and then went to bed. The only explanation I could come up
>>>>>>>> with is that allocate_new_entry() is getting inlined, and as a
>>>>>>>> result (due to being a slowdebug build and doing minimal
>>>>>>>> inlining) AllocateHeap() was not inlined.
>>>>>>>>>
>>>>>>>>>> If it does appear in a product build, a solution should be
>>>>>>>>>> looked into to get rid of it. If the port owner decides it
>>>>>>>>>> can't get rid of it (or is unwilling to), then an exception
>>>>>>>>>> should be added to the test like was done for solaris and
>>>>>>>>>> windows slowdebug builds.
>>>>>>>>>
>>>>>>>>> Are we specifically trying to test the compiler's ability to
>>>>>>>>> inline that function and just happen to be using this test to
>>>>>>>>> verify that? Doesn't seem like a suitable place to do this -
>>>>>>>>> and why do we need to do it? The Visual Studio docs state:
>>>>>>>>>
>>>>>>>>> "You cannot force the compiler to inline a particular function,
>>>>>>>>> even with the __forceinline keyword."
>>>>>>>>>
>>>>>>>>> so ALWAYSINLINE is just a hint even in product builds and could
>>>>>>>>> change with any update to the compiler.
>>>>>>>>>
>>>>>>>>> For Solaris Studio it is again not guaranteed to inline -
>>>>>>>>> specifically -xinline only has an effect at –xO3 or higher.
>>>>>>>>> Which likely explains why it is ignored in slowdebug. And there
>>>>>>>>> are other cases where it won't honour the ALWAYSINLINE.
>>>>>>>>>
>>>>>>>>> Even with gcc we seem to be misusing the attribute if we want
>>>>>>>>> to ensure inlining when not optimising:
>>>>>>>>>
>>>>>>>>> "GCC does not inline any functions when not optimizing unless
>>>>>>>>> you specify the ‘always_inline’ attribute for the function,
>>>>>>>>> like this:
>>>>>>>>>
>>>>>>>>> /* Prototype. */
>>>>>>>>> inline void foo (const char) __attribute__((always_inline));"
>>>>>>>>>
>>>>>>>>> and we don't write it that way.
>>>>>>>>>
>>>>>>>>> So if we're that concerned about release builds guaranteeing to
>>>>>>>>> inline AllocateHeap then I think we need something a bit more
>>>>>>>>> explicit than this test to determine that.
>>>>>>>> With respect to the 3 methods/functions we don't want to see in
>>>>>>>> the callsite stacktrace, NMT has made a number of assumptions on
>>>>>>>> inlining. One of the things the test is doing is making sure
>>>>>>>> those assumptions are correct. If incorrect, then you run into
>>>>>>>> issues like I mentioned above where callsite backtraces
>>>>>>>> effectively only have 3 unique frames rather than 4 (actually
>>>>>>>> before some bug fixes it was often just 2 unique frames). So I
>>>>>>>> think it's appropriate to have a test to make sure we are not
>>>>>>>> seeing any of these 3 methods/functions.
>>>>>>>
>>>>>>> Okay I get the gist of that. Is there somewhere I can clearly see
>>>>>>> what this inlining assumptions are that NMT makes? Are they
>>>>>>> clearly documented?
>>>>>>
>>>>>> Not that I know of. I discovered them while looking at the various
>>>>>> bugs that led to NativeCallStack::NativeCallStack() and
>>>>>> os::get_native_stack() (and sometimes both) being in the callsite.
>>>>>> Reviewing the bugs I referred to will give you an idea of where to
>>>>>> look. One good place to look at
>>>>>> NativeCallStack::NativeCallStack(). Lots of special case code
>>>>>> there that controls how many frames to skip based on on the
>>>>>> platform and whether optimized or not. Also some comments there to
>>>>>> help you out. I did a lot of bug fixing in this method.
>>>>>>
>>>>>> Looking at this code also reminds me of a reason to have the test
>>>>>> continue to check for all 4 specific frames. If the frame skipping
>>>>>> code skips an extra frame, then the callsite will be missing a
>>>>>> needed frame at the top. The way the test was written it would
>>>>>> detect this. With your changes it will not. It would just revert
>>>>>> to always matching on 3 frames instead of 4, and the frame
>>>>>> skipping bug would go unnoticed.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>>
>>>>>>>> Now the test also has made inlining assumptions beyond what NMT
>>>>>>>> has made, and that is really what this bug is about. In general
>>>>>>>> I think your fix is fine in the way it relaxes which frames are
>>>>>>>> actually found, but as Thomas points out, it suffers from not
>>>>>>>> actually looking at a single stacktrace, but just looking for
>>>>>>>> the specified frames somewhere in the output (and in the order
>>>>>>>> specified.) You should probably address this.
>>>>>>>
>>>>>>> Right that was an error on my part. I thought the existing
>>>>>>> MULTILINE pattern matching with .* would also find non-sequential
>>>>>>> lines and so I was acting similarly. I will re-think this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> Given the changes you made to allow more flexibly in which
>>>>>>>>>>>> frames appear, I think you need to now also make sure the
>>>>>>>>>>>> above 3 mentioned frames are not present, except for
>>>>>>>>>>>> allowing AllocateHeap() in slowdebug builds.
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/3/19 10:53 PM, David Holmes wrote:
>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218458
>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8218458/webrev/
>>>>>>>>>>>>>
>>>>>>>>>>>>> The actual stack trace reported by NMT detail is affected
>>>>>>>>>>>>> by the inlining decisions of the native compiler, and on
>>>>>>>>>>>>> the type of build. So we define an "ideal" stacktrace and
>>>>>>>>>>>>> then allow for some frames to be missing based on empirical
>>>>>>>>>>>>> observations. So to date we have seen two frames that may
>>>>>>>>>>>>> or may not be inlined and so we allow for 2 non-matching
>>>>>>>>>>>>> entries.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The special-casing of AllocateHeap is removed as now it is
>>>>>>>>>>>>> just an optional frame.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris: does this maintain the "spirit" of the test as you
>>>>>>>>>>>>> intended?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Zhengyu: can you test this on your system(s) please.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
More information about the hotspot-runtime-dev
mailing list