RFR (S): 8218458: [TESTBUG] runtime/NMT/CheckForProperDetailStackTrace.java fails with Expected stack trace missing from output

Fri Apr 5 04:17:29 UTC 2019

Hi David,

On 4/4/19 6:28 PM, David Holmes wrote:
> Hi Chris,
>
> On 5/04/2019 1:48 am, Chris Plummer wrote:
>> Hi David,
>>
>> On 4/4/19 12:14 AM, David Holmes wrote:
>>> On 4/04/2019 4:35 pm, Chris Plummer wrote:
>>>> On 4/3/19 11:23 PM, David Holmes wrote:
>>>>> Hi Chris,
>>>>>
>>>>> On 4/04/2019 4:12 pm, Chris Plummer wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> I have concerns that this will hide some of the other bugs I've 
>>>>>> mentioned: JDK-8133749, JDK-8133747, and JDK-8133740. These bugs 
>>>>>> result in 1 or two frames appearing in the stacktrace that should 
>>>>>> be skipped. Notably NativeCallStack::NativeCallStack() and 
>>>>>> os::get_native_stack().
>>>>>
>>>>> The test still checks those are not present first:
>>>>>
>>>>> 73         // We should never see either of these frames because 
>>>>> they are supposed to be skipped. */
>>>>> 74 output.shouldNotContain("NativeCallStack::NativeCallStack");
>>>>> 75 output.shouldNotContain("os::get_native_stack");
>>>> Ah yes. I skimmed over the test looking for it but missed it.
>>>>>
>>>>>> Also, AllocateHeap() should normally not be in the stack trace, 
>>>>>> but the test has specifically allowed for it for windows and 
>>>>>> solaris slowdebug builds. Although these builds should have 
>>>>>> honored the ALWAYSINLINE directive, it was deemed acceptable that 
>>>>>> it was not in slowdebug builds. However, I would not want to 
>>>>>> allow AllocateHeap() to appear in a product build, and best not 
>>>>>> to see it in fastdebug either.
>>>>>
>>>>> This is a test of NMT detail not a test of whether a given 
>>>>> compiler chooses to inline something like AllocateHeap. I don't 
>>>>> think it is the job of this test to be checking for something 
>>>>> specific to the native compiler. The previous handling of 
>>>>> AllocateHeap seemed to be there simply because it was the only way 
>>>>> to deal with an optional frame - but now that's handled generically.
>>>> It's appearance means you effectively only have 3 frames to 
>>>> identity callsites instead of 4. 
>>>
>>> Both stacktraces in the old test had 4 elements and expected 4 
>>> matches. The current bug is that one of those (new_entry) could 
>>> actually be inlined as well, resulting in only 3 matches. So that is 
>>> what the revised test checks for: at least 3 matches. Often there 
>>> will be 4 matches.
>> I think you misunderstood my "3 frames" comment. I was referring to 
>> how many frames NMT uses to identify the callsite. It wants to use 4, 
>> but if AllocateHeap() doesn't get inlined, it effectively is using 3. 
>> The test should detect when this happens so the NMT implementation 
>> can address the issue.
>
> You're right I don't understand this part as I don't know how/what NMT 
> detail is doing in this regard.

An NMT callsite is simply the 4 most recent frames (afters some pruning) 
that led to the os:malloc() call. "4" is somewhat arbitrary as Thomas 
pointed out, and is controlled by NMT_TrackingStackDepth. Making 
NMT_TrackingStackDepth bigger means more refinement of the callsites 
(thus more callsites), but a clearer picture of what actually led to the 
os:malloc().

For example, with NMT_TrackingStackDepth == 4, if you have a() calls b() 
calls c() calls d() calls os:malloc(), and foo() and bar() both call 
a(), the NMT detail output will not distinguish between these two calls 
paths to os:mallco(), and will consider both paths to be the same 
callsite. The 4 frames in the NMT detail output would always be a, b, c, 
and d. However, bump up NMT_TrackingStackDepth to 5 and now NMT will 
treat them as two separate callsites, one with foo() as the bottom frame 
and one with bar() as the bottom frame, and both with a, b, c, and d as 
the other 4 frames.

So my point is if AllocateHeap() is not inlined, then every allocation 
that is the result of doing a "new" of any CHeapObj subtype will have 
AllocateHeap() in its callsite, which effectively lowers they callsite 
refinement by 1.

>
>>>
>>> Hmmm but now I'm wondering why this trace:
>>>
>>>   50     public static String stackTraceAllocateHeap =
>>>   51         ".*AllocateHeap.*\n" +
>>>   52         ".*ModuleEntryTable.*new_entry.*\n" +
>>>
>>> doesn't include ".*Hashtable.*allocate_new_entry.*"? Was it getting 
>>> inlined already when AllocateHeap was not? Even so we still end up 
>>> with 4 frames matching normally.
>> I noticed that last night also and scratch my head over it for a 
>> while and then went to bed. The only explanation I could come up with 
>> is that allocate_new_entry() is getting inlined, and as a result (due 
>> to being a slowdebug build and doing minimal inlining) AllocateHeap() 
>> was not inlined.
>>>
>>>> If it does appear in a product build, a solution should be looked 
>>>> into to get rid of it. If the port owner decides it can't get rid 
>>>> of it (or is unwilling to), then an exception should be added to 
>>>> the test like was done for solaris and windows slowdebug builds.
>>>
>>> Are we specifically trying to test the compiler's ability to inline 
>>> that function and just happen to be using this test to verify that? 
>>> Doesn't seem like a suitable place to do this - and why do we need 
>>> to do it? The Visual Studio docs state:
>>>
>>> "You cannot force the compiler to inline a particular function, even 
>>> with the __forceinline keyword."
>>>
>>> so ALWAYSINLINE is just a hint even in product builds and could 
>>> change with any update to the compiler.
>>>
>>> For Solaris Studio it is again not guaranteed to inline - 
>>> specifically -xinline only has an effect at –xO3 or higher. Which 
>>> likely explains why it is ignored in slowdebug. And there are other 
>>> cases where it won't honour the ALWAYSINLINE.
>>>
>>> Even with gcc we seem to be misusing the attribute if we want to 
>>> ensure inlining when not optimising:
>>>
>>> "GCC does not inline any functions when not optimizing unless you 
>>> specify the ‘always_inline’ attribute for the function, like this:
>>>
>>> /* Prototype.  */
>>> inline void foo (const char) __attribute__((always_inline));"
>>>
>>> and we don't write it that way.
>>>
>>> So if we're that concerned about release builds guaranteeing to 
>>> inline AllocateHeap then I think we need something a bit more 
>>> explicit than this test to determine that.
>> With respect to the 3 methods/functions we don't want to see in the 
>> callsite stacktrace, NMT has made a number of assumptions on 
>> inlining. One of the things the test is doing is making sure those 
>> assumptions are correct. If incorrect, then you run into issues like 
>> I mentioned above where callsite backtraces effectively only have 3 
>> unique frames rather than 4 (actually before some bug fixes it was 
>> often just 2 unique frames). So I think it's appropriate to have a 
>> test to make sure we are not seeing any of these 3 methods/functions.
>
> Okay I get the gist of that. Is there somewhere I can clearly see what 
> this inlining assumptions are that NMT makes? Are they clearly 
> documented?

Not that I know of. I discovered them while looking at the various bugs 
that led to NativeCallStack::NativeCallStack() and 
os::get_native_stack() (and sometimes both) being in the callsite. 
Reviewing the bugs I referred to will give you an idea of where to look. 
One good place to look at NativeCallStack::NativeCallStack(). Lots of 
special case code there that controls how many frames to skip based on 
on the platform and whether optimized or not. Also some comments there 
to help you out. I did a lot of bug fixing in this method.

Looking at this code also reminds me of a reason to have the test 
continue to check for all 4 specific frames. If the frame skipping code 
skips an extra frame, then the callsite will be missing a needed frame 
at the top. The way the test was written it would detect this. With your 
changes it will not. It would just revert to always matching on 3 frames 
instead of 4, and the frame skipping bug would go unnoticed.

thanks,

Chris

>
>> Now the test also has made inlining assumptions beyond what NMT has 
>> made, and that is really what this bug is about. In general I think 
>> your fix is fine in the way it relaxes which frames are actually 
>> found, but as Thomas points out, it suffers from not actually looking 
>> at a single stacktrace, but just looking for the specified frames 
>> somewhere in the output (and in the order specified.) You should 
>> probably address this.
>
> Right that was an error on my part. I thought the existing MULTILINE 
> pattern matching with .* would also find non-sequential lines and so I 
> was acting similarly. I will re-think this.
>
> Thanks,
> David
>
>> thanks,
>>
>> Chris
>>>
>>> Thanks,
>>> David
>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Given the changes you made to allow more flexibly in which frames 
>>>>>> appear, I think you need to now also make sure the above 3 
>>>>>> mentioned frames are not present, except for allowing 
>>>>>> AllocateHeap() in slowdebug builds.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 4/3/19 10:53 PM, David Holmes wrote:
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218458
>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8218458/webrev/
>>>>>>>
>>>>>>> The actual stack trace reported by NMT detail is affected by the 
>>>>>>> inlining decisions of the native compiler, and on the type of 
>>>>>>> build. So we define an "ideal" stacktrace and then allow for 
>>>>>>> some frames to be missing based on empirical observations. So to 
>>>>>>> date we have seen two frames that may or may not be inlined and 
>>>>>>> so we allow for 2 non-matching entries.
>>>>>>>
>>>>>>> The special-casing of AllocateHeap is removed as now it is just 
>>>>>>> an optional frame.
>>>>>>>
>>>>>>> Chris: does this maintain the "spirit" of the test as you intended?
>>>>>>>
>>>>>>> Zhengyu: can you test this on your system(s) please.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>