RFR (S): 8218458: [TESTBUG] runtime/NMT/CheckForProperDetailStackTrace.java fails with Expected stack trace missing from output

David Holmes david.holmes at oracle.com
Fri Apr 5 05:11:19 UTC 2019


Hi Chris,

Thanks for the explanation about the frame counting from os::malloc - 
now I get it. But I don't understand your final comment:

 > Looking at this code also reminds me of a reason to have the test
 > continue to check for all 4 specific frames. If the frame skipping code
 > skips an extra frame, then the callsite will be missing a needed frame
 > at the top. The way the test was written it would detect this. With your
 > changes it will not. It would just revert to always matching on 3 frames
 > instead of 4, and the frame skipping bug would go unnoticed.

How can I fix this bug if I have to check for 4 specific frames but one 
(or more) may be missing - i.e how can I tell the different between 
"Frame A was inlined" and "Frame A was skipped by mistake" ??

Thanks,
David


On 5/04/2019 2:17 pm, Chris Plummer wrote:
> Hi David,
> 
> On 4/4/19 6:28 PM, David Holmes wrote:
>> Hi Chris,
>>
>> On 5/04/2019 1:48 am, Chris Plummer wrote:
>>> Hi David,
>>>
>>> On 4/4/19 12:14 AM, David Holmes wrote:
>>>> On 4/04/2019 4:35 pm, Chris Plummer wrote:
>>>>> On 4/3/19 11:23 PM, David Holmes wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> On 4/04/2019 4:12 pm, Chris Plummer wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> I have concerns that this will hide some of the other bugs I've 
>>>>>>> mentioned: JDK-8133749, JDK-8133747, and JDK-8133740. These bugs 
>>>>>>> result in 1 or two frames appearing in the stacktrace that should 
>>>>>>> be skipped. Notably NativeCallStack::NativeCallStack() and 
>>>>>>> os::get_native_stack().
>>>>>>
>>>>>> The test still checks those are not present first:
>>>>>>
>>>>>> 73         // We should never see either of these frames because 
>>>>>> they are supposed to be skipped. */
>>>>>> 74 output.shouldNotContain("NativeCallStack::NativeCallStack");
>>>>>> 75 output.shouldNotContain("os::get_native_stack");
>>>>> Ah yes. I skimmed over the test looking for it but missed it.
>>>>>>
>>>>>>> Also, AllocateHeap() should normally not be in the stack trace, 
>>>>>>> but the test has specifically allowed for it for windows and 
>>>>>>> solaris slowdebug builds. Although these builds should have 
>>>>>>> honored the ALWAYSINLINE directive, it was deemed acceptable that 
>>>>>>> it was not in slowdebug builds. However, I would not want to 
>>>>>>> allow AllocateHeap() to appear in a product build, and best not 
>>>>>>> to see it in fastdebug either.
>>>>>>
>>>>>> This is a test of NMT detail not a test of whether a given 
>>>>>> compiler chooses to inline something like AllocateHeap. I don't 
>>>>>> think it is the job of this test to be checking for something 
>>>>>> specific to the native compiler. The previous handling of 
>>>>>> AllocateHeap seemed to be there simply because it was the only way 
>>>>>> to deal with an optional frame - but now that's handled generically.
>>>>> It's appearance means you effectively only have 3 frames to 
>>>>> identity callsites instead of 4. 
>>>>
>>>> Both stacktraces in the old test had 4 elements and expected 4 
>>>> matches. The current bug is that one of those (new_entry) could 
>>>> actually be inlined as well, resulting in only 3 matches. So that is 
>>>> what the revised test checks for: at least 3 matches. Often there 
>>>> will be 4 matches.
>>> I think you misunderstood my "3 frames" comment. I was referring to 
>>> how many frames NMT uses to identify the callsite. It wants to use 4, 
>>> but if AllocateHeap() doesn't get inlined, it effectively is using 3. 
>>> The test should detect when this happens so the NMT implementation 
>>> can address the issue.
>>
>> You're right I don't understand this part as I don't know how/what NMT 
>> detail is doing in this regard.
> 
> An NMT callsite is simply the 4 most recent frames (afters some pruning) 
> that led to the os:malloc() call. "4" is somewhat arbitrary as Thomas 
> pointed out, and is controlled by NMT_TrackingStackDepth. Making 
> NMT_TrackingStackDepth bigger means more refinement of the callsites 
> (thus more callsites), but a clearer picture of what actually led to the 
> os:malloc().
> 
> For example, with NMT_TrackingStackDepth == 4, if you have a() calls b() 
> calls c() calls d() calls os:malloc(), and foo() and bar() both call 
> a(), the NMT detail output will not distinguish between these two calls 
> paths to os:mallco(), and will consider both paths to be the same 
> callsite. The 4 frames in the NMT detail output would always be a, b, c, 
> and d. However, bump up NMT_TrackingStackDepth to 5 and now NMT will 
> treat them as two separate callsites, one with foo() as the bottom frame 
> and one with bar() as the bottom frame, and both with a, b, c, and d as 
> the other 4 frames.
> 
> So my point is if AllocateHeap() is not inlined, then every allocation 
> that is the result of doing a "new" of any CHeapObj subtype will have 
> AllocateHeap() in its callsite, which effectively lowers they callsite 
> refinement by 1.
> 
>>
>>>>
>>>> Hmmm but now I'm wondering why this trace:
>>>>
>>>>   50     public static String stackTraceAllocateHeap =
>>>>   51         ".*AllocateHeap.*\n" +
>>>>   52         ".*ModuleEntryTable.*new_entry.*\n" +
>>>>
>>>> doesn't include ".*Hashtable.*allocate_new_entry.*"? Was it getting 
>>>> inlined already when AllocateHeap was not? Even so we still end up 
>>>> with 4 frames matching normally.
>>> I noticed that last night also and scratch my head over it for a 
>>> while and then went to bed. The only explanation I could come up with 
>>> is that allocate_new_entry() is getting inlined, and as a result (due 
>>> to being a slowdebug build and doing minimal inlining) AllocateHeap() 
>>> was not inlined.
>>>>
>>>>> If it does appear in a product build, a solution should be looked 
>>>>> into to get rid of it. If the port owner decides it can't get rid 
>>>>> of it (or is unwilling to), then an exception should be added to 
>>>>> the test like was done for solaris and windows slowdebug builds.
>>>>
>>>> Are we specifically trying to test the compiler's ability to inline 
>>>> that function and just happen to be using this test to verify that? 
>>>> Doesn't seem like a suitable place to do this - and why do we need 
>>>> to do it? The Visual Studio docs state:
>>>>
>>>> "You cannot force the compiler to inline a particular function, even 
>>>> with the __forceinline keyword."
>>>>
>>>> so ALWAYSINLINE is just a hint even in product builds and could 
>>>> change with any update to the compiler.
>>>>
>>>> For Solaris Studio it is again not guaranteed to inline - 
>>>> specifically -xinline only has an effect at –xO3 or higher. Which 
>>>> likely explains why it is ignored in slowdebug. And there are other 
>>>> cases where it won't honour the ALWAYSINLINE.
>>>>
>>>> Even with gcc we seem to be misusing the attribute if we want to 
>>>> ensure inlining when not optimising:
>>>>
>>>> "GCC does not inline any functions when not optimizing unless you 
>>>> specify the ‘always_inline’ attribute for the function, like this:
>>>>
>>>> /* Prototype.  */
>>>> inline void foo (const char) __attribute__((always_inline));"
>>>>
>>>> and we don't write it that way.
>>>>
>>>> So if we're that concerned about release builds guaranteeing to 
>>>> inline AllocateHeap then I think we need something a bit more 
>>>> explicit than this test to determine that.
>>> With respect to the 3 methods/functions we don't want to see in the 
>>> callsite stacktrace, NMT has made a number of assumptions on 
>>> inlining. One of the things the test is doing is making sure those 
>>> assumptions are correct. If incorrect, then you run into issues like 
>>> I mentioned above where callsite backtraces effectively only have 3 
>>> unique frames rather than 4 (actually before some bug fixes it was 
>>> often just 2 unique frames). So I think it's appropriate to have a 
>>> test to make sure we are not seeing any of these 3 methods/functions.
>>
>> Okay I get the gist of that. Is there somewhere I can clearly see what 
>> this inlining assumptions are that NMT makes? Are they clearly 
>> documented?
> 
> Not that I know of. I discovered them while looking at the various bugs 
> that led to NativeCallStack::NativeCallStack() and 
> os::get_native_stack() (and sometimes both) being in the callsite. 
> Reviewing the bugs I referred to will give you an idea of where to look. 
> One good place to look at NativeCallStack::NativeCallStack(). Lots of 
> special case code there that controls how many frames to skip based on 
> on the platform and whether optimized or not. Also some comments there 
> to help you out. I did a lot of bug fixing in this method.
> 
> Looking at this code also reminds me of a reason to have the test 
> continue to check for all 4 specific frames. If the frame skipping code 
> skips an extra frame, then the callsite will be missing a needed frame 
> at the top. The way the test was written it would detect this. With your 
> changes it will not. It would just revert to always matching on 3 frames 
> instead of 4, and the frame skipping bug would go unnoticed.
> 
> thanks,
> 
> Chris
> 
>>
>>> Now the test also has made inlining assumptions beyond what NMT has 
>>> made, and that is really what this bug is about. In general I think 
>>> your fix is fine in the way it relaxes which frames are actually 
>>> found, but as Thomas points out, it suffers from not actually looking 
>>> at a single stacktrace, but just looking for the specified frames 
>>> somewhere in the output (and in the order specified.) You should 
>>> probably address this.
>>
>> Right that was an error on my part. I thought the existing MULTILINE 
>> pattern matching with .* would also find non-sequential lines and so I 
>> was acting similarly. I will re-think this.
>>
>> Thanks,
>> David
>>
>>> thanks,
>>>
>>> Chris
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Given the changes you made to allow more flexibly in which frames 
>>>>>>> appear, I think you need to now also make sure the above 3 
>>>>>>> mentioned frames are not present, except for allowing 
>>>>>>> AllocateHeap() in slowdebug builds.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On 4/3/19 10:53 PM, David Holmes wrote:
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218458
>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8218458/webrev/
>>>>>>>>
>>>>>>>> The actual stack trace reported by NMT detail is affected by the 
>>>>>>>> inlining decisions of the native compiler, and on the type of 
>>>>>>>> build. So we define an "ideal" stacktrace and then allow for 
>>>>>>>> some frames to be missing based on empirical observations. So to 
>>>>>>>> date we have seen two frames that may or may not be inlined and 
>>>>>>>> so we allow for 2 non-matching entries.
>>>>>>>>
>>>>>>>> The special-casing of AllocateHeap is removed as now it is just 
>>>>>>>> an optional frame.
>>>>>>>>
>>>>>>>> Chris: does this maintain the "spirit" of the test as you intended?
>>>>>>>>
>>>>>>>> Zhengyu: can you test this on your system(s) please.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
> 
> 


More information about the hotspot-runtime-dev mailing list