RFR (S): 8218458: [TESTBUG] runtime/NMT/CheckForProperDetailStackTrace.java fails with Expected stack trace missing from output

David Holmes david.holmes at oracle.com
Fri Apr 5 05:56:50 UTC 2019


Hi Chris,

Okay I will simply check for the third alternative.

Thanks,
David

On 5/04/2019 3:53 pm, Chris Plummer wrote:
> Hi David,
> 
> For the callsite that this test is checking for, right now there appear 
> to be 3 possible stacktraces: the "normal" one, the one that includes 
> AllocateHeap() on solaris and windows slowdebug builds, and the one 
> Zhengyu is now seeing on linux-x64. You would need to check for all 3, 
> limiting the AllocateHeap() one to just being allowed on solaris and 
> windows slowdebug as it is now. So basically this test needs to cover 
> all (allowable) stacktraces that we've seen for this callsite, and be 
> updated in the future as needed. Not ideal, but I don't see a better 
> solution. It's similar to the situation described in JDK-8163899 which 
> covered the fragility of the NMT frame skipping code. In the end it was 
> decided it would be easier to just deal fix issues as they came up 
> rather then engineer a solution that wasn't as fragile. I think this 
> test falls in the same category.
> 
> thanks,
> 
> Chris
> 
> On 4/4/19 10:11 PM, David Holmes wrote:
>> Hi Chris,
>>
>> Thanks for the explanation about the frame counting from os::malloc - 
>> now I get it. But I don't understand your final comment:
>>
>> > Looking at this code also reminds me of a reason to have the test
>> > continue to check for all 4 specific frames. If the frame skipping code
>> > skips an extra frame, then the callsite will be missing a needed frame
>> > at the top. The way the test was written it would detect this. With 
>> your
>> > changes it will not. It would just revert to always matching on 3 
>> frames
>> > instead of 4, and the frame skipping bug would go unnoticed.
>>
>> How can I fix this bug if I have to check for 4 specific frames but 
>> one (or more) may be missing - i.e how can I tell the different 
>> between "Frame A was inlined" and "Frame A was skipped by mistake" ??
>>
>> Thanks,
>> David
>>
>>
>> On 5/04/2019 2:17 pm, Chris Plummer wrote:
>>> Hi David,
>>>
>>> On 4/4/19 6:28 PM, David Holmes wrote:
>>>> Hi Chris,
>>>>
>>>> On 5/04/2019 1:48 am, Chris Plummer wrote:
>>>>> Hi David,
>>>>>
>>>>> On 4/4/19 12:14 AM, David Holmes wrote:
>>>>>> On 4/04/2019 4:35 pm, Chris Plummer wrote:
>>>>>>> On 4/3/19 11:23 PM, David Holmes wrote:
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> On 4/04/2019 4:12 pm, Chris Plummer wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> I have concerns that this will hide some of the other bugs I've 
>>>>>>>>> mentioned: JDK-8133749, JDK-8133747, and JDK-8133740. These 
>>>>>>>>> bugs result in 1 or two frames appearing in the stacktrace that 
>>>>>>>>> should be skipped. Notably NativeCallStack::NativeCallStack() 
>>>>>>>>> and os::get_native_stack().
>>>>>>>>
>>>>>>>> The test still checks those are not present first:
>>>>>>>>
>>>>>>>> 73         // We should never see either of these frames because 
>>>>>>>> they are supposed to be skipped. */
>>>>>>>> 74 output.shouldNotContain("NativeCallStack::NativeCallStack");
>>>>>>>> 75 output.shouldNotContain("os::get_native_stack");
>>>>>>> Ah yes. I skimmed over the test looking for it but missed it.
>>>>>>>>
>>>>>>>>> Also, AllocateHeap() should normally not be in the stack trace, 
>>>>>>>>> but the test has specifically allowed for it for windows and 
>>>>>>>>> solaris slowdebug builds. Although these builds should have 
>>>>>>>>> honored the ALWAYSINLINE directive, it was deemed acceptable 
>>>>>>>>> that it was not in slowdebug builds. However, I would not want 
>>>>>>>>> to allow AllocateHeap() to appear in a product build, and best 
>>>>>>>>> not to see it in fastdebug either.
>>>>>>>>
>>>>>>>> This is a test of NMT detail not a test of whether a given 
>>>>>>>> compiler chooses to inline something like AllocateHeap. I don't 
>>>>>>>> think it is the job of this test to be checking for something 
>>>>>>>> specific to the native compiler. The previous handling of 
>>>>>>>> AllocateHeap seemed to be there simply because it was the only 
>>>>>>>> way to deal with an optional frame - but now that's handled 
>>>>>>>> generically.
>>>>>>> It's appearance means you effectively only have 3 frames to 
>>>>>>> identity callsites instead of 4. 
>>>>>>
>>>>>> Both stacktraces in the old test had 4 elements and expected 4 
>>>>>> matches. The current bug is that one of those (new_entry) could 
>>>>>> actually be inlined as well, resulting in only 3 matches. So that 
>>>>>> is what the revised test checks for: at least 3 matches. Often 
>>>>>> there will be 4 matches.
>>>>> I think you misunderstood my "3 frames" comment. I was referring to 
>>>>> how many frames NMT uses to identify the callsite. It wants to use 
>>>>> 4, but if AllocateHeap() doesn't get inlined, it effectively is 
>>>>> using 3. The test should detect when this happens so the NMT 
>>>>> implementation can address the issue.
>>>>
>>>> You're right I don't understand this part as I don't know how/what 
>>>> NMT detail is doing in this regard.
>>>
>>> An NMT callsite is simply the 4 most recent frames (afters some 
>>> pruning) that led to the os:malloc() call. "4" is somewhat arbitrary 
>>> as Thomas pointed out, and is controlled by NMT_TrackingStackDepth. 
>>> Making NMT_TrackingStackDepth bigger means more refinement of the 
>>> callsites (thus more callsites), but a clearer picture of what 
>>> actually led to the os:malloc().
>>>
>>> For example, with NMT_TrackingStackDepth == 4, if you have a() calls 
>>> b() calls c() calls d() calls os:malloc(), and foo() and bar() both 
>>> call a(), the NMT detail output will not distinguish between these 
>>> two calls paths to os:mallco(), and will consider both paths to be 
>>> the same callsite. The 4 frames in the NMT detail output would always 
>>> be a, b, c, and d. However, bump up NMT_TrackingStackDepth to 5 and 
>>> now NMT will treat them as two separate callsites, one with foo() as 
>>> the bottom frame and one with bar() as the bottom frame, and both 
>>> with a, b, c, and d as the other 4 frames.
>>>
>>> So my point is if AllocateHeap() is not inlined, then every 
>>> allocation that is the result of doing a "new" of any CHeapObj 
>>> subtype will have AllocateHeap() in its callsite, which effectively 
>>> lowers they callsite refinement by 1.
>>>
>>>>
>>>>>>
>>>>>> Hmmm but now I'm wondering why this trace:
>>>>>>
>>>>>>   50     public static String stackTraceAllocateHeap =
>>>>>>   51         ".*AllocateHeap.*\n" +
>>>>>>   52         ".*ModuleEntryTable.*new_entry.*\n" +
>>>>>>
>>>>>> doesn't include ".*Hashtable.*allocate_new_entry.*"? Was it 
>>>>>> getting inlined already when AllocateHeap was not? Even so we 
>>>>>> still end up with 4 frames matching normally.
>>>>> I noticed that last night also and scratch my head over it for a 
>>>>> while and then went to bed. The only explanation I could come up 
>>>>> with is that allocate_new_entry() is getting inlined, and as a 
>>>>> result (due to being a slowdebug build and doing minimal inlining) 
>>>>> AllocateHeap() was not inlined.
>>>>>>
>>>>>>> If it does appear in a product build, a solution should be looked 
>>>>>>> into to get rid of it. If the port owner decides it can't get rid 
>>>>>>> of it (or is unwilling to), then an exception should be added to 
>>>>>>> the test like was done for solaris and windows slowdebug builds.
>>>>>>
>>>>>> Are we specifically trying to test the compiler's ability to 
>>>>>> inline that function and just happen to be using this test to 
>>>>>> verify that? Doesn't seem like a suitable place to do this - and 
>>>>>> why do we need to do it? The Visual Studio docs state:
>>>>>>
>>>>>> "You cannot force the compiler to inline a particular function, 
>>>>>> even with the __forceinline keyword."
>>>>>>
>>>>>> so ALWAYSINLINE is just a hint even in product builds and could 
>>>>>> change with any update to the compiler.
>>>>>>
>>>>>> For Solaris Studio it is again not guaranteed to inline - 
>>>>>> specifically -xinline only has an effect at –xO3 or higher. Which 
>>>>>> likely explains why it is ignored in slowdebug. And there are 
>>>>>> other cases where it won't honour the ALWAYSINLINE.
>>>>>>
>>>>>> Even with gcc we seem to be misusing the attribute if we want to 
>>>>>> ensure inlining when not optimising:
>>>>>>
>>>>>> "GCC does not inline any functions when not optimizing unless you 
>>>>>> specify the ‘always_inline’ attribute for the function, like this:
>>>>>>
>>>>>> /* Prototype.  */
>>>>>> inline void foo (const char) __attribute__((always_inline));"
>>>>>>
>>>>>> and we don't write it that way.
>>>>>>
>>>>>> So if we're that concerned about release builds guaranteeing to 
>>>>>> inline AllocateHeap then I think we need something a bit more 
>>>>>> explicit than this test to determine that.
>>>>> With respect to the 3 methods/functions we don't want to see in the 
>>>>> callsite stacktrace, NMT has made a number of assumptions on 
>>>>> inlining. One of the things the test is doing is making sure those 
>>>>> assumptions are correct. If incorrect, then you run into issues 
>>>>> like I mentioned above where callsite backtraces effectively only 
>>>>> have 3 unique frames rather than 4 (actually before some bug fixes 
>>>>> it was often just 2 unique frames). So I think it's appropriate to 
>>>>> have a test to make sure we are not seeing any of these 3 
>>>>> methods/functions.
>>>>
>>>> Okay I get the gist of that. Is there somewhere I can clearly see 
>>>> what this inlining assumptions are that NMT makes? Are they clearly 
>>>> documented?
>>>
>>> Not that I know of. I discovered them while looking at the various 
>>> bugs that led to NativeCallStack::NativeCallStack() and 
>>> os::get_native_stack() (and sometimes both) being in the callsite. 
>>> Reviewing the bugs I referred to will give you an idea of where to 
>>> look. One good place to look at NativeCallStack::NativeCallStack(). 
>>> Lots of special case code there that controls how many frames to skip 
>>> based on on the platform and whether optimized or not. Also some 
>>> comments there to help you out. I did a lot of bug fixing in this 
>>> method.
>>>
>>> Looking at this code also reminds me of a reason to have the test 
>>> continue to check for all 4 specific frames. If the frame skipping 
>>> code skips an extra frame, then the callsite will be missing a needed 
>>> frame at the top. The way the test was written it would detect this. 
>>> With your changes it will not. It would just revert to always 
>>> matching on 3 frames instead of 4, and the frame skipping bug would 
>>> go unnoticed.
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>>>
>>>>> Now the test also has made inlining assumptions beyond what NMT has 
>>>>> made, and that is really what this bug is about. In general I think 
>>>>> your fix is fine in the way it relaxes which frames are actually 
>>>>> found, but as Thomas points out, it suffers from not actually 
>>>>> looking at a single stacktrace, but just looking for the specified 
>>>>> frames somewhere in the output (and in the order specified.) You 
>>>>> should probably address this.
>>>>
>>>> Right that was an error on my part. I thought the existing MULTILINE 
>>>> pattern matching with .* would also find non-sequential lines and so 
>>>> I was acting similarly. I will re-think this.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Given the changes you made to allow more flexibly in which 
>>>>>>>>> frames appear, I think you need to now also make sure the above 
>>>>>>>>> 3 mentioned frames are not present, except for allowing 
>>>>>>>>> AllocateHeap() in slowdebug builds.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On 4/3/19 10:53 PM, David Holmes wrote:
>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218458
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8218458/webrev/
>>>>>>>>>>
>>>>>>>>>> The actual stack trace reported by NMT detail is affected by 
>>>>>>>>>> the inlining decisions of the native compiler, and on the type 
>>>>>>>>>> of build. So we define an "ideal" stacktrace and then allow 
>>>>>>>>>> for some frames to be missing based on empirical observations. 
>>>>>>>>>> So to date we have seen two frames that may or may not be 
>>>>>>>>>> inlined and so we allow for 2 non-matching entries.
>>>>>>>>>>
>>>>>>>>>> The special-casing of AllocateHeap is removed as now it is 
>>>>>>>>>> just an optional frame.
>>>>>>>>>>
>>>>>>>>>> Chris: does this maintain the "spirit" of the test as you 
>>>>>>>>>> intended?
>>>>>>>>>>
>>>>>>>>>> Zhengyu: can you test this on your system(s) please.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
> 
> 


More information about the hotspot-runtime-dev mailing list