RFR (S): 8218458: [TESTBUG] runtime/NMT/CheckForProperDetailStackTrace.java fails with Expected stack trace missing from output

Fri Apr 5 17:09:47 UTC 2019

Hi David,

Why was the JVM_DefineModule frame left off of stackTraceAlternate?

Since you've added the following:

  103         if (!okToHaveAllocateHeap) {
  104             output.shouldNotContain("AllocateHeap");
  105         }

You can simplify the following:

  123         if (okToHaveAllocateHeap) {
  124             expectedStackTrace = stackTraceAllocateHeap;
  125             if (stackTraceMatches(expectedStackTrace, output)) {
  126                 return;
  127             }
  128         } else {

The is no need for the okToHaveAllocateHeap check here anymore. Just 
check all 3 allowed stacktraces until one passes. This is a slight 
improvement in flexibility in that it would no longer require the 
slowdebug builds to match stackTraceAllocateHeap. They could match any 
of the 3. You could then put all 3 allowed stacktraces in an array and 
check them in a loop if you wish.

The following is no longer correct:

  140         throw new RuntimeException("Expected stack trace missing 
from output: " + expectedStackTrace);

In your current approach, expectedStackTrace is just the last stacktrace 
we tried. Since we may try more than one, maybe all the ones that failed 
to match should be listed (or none listed if just too messy).

thanks,

Chris

On 4/5/19 12:04 AM, David Holmes wrote:
> Hi Chris,
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~dholmes/8218458/webrev.v2/
>
> Checks for alternate stack now. Added lots of comments and misc fixups.
>
> Zhengyu: please re-test (I can't test any slowdebug except linux-x64).
>
> Thanks,
> David
>
> On 5/04/2019 4:01 pm, Chris Plummer wrote:
>> Thinking about this a bit more, there is still the potential for some 
>> confusion if this test fails again in the future due to the top frame 
>> missing. Is it missing because it got inlined or is it missing 
>> because the frame skipping code skipped an extra frame? Hopefully 
>> whoever deals with it doesn't just hastily add another valid 
>> stacktrace to the test but instead investigates to make sure the 
>> issue is indeed that the method got inlined.
>>
>> Chris
>>
>> On 4/4/19 10:56 PM, David Holmes wrote:
>>> Hi Chris,
>>>
>>> Okay I will simply check for the third alternative.
>>>
>>> Thanks,
>>> David
>>>
>>> On 5/04/2019 3:53 pm, Chris Plummer wrote:
>>>> Hi David,
>>>>
>>>> For the callsite that this test is checking for, right now there 
>>>> appear to be 3 possible stacktraces: the "normal" one, the one that 
>>>> includes AllocateHeap() on solaris and windows slowdebug builds, 
>>>> and the one Zhengyu is now seeing on linux-x64. You would need to 
>>>> check for all 3, limiting the AllocateHeap() one to just being 
>>>> allowed on solaris and windows slowdebug as it is now. So basically 
>>>> this test needs to cover all (allowable) stacktraces that we've 
>>>> seen for this callsite, and be updated in the future as needed. Not 
>>>> ideal, but I don't see a better solution. It's similar to the 
>>>> situation described in JDK-8163899 which covered the fragility of 
>>>> the NMT frame skipping code. In the end it was decided it would be 
>>>> easier to just deal fix issues as they came up rather then engineer 
>>>> a solution that wasn't as fragile. I think this test falls in the 
>>>> same category.
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>> On 4/4/19 10:11 PM, David Holmes wrote:
>>>>> Hi Chris,
>>>>>
>>>>> Thanks for the explanation about the frame counting from 
>>>>> os::malloc - now I get it. But I don't understand your final comment:
>>>>>
>>>>> > Looking at this code also reminds me of a reason to have the test
>>>>> > continue to check for all 4 specific frames. If the frame 
>>>>> skipping code
>>>>> > skips an extra frame, then the callsite will be missing a needed 
>>>>> frame
>>>>> > at the top. The way the test was written it would detect this. 
>>>>> With your
>>>>> > changes it will not. It would just revert to always matching on 
>>>>> 3 frames
>>>>> > instead of 4, and the frame skipping bug would go unnoticed.
>>>>>
>>>>> How can I fix this bug if I have to check for 4 specific frames 
>>>>> but one (or more) may be missing - i.e how can I tell the 
>>>>> different between "Frame A was inlined" and "Frame A was skipped 
>>>>> by mistake" ??
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On 5/04/2019 2:17 pm, Chris Plummer wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 4/4/19 6:28 PM, David Holmes wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> On 5/04/2019 1:48 am, Chris Plummer wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 4/4/19 12:14 AM, David Holmes wrote:
>>>>>>>>> On 4/04/2019 4:35 pm, Chris Plummer wrote:
>>>>>>>>>> On 4/3/19 11:23 PM, David Holmes wrote:
>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>
>>>>>>>>>>> On 4/04/2019 4:12 pm, Chris Plummer wrote:
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> I have concerns that this will hide some of the other bugs 
>>>>>>>>>>>> I've mentioned: JDK-8133749, JDK-8133747, and JDK-8133740. 
>>>>>>>>>>>> These bugs result in 1 or two frames appearing in the 
>>>>>>>>>>>> stacktrace that should be skipped. Notably 
>>>>>>>>>>>> NativeCallStack::NativeCallStack() and os::get_native_stack().
>>>>>>>>>>>
>>>>>>>>>>> The test still checks those are not present first:
>>>>>>>>>>>
>>>>>>>>>>> 73         // We should never see either of these frames 
>>>>>>>>>>> because they are supposed to be skipped. */
>>>>>>>>>>> 74 output.shouldNotContain("NativeCallStack::NativeCallStack");
>>>>>>>>>>> 75 output.shouldNotContain("os::get_native_stack");
>>>>>>>>>> Ah yes. I skimmed over the test looking for it but missed it.
>>>>>>>>>>>
>>>>>>>>>>>> Also, AllocateHeap() should normally not be in the stack 
>>>>>>>>>>>> trace, but the test has specifically allowed for it for 
>>>>>>>>>>>> windows and solaris slowdebug builds. Although these builds 
>>>>>>>>>>>> should have honored the ALWAYSINLINE directive, it was 
>>>>>>>>>>>> deemed acceptable that it was not in slowdebug builds. 
>>>>>>>>>>>> However, I would not want to allow AllocateHeap() to appear 
>>>>>>>>>>>> in a product build, and best not to see it in fastdebug 
>>>>>>>>>>>> either.
>>>>>>>>>>>
>>>>>>>>>>> This is a test of NMT detail not a test of whether a given 
>>>>>>>>>>> compiler chooses to inline something like AllocateHeap. I 
>>>>>>>>>>> don't think it is the job of this test to be checking for 
>>>>>>>>>>> something specific to the native compiler. The previous 
>>>>>>>>>>> handling of AllocateHeap seemed to be there simply because 
>>>>>>>>>>> it was the only way to deal with an optional frame - but now 
>>>>>>>>>>> that's handled generically.
>>>>>>>>>> It's appearance means you effectively only have 3 frames to 
>>>>>>>>>> identity callsites instead of 4. 
>>>>>>>>>
>>>>>>>>> Both stacktraces in the old test had 4 elements and expected 4 
>>>>>>>>> matches. The current bug is that one of those (new_entry) 
>>>>>>>>> could actually be inlined as well, resulting in only 3 
>>>>>>>>> matches. So that is what the revised test checks for: at least 
>>>>>>>>> 3 matches. Often there will be 4 matches.
>>>>>>>> I think you misunderstood my "3 frames" comment. I was 
>>>>>>>> referring to how many frames NMT uses to identify the callsite. 
>>>>>>>> It wants to use 4, but if AllocateHeap() doesn't get inlined, 
>>>>>>>> it effectively is using 3. The test should detect when this 
>>>>>>>> happens so the NMT implementation can address the issue.
>>>>>>>
>>>>>>> You're right I don't understand this part as I don't know 
>>>>>>> how/what NMT detail is doing in this regard.
>>>>>>
>>>>>> An NMT callsite is simply the 4 most recent frames (afters some 
>>>>>> pruning) that led to the os:malloc() call. "4" is somewhat 
>>>>>> arbitrary as Thomas pointed out, and is controlled by 
>>>>>> NMT_TrackingStackDepth. Making NMT_TrackingStackDepth bigger 
>>>>>> means more refinement of the callsites (thus more callsites), but 
>>>>>> a clearer picture of what actually led to the os:malloc().
>>>>>>
>>>>>> For example, with NMT_TrackingStackDepth == 4, if you have a() 
>>>>>> calls b() calls c() calls d() calls os:malloc(), and foo() and 
>>>>>> bar() both call a(), the NMT detail output will not distinguish 
>>>>>> between these two calls paths to os:mallco(), and will consider 
>>>>>> both paths to be the same callsite. The 4 frames in the NMT 
>>>>>> detail output would always be a, b, c, and d. However, bump up 
>>>>>> NMT_TrackingStackDepth to 5 and now NMT will treat them as two 
>>>>>> separate callsites, one with foo() as the bottom frame and one 
>>>>>> with bar() as the bottom frame, and both with a, b, c, and d as 
>>>>>> the other 4 frames.
>>>>>>
>>>>>> So my point is if AllocateHeap() is not inlined, then every 
>>>>>> allocation that is the result of doing a "new" of any CHeapObj 
>>>>>> subtype will have AllocateHeap() in its callsite, which 
>>>>>> effectively lowers they callsite refinement by 1.
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Hmmm but now I'm wondering why this trace:
>>>>>>>>>
>>>>>>>>>   50     public static String stackTraceAllocateHeap =
>>>>>>>>>   51         ".*AllocateHeap.*\n" +
>>>>>>>>>   52         ".*ModuleEntryTable.*new_entry.*\n" +
>>>>>>>>>
>>>>>>>>> doesn't include ".*Hashtable.*allocate_new_entry.*"? Was it 
>>>>>>>>> getting inlined already when AllocateHeap was not? Even so we 
>>>>>>>>> still end up with 4 frames matching normally.
>>>>>>>> I noticed that last night also and scratch my head over it for 
>>>>>>>> a while and then went to bed. The only explanation I could come 
>>>>>>>> up with is that allocate_new_entry() is getting inlined, and as 
>>>>>>>> a result (due to being a slowdebug build and doing minimal 
>>>>>>>> inlining) AllocateHeap() was not inlined.
>>>>>>>>>
>>>>>>>>>> If it does appear in a product build, a solution should be 
>>>>>>>>>> looked into to get rid of it. If the port owner decides it 
>>>>>>>>>> can't get rid of it (or is unwilling to), then an exception 
>>>>>>>>>> should be added to the test like was done for solaris and 
>>>>>>>>>> windows slowdebug builds.
>>>>>>>>>
>>>>>>>>> Are we specifically trying to test the compiler's ability to 
>>>>>>>>> inline that function and just happen to be using this test to 
>>>>>>>>> verify that? Doesn't seem like a suitable place to do this - 
>>>>>>>>> and why do we need to do it? The Visual Studio docs state:
>>>>>>>>>
>>>>>>>>> "You cannot force the compiler to inline a particular 
>>>>>>>>> function, even with the __forceinline keyword."
>>>>>>>>>
>>>>>>>>> so ALWAYSINLINE is just a hint even in product builds and 
>>>>>>>>> could change with any update to the compiler.
>>>>>>>>>
>>>>>>>>> For Solaris Studio it is again not guaranteed to inline - 
>>>>>>>>> specifically -xinline only has an effect at –xO3 or higher. 
>>>>>>>>> Which likely explains why it is ignored in slowdebug. And 
>>>>>>>>> there are other cases where it won't honour the ALWAYSINLINE.
>>>>>>>>>
>>>>>>>>> Even with gcc we seem to be misusing the attribute if we want 
>>>>>>>>> to ensure inlining when not optimising:
>>>>>>>>>
>>>>>>>>> "GCC does not inline any functions when not optimizing unless 
>>>>>>>>> you specify the ‘always_inline’ attribute for the function, 
>>>>>>>>> like this:
>>>>>>>>>
>>>>>>>>> /* Prototype.  */
>>>>>>>>> inline void foo (const char) __attribute__((always_inline));"
>>>>>>>>>
>>>>>>>>> and we don't write it that way.
>>>>>>>>>
>>>>>>>>> So if we're that concerned about release builds guaranteeing 
>>>>>>>>> to inline AllocateHeap then I think we need something a bit 
>>>>>>>>> more explicit than this test to determine that.
>>>>>>>> With respect to the 3 methods/functions we don't want to see in 
>>>>>>>> the callsite stacktrace, NMT has made a number of assumptions 
>>>>>>>> on inlining. One of the things the test is doing is making sure 
>>>>>>>> those assumptions are correct. If incorrect, then you run into 
>>>>>>>> issues like I mentioned above where callsite backtraces 
>>>>>>>> effectively only have 3 unique frames rather than 4 (actually 
>>>>>>>> before some bug fixes it was often just 2 unique frames). So I 
>>>>>>>> think it's appropriate to have a test to make sure we are not 
>>>>>>>> seeing any of these 3 methods/functions.
>>>>>>>
>>>>>>> Okay I get the gist of that. Is there somewhere I can clearly 
>>>>>>> see what this inlining assumptions are that NMT makes? Are they 
>>>>>>> clearly documented?
>>>>>>
>>>>>> Not that I know of. I discovered them while looking at the 
>>>>>> various bugs that led to NativeCallStack::NativeCallStack() and 
>>>>>> os::get_native_stack() (and sometimes both) being in the 
>>>>>> callsite. Reviewing the bugs I referred to will give you an idea 
>>>>>> of where to look. One good place to look at 
>>>>>> NativeCallStack::NativeCallStack(). Lots of special case code 
>>>>>> there that controls how many frames to skip based on on the 
>>>>>> platform and whether optimized or not. Also some comments there 
>>>>>> to help you out. I did a lot of bug fixing in this method.
>>>>>>
>>>>>> Looking at this code also reminds me of a reason to have the test 
>>>>>> continue to check for all 4 specific frames. If the frame 
>>>>>> skipping code skips an extra frame, then the callsite will be 
>>>>>> missing a needed frame at the top. The way the test was written 
>>>>>> it would detect this. With your changes it will not. It would 
>>>>>> just revert to always matching on 3 frames instead of 4, and the 
>>>>>> frame skipping bug would go unnoticed.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>>
>>>>>>>> Now the test also has made inlining assumptions beyond what NMT 
>>>>>>>> has made, and that is really what this bug is about. In general 
>>>>>>>> I think your fix is fine in the way it relaxes which frames are 
>>>>>>>> actually found, but as Thomas points out, it suffers from not 
>>>>>>>> actually looking at a single stacktrace, but just looking for 
>>>>>>>> the specified frames somewhere in the output (and in the order 
>>>>>>>> specified.) You should probably address this.
>>>>>>>
>>>>>>> Right that was an error on my part. I thought the existing 
>>>>>>> MULTILINE pattern matching with .* would also find 
>>>>>>> non-sequential lines and so I was acting similarly. I will 
>>>>>>> re-think this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> Given the changes you made to allow more flexibly in which 
>>>>>>>>>>>> frames appear, I think you need to now also make sure the 
>>>>>>>>>>>> above 3 mentioned frames are not present, except for 
>>>>>>>>>>>> allowing AllocateHeap() in slowdebug builds.
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/3/19 10:53 PM, David Holmes wrote:
>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218458
>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~dholmes/8218458/webrev/
>>>>>>>>>>>>>
>>>>>>>>>>>>> The actual stack trace reported by NMT detail is affected 
>>>>>>>>>>>>> by the inlining decisions of the native compiler, and on 
>>>>>>>>>>>>> the type of build. So we define an "ideal" stacktrace and 
>>>>>>>>>>>>> then allow for some frames to be missing based on 
>>>>>>>>>>>>> empirical observations. So to date we have seen two frames 
>>>>>>>>>>>>> that may or may not be inlined and so we allow for 2 
>>>>>>>>>>>>> non-matching entries.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The special-casing of AllocateHeap is removed as now it is 
>>>>>>>>>>>>> just an optional frame.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris: does this maintain the "spirit" of the test as you 
>>>>>>>>>>>>> intended?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Zhengyu: can you test this on your system(s) please.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>