RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci"
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Jun 24 16:07:53 UTC 2020
On 6/23/20 11:04 PM, Chris Plummer wrote:
> On 6/23/20 7:07 PM, Daniel D. Daugherty wrote:
>> Just one more comment on this part:
>>
>>> > L220: System.out.println("CurrentFrameGuess: choosing
>>> interpreter frame: sp = " +
>>> > L221: spFound + ", fpFound = "
>>> + fp + ", pcFound = " + pc);
>>> > This debug output doesn't make sense to me:
>>> >
>>> > "sp = " label and 'spFound' value
>>> > "fpFound = " label and 'fp' value
>>> > "pcFound = " label and 'pc' value
>>>
>>> but I may not have enough context...
>>> From the point of view of the person reading the output, they want
>>> to know the values for sp, fp, and pc. But within the code these
>>> values are stored in the "found" variables.
>>
>> In that case, the code is wrong for the 'fp' and 'pc' outputs
>> since you changed the labels and not the variables.
> Yes, you are correct. I'll fix the output for fp and pc.
I don't need another webrev for any of my comments.
Thanks for your work on making SA more stable. The CI appreciates it!
Dan
>
> thanks,
>
> Chris
>>
>> Dan
>>
>>
>>
>> On 6/23/20 7:28 PM, Chris Plummer wrote:
>>> On 6/23/20 2:44 PM, Daniel D. Daugherty wrote:
>>>> On 6/18/20 8:54 PM, Chris Plummer wrote:
>>>>> [I've added runtime-dev to this SA review since understanding
>>>>> interpreter invokes (code generated by
>>>>> TemplateInterpreterGenerator::generate_normal_entry()) and stack
>>>>> walking is probably more important than understanding SA.]
>>>>>
>>>>> Hello,
>>>>>
>>>>> Please help review the following:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8244383
>>>>> http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html
>>>>
>>> Thanks for helping!
>>>> Sorry for the delay in reviewing this one. I've come back to it a
>>>> couple
>>>> of times because code like this is very hard to review.
>>>>
>>>>
>>>> General comment:
>>>> This fix reminds of the crazy things that AsyncGetCallTrace has to
>>>> do in order to gather call trace data. I'm guessing that SA is
>>>> attaching to the VM in an asynchronous manner and that's why it
>>>> can observe things like partially constructed frames. If that's a
>>>> correct guess, then how is SA stopping/suspending the threads?
>>>> I'm just curious here.
>>> On linux SA uses ptrace. I'm not familiar with the details of how it
>>> works. I'm not sure where ptrace allows suspends to happen, but
>>> certainly it has no knowledge of JVM safepoints or other
>>> synchronization that the JVM does. So from the JVM and SA point of
>>> view the suspend can happen at any arbitrary JVM instruction.
>>>
>>> From what I can gather, PTRACE_ATTACH suspends the entire process,
>>> so that means all threads are suspended once you attach. However,
>>> PTRACE_GETREGS can be called on individual threads (LWPs), but I
>>> don't see any indication in the SA code that you need to attach to
>>> each LWP first.
>>>>
>>>> Or this might be a case where SA is examining a core file in
>>>> which case the various threads stacks are not necessarily at
>>>> good/safepoint-safe pause points.
>>> For this bug and test it's a live process, but I think the bug being
>>> addressed here can happen just as well with a core file.
>>> Unfortunately we have very little core file testing support. I'm
>>> actually in the middle of addressing that right now.
>>>>
>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java
>>>>
>>>> No comments.
>>>>
>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java
>>>>
>>>> No comments.
>>>>
>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java
>>>>
>>>> L104: // two locations, then we canot determine the frame.
>>>> typo: s/canot/cannot/
>>> ok
>>>>
>>>> L127: // it's validity will help us determine the state of
>>>> the new frame push.
>>>> typo: s/it's/its/
>>> ok
>>>>
>>>> L148: System.out.println("CurrentFrameGuess: frame
>>>> pushed but not initaliazed.");
>>>> typo: s/initaliazed/initialized/
>>> ok
>>>>
>>>> L220: System.out.println("CurrentFrameGuess: choosing
>>>> interpreter frame: sp = " +
>>>> L221: spFound + ", fpFound = "
>>>> + fp + ", pcFound = " + pc);
>>>> This debug output doesn't make sense to me:
>>>>
>>>> "sp = " label and 'spFound' value
>>>> "fpFound = " label and 'fp' value
>>>> "pcFound = " label and 'pc' value
>>>>
>>>> but I may not have enough context...
>>> From the point of view of the person reading the output, they want
>>> to know the values for sp, fp, and pc. But within the code these
>>> values are stored in the "found" variables.
>>>>
>>>> With code like this, it's really hard to figure out if you've covered
>>>> all the cases unless you've been in the observer seat yourself and
>>>> even then your test runs may not hit all the possible cases. All you
>>>> can really do is start with a set of adaptive changes, run with those
>>>> for a while and tweak them as you gather more observations.
>>> Yes, and I know there is still a very tiny gap or two in coverage
>>> that are maybe one or two instructions long, but they aren't worth
>>> dealing with. This bug was already very rare, and with the fixes
>>> I've done I don't see any issues now. SA is a debugger, so
>>> perfection in this regard is not expected.
>>>>
>>>> Chris, nice job with this bit of insanity!
>>> Thanks! I mostly stuck with this one to help with my SA expertise.
>>> Otherwise it wouldn't have been worth the time.
>>>
>>> Chris
>>>>
>>>> Thumbs up!
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>>>
>>>>> The crux of the bug is when doing stack walking the topmost frame
>>>>> is in an inconsistent state because we are in the middle of
>>>>> pushing a new interpreter frame. Basically we are executing code
>>>>> generated by
>>>>> TemplateInterpreterGenerator::generate_normal_entry(). Since the
>>>>> PC register is in this code, SA assumes the topmost frame is an
>>>>> interpreter frame.
>>>>>
>>>>> The first issue with this interpreter frame assumption is if we
>>>>> haven't actually pushed the frame yet, then the current frame is
>>>>> the caller's frame, and could be compiled. But since SA thinks
>>>>> it's interpreted, later on it tries to convert the frame->bcp to a
>>>>> BCI, but frame->bcp is only valid for interpreter frames. Thus the
>>>>> "illegal BCI" failures. If the previous frame happened to be
>>>>> interpreted, then the existing SA code works fine.
>>>>>
>>>>> The other state of frame pushing that was problematic was when the
>>>>> new frame had been pushed, but frame->method and frame->bcp were
>>>>> not setup yet. This also would lead to "illegal BCI" later on
>>>>> because garbage would be stored in these locations.
>>>>>
>>>>> Fixing the above problems requires trying to determine the state
>>>>> of the frame push through a series of checks, and then adapting
>>>>> what is considered to be the current frame based on the outcome of
>>>>> the checks. The first things checked is that frame->method is
>>>>> valid (we can successfully instantiate a wrapper for the Method*
>>>>> without failure) and that frame->bcp is within the method. If both
>>>>> these pass then we can use the frame as-is.
>>>>>
>>>>> If the above checks fail, then we try to determine whether the
>>>>> issue is that the frame is not yet pushed and the current frame is
>>>>> actually compiled, or the frame has been pushed but not yet
>>>>> initialized. This is done by first getting the return address from
>>>>> the stack or RAX (it's location depends on how far along we are in
>>>>> the entry code) and comparing this to what is stored in
>>>>> frame->return_addr. If they are the same, then we have pushed the
>>>>> frame but not yet initialized it. In this case we use the previous
>>>>> frame (senderSP() and senderFP()) as the current frame since the
>>>>> current frame is not yet initialized. If the return address check
>>>>> fails, then we assume the new frame is not yet pushed, and and
>>>>> treat the current frame as compiled, even though PC points into
>>>>> the interpreter (we replace PC with RAX in this case).
>>>>>
>>>>> Comments in the code pretty well explain all the above, so it is
>>>>> probably easier to follow the logic in the code along with the
>>>>> comments rather than apply my above description to the code.
>>>>>
>>>>> I should add that it's very rare that we ever get into this
>>>>> special error handling code. This bug was very hard to reproduce
>>>>> initially. I was only able to make progress with reproducing and
>>>>> debugging by inserting delay loops in various spots in the code
>>>>> generated by
>>>>> TemplateInterpreterGenerator::generate_normal_entry(). By doing
>>>>> this I was able to reproduce the issue quite easily and hit all
>>>>> the logic in the new code I've added.
>>>>>
>>>>> The fix is basically entirely contained within
>>>>> AMD64CurrentFrameGuess.java. The rest of the changes are minor:
>>>>>
>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java
>>>>>
>>>>> -Main fix for CR
>>>>>
>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java
>>>>>
>>>>> -Added getInterpreterFrameBCP(), which is now needed by
>>>>> AMD64CurrentFrameGuess.java
>>>>> -I also simplified some code by using the existing
>>>>> getInterpreterFrameMethod()
>>>>> rather than replicating inline what it does.
>>>>>
>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java
>>>>>
>>>>> -I noticed the windows version of this code had some extra checks
>>>>> that were missing
>>>>> from the bsd version. I then looked at the linux version, but it
>>>>> had been heavily modified
>>>>> a short while back to leverage DWARF info to determine frames. So
>>>>> I looked at the previous
>>>>> rev and it too had these extra checks. I decided to add them to
>>>>> the BSD port. I'm not sure
>>>>> if it helps at all, but it certainly doesn't seem to do any harm.
>>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>
>>>
>>>
>>
>
>
More information about the serviceability-dev
mailing list