RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci"
Chris Plummer
chris.plummer at oracle.com
Thu Jun 25 20:52:10 UTC 2020
Ping #2. I still need one more reviewer (Thanks for the review, Dan). I
updated the webrev based on Dan's comments:
http://cr.openjdk.java.net/~cjplummer/8244383/webrev.01/
I can still make the simplification mentioned below if necessary.
thanks,
Chris
On 6/23/20 11:29 AM, Chris Plummer wrote:
> Ping!
>
> If this fix is too complicated, there is a simplification I can make,
> but at the cost of abandoning some attempts to determine the current
> frame when this error condition pops up. At the start of
> validateInterpreterFrame() it attempts to verify that the frame is
> valid by verifying that frame->method and frame->bcp are valid. This
> part is pretty simple. The complicated part is everything that follows
> if the verification fails. It attempts to error correct the situation
> by looking at various register contents and stack contents. I could
> just abandon this complicated code and return false if frame->method
> and frame->bcp don't check out. Upon return, the caller's code would
> be simplified to:
>
> if (validateInterpreterFrame(sp, fp, pc)) {
> return true; // We're done. setValues() has been called
> for valid interpreter frame.
> } else {
> return checkLastJavaSP();
> }
>
> So there's still a chance we can determine a valid current frame if
> "last java frame" has been setup. However, if not setup we would not
> be able to. This is where the complicated code in
> validateInterpreterFrame() is useful because it can usually determine
> the current frame, even if "last java frame" is not setup, but it's
> rare enough that we run into this situation that I think failing to
> get the current frame is ok.
>
> So if I can get a couple promises for reviews if I make this change,
> I'll go ahead and do it and send out a new RFR.
>
> thanks,
>
> Chris
>
> On 6/18/20 5:54 PM, Chris Plummer wrote:
>> [I've added runtime-dev to this SA review since understanding
>> interpreter invokes (code generated by
>> TemplateInterpreterGenerator::generate_normal_entry()) and stack
>> walking is probably more important than understanding SA.]
>>
>> Hello,
>>
>> Please help review the following:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8244383
>> http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html
>>
>> The crux of the bug is when doing stack walking the topmost frame is
>> in an inconsistent state because we are in the middle of pushing a
>> new interpreter frame. Basically we are executing code generated by
>> TemplateInterpreterGenerator::generate_normal_entry(). Since the PC
>> register is in this code, SA assumes the topmost frame is an
>> interpreter frame.
>>
>> The first issue with this interpreter frame assumption is if we
>> haven't actually pushed the frame yet, then the current frame is the
>> caller's frame, and could be compiled. But since SA thinks it's
>> interpreted, later on it tries to convert the frame->bcp to a BCI,
>> but frame->bcp is only valid for interpreter frames. Thus the
>> "illegal BCI" failures. If the previous frame happened to be
>> interpreted, then the existing SA code works fine.
>>
>> The other state of frame pushing that was problematic was when the
>> new frame had been pushed, but frame->method and frame->bcp were not
>> setup yet. This also would lead to "illegal BCI" later on because
>> garbage would be stored in these locations.
>>
>> Fixing the above problems requires trying to determine the state of
>> the frame push through a series of checks, and then adapting what is
>> considered to be the current frame based on the outcome of the
>> checks. The first things checked is that frame->method is valid (we
>> can successfully instantiate a wrapper for the Method* without
>> failure) and that frame->bcp is within the method. If both these pass
>> then we can use the frame as-is.
>>
>> If the above checks fail, then we try to determine whether the issue
>> is that the frame is not yet pushed and the current frame is actually
>> compiled, or the frame has been pushed but not yet initialized. This
>> is done by first getting the return address from the stack or RAX
>> (it's location depends on how far along we are in the entry code) and
>> comparing this to what is stored in frame->return_addr. If they are
>> the same, then we have pushed the frame but not yet initialized it.
>> In this case we use the previous frame (senderSP() and senderFP()) as
>> the current frame since the current frame is not yet initialized. If
>> the return address check fails, then we assume the new frame is not
>> yet pushed, and and treat the current frame as compiled, even though
>> PC points into the interpreter (we replace PC with RAX in this case).
>>
>> Comments in the code pretty well explain all the above, so it is
>> probably easier to follow the logic in the code along with the
>> comments rather than apply my above description to the code.
>>
>> I should add that it's very rare that we ever get into this special
>> error handling code. This bug was very hard to reproduce initially. I
>> was only able to make progress with reproducing and debugging by
>> inserting delay loops in various spots in the code generated by
>> TemplateInterpreterGenerator::generate_normal_entry(). By doing this
>> I was able to reproduce the issue quite easily and hit all the logic
>> in the new code I've added.
>>
>> The fix is basically entirely contained within
>> AMD64CurrentFrameGuess.java. The rest of the changes are minor:
>>
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java
>>
>> -Main fix for CR
>>
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java
>>
>> -Added getInterpreterFrameBCP(), which is now needed by
>> AMD64CurrentFrameGuess.java
>> -I also simplified some code by using the existing
>> getInterpreterFrameMethod()
>> rather than replicating inline what it does.
>>
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java
>>
>> -I noticed the windows version of this code had some extra checks
>> that were missing
>> from the bsd version. I then looked at the linux version, but it had
>> been heavily modified
>> a short while back to leverage DWARF info to determine frames. So I
>> looked at the previous
>> rev and it too had these extra checks. I decided to add them to the
>> BSD port. I'm not sure
>> if it helps at all, but it certainly doesn't seem to do any harm.
>>
>> thanks,
>>
>> Chris
>>
>
>
More information about the serviceability-dev
mailing list