RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Tue Feb 21 23:29:45 UTC 2017

Hi Chris,

It looks good to me.
But let me mediate on the fix a little bit more.
Consider it reviewed if I'll not come back in 24 hours. :)

Thank you a lot for this great investigation and fix!
Serguei

On 2/21/17 11:31, Chris Plummer wrote:
> Hello,
>
> Please review the following:
>
> http://cr.openjdk.java.net/~cjplummer/8172020/webrev.01/webrev.hotspot/
> https://bugs.openjdk.java.net/browse/JDK-8172020
>
> [Note there are s390, ppc, and aarch64 changes in this bug fix that I 
> will need to have someone verify for me or I won't be pushing them.]
>
> This bug turns up with a closed test so the CR is marked confidential. 
> I'll try to provided the necessary details here. Sorry the explanation 
> is lengthy, but it is a complex bug (with a relatively simple fix).
>
> If you are on the s390, ppc, or aarch64 teams and just want to help 
> verify the fix, I would say it's not necessary to understand the 
> details of the bug. What's most important is you test the changes. 
> Unfortunately you won't be able to reproduce the issue since the test 
> is closed, but you should at least verify the fix builds and 
> introduces no new issues. I did test the open aarch64 port since I 
> have the hardware, and someone was kind enough to provide me with 
> binaries both without the patch (which reproduced the bug) and with 
> the patch (which passed). No other aarch64 testing was done.
>
> Now on to an explanation of the bug and the fix:
>
> The test is testing the JDI/JVMTI forceEarlyReturn support. After the 
> debugger side spawns the debugee process and the debugee thread is 
> created, a vm.suspend() is issued to suspend all threads on the 
> debugee side. The debugger side then issues a JDI forceEarlyReturn, 
> which will cause the debugee thread to exit the currently executing 
> method. Although the intent is to force exit of the run() method, the 
> run() method does make some method calls to some very small methods 
> with "inline" in their name, such as inlineMethodReturningObject(). If 
> forceEarlyReturn is done while in one of them, JDI throws 
> InvalidTargetException, which the test correctly handles and will pass 
> if that happens.
>
> If the run() method is not yet compiled, what *normally* happens to 
> force the thread to suspend is to swap in the interpreter safepoint 
> dispatch table. This will result in a call_VM out to the interpreter 
> to handle the safepointing. When later the thread is resumed (after 
> initiating the forceEarlyReturn), there is code on the return side of 
> call_VM to make sure the earlyret is handled immediately before any 
> other bytecodes are exectuted. Thus the method that was executing at 
> the time of the forceEarlyReturn should be the same as the method when 
> the earlyret is handled. In fact it should be the same bytetcode(bci). 
> Like I said, this is what *normally* happens, but is not the way the 
> bug is exposed.
>
> The bug turns up when the run() method is not yet compiled, and when 
> the suspend is attempted we are in one of the small "inline" methods, 
> and it is compiled. In this case thread suspension is achieved in a 
> different manner. When a compiled method executes its return code, it 
> first pops its frame, and then the " poll_return" safepoint polling 
> instruction is executed. This allows the thread to be suspended. We 
> are technically in the run() method at this point since the frame of 
> the "inline"  method has been popped. Therefore the forceEarlyReturn 
> is allowed. When the thread is resumed, the return from the "inline" 
> method finally happens, which puts us in code generated by 
> TemplateInterpreterGenerator::generate_return_entry_for(). However, 
> there are no checks for an earlyret here, so we start executing 
> bytecodes until we finally hit an earlyret check. This happens when 
> the next invokespecial is done. There is a call_VM after the frame is 
> pushed, and the call_VM always does an earlyret check. The problem is 
> now the topmost frame (in the case where it asserts) is 
> inlineMethodReturningObject(). Since it returns an object type, the 
> earlyret code expects there to be an oop in the earlyret slot on the 
> stack. But there isn't because the forceEarlyReturn was done on the 
> void run() method. So the earlyret code fetches garbage, does an 
> validity test on it, and gets the assert you see in the bug title.
>
> So what is needed here is an earlyret check after the compiled 
> "inline" method returns, and before any new bytecodes are executed. 
> Since TemplateInterpreterGenerator::generate_return_entry_for() is 
> always executed when the compiled "inline" method returns, the check 
> was added here. For completeness, the popframe check was also added. 
> The header file changes are just to make these methods public so they 
> can be called.
>
> I tested the failed test at least 100 times on each supported 
> platform. The assert usually showed up at least 1 in 50 times. I also 
> ran a large set of tests on all supported platforms, including 
> everything we do for nightly testing except for some of the longest 
> running tests.
>
> thanks,
>
> Chris
>