RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Tue Feb 21 19:31:07 UTC 2017

Hello,

Please review the following:

http://cr.openjdk.java.net/~cjplummer/8172020/webrev.01/webrev.hotspot/
https://bugs.openjdk.java.net/browse/JDK-8172020

[Note there are s390, ppc, and aarch64 changes in this bug fix that I 
will need to have someone verify for me or I won't be pushing them.]

This bug turns up with a closed test so the CR is marked confidential. 
I'll try to provided the necessary details here. Sorry the explanation 
is lengthy, but it is a complex bug (with a relatively simple fix).

If you are on the s390, ppc, or aarch64 teams and just want to help 
verify the fix, I would say it's not necessary to understand the details 
of the bug. What's most important is you test the changes. Unfortunately 
you won't be able to reproduce the issue since the test is closed, but 
you should at least verify the fix builds and introduces no new issues. 
I did test the open aarch64 port since I have the hardware, and someone 
was kind enough to provide me with binaries both without the patch 
(which reproduced the bug) and with the patch (which passed). No other 
aarch64 testing was done.

Now on to an explanation of the bug and the fix:

The test is testing the JDI/JVMTI forceEarlyReturn support. After the 
debugger side spawns the debugee process and the debugee thread is 
created, a vm.suspend() is issued to suspend all threads on the debugee 
side. The debugger side then issues a JDI forceEarlyReturn, which will 
cause the debugee thread to exit the currently executing method. 
Although the intent is to force exit of the run() method, the run() 
method does make some method calls to some very small methods with 
"inline" in their name, such as inlineMethodReturningObject(). If 
forceEarlyReturn is done while in one of them, JDI throws 
InvalidTargetException, which the test correctly handles and will pass 
if that happens.

If the run() method is not yet compiled, what *normally* happens to 
force the thread to suspend is to swap in the interpreter safepoint 
dispatch table. This will result in a call_VM out to the interpreter to 
handle the safepointing. When later the thread is resumed (after 
initiating the forceEarlyReturn), there is code on the return side of 
call_VM to make sure the earlyret is handled immediately before any 
other bytecodes are exectuted. Thus the method that was executing at the 
time of the forceEarlyReturn should be the same as the method when the 
earlyret is handled. In fact it should be the same bytetcode(bci). Like 
I said, this is what *normally* happens, but is not the way the bug is 
exposed.

The bug turns up when the run() method is not yet compiled, and when the 
suspend is attempted we are in one of the small "inline" methods, and it 
is compiled. In this case thread suspension is achieved in a different 
manner. When a compiled method executes its return code, it first pops 
its frame, and then the " poll_return" safepoint polling instruction is 
executed. This allows the thread to be suspended. We are technically in 
the run() method at this point since the frame of the "inline"  method 
has been popped. Therefore the forceEarlyReturn is allowed. When the 
thread is resumed, the return from the "inline" method finally happens, 
which puts us in code generated by 
TemplateInterpreterGenerator::generate_return_entry_for(). However, 
there are no checks for an earlyret here, so we start executing 
bytecodes until we finally hit an earlyret check. This happens when the 
next invokespecial is done. There is a call_VM after the frame is 
pushed, and the call_VM always does an earlyret check. The problem is 
now the topmost frame (in the case where it asserts) is 
inlineMethodReturningObject(). Since it returns an object type, the 
earlyret code expects there to be an oop in the earlyret slot on the 
stack. But there isn't because the forceEarlyReturn was done on the void 
run() method. So the earlyret code fetches garbage, does an validity 
test on it, and gets the assert you see in the bug title.

So what is needed here is an earlyret check after the compiled "inline" 
method returns, and before any new bytecodes are executed. Since 
TemplateInterpreterGenerator::generate_return_entry_for() is always 
executed when the compiled "inline" method returns, the check was added 
here. For completeness, the popframe check was also added. The header 
file changes are just to make these methods public so they can be called.

I tested the failed test at least 100 times on each supported platform. 
The assert usually showed up at least 1 in 50 times. I also ran a large 
set of tests on all supported platforms, including everything we do for 
nightly testing except for some of the longest running tests.

thanks,

Chris