[aarch64-port-dev ] RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Chris Plummer chris.plummer at oracle.com
Tue Feb 21 23:41:31 UTC 2017


Thanks Serguei!

On 2/21/17 3:29 PM, serguei.spitsyn at oracle.com wrote:
> Hi Chris,
>
> It looks good to me.
> But let me mediate on the fix a little bit more.
> Consider it reviewed if I'll not come back in 24 hours. :)
>
> Thank you a lot for this great investigation and fix!
> Serguei
>
>
> On 2/21/17 11:31, Chris Plummer wrote:
>> Hello,
>>
>> Please review the following:
>>
>> http://cr.openjdk.java.net/~cjplummer/8172020/webrev.01/webrev.hotspot/
>> https://bugs.openjdk.java.net/browse/JDK-8172020
>>
>> [Note there are s390, ppc, and aarch64 changes in this bug fix that I 
>> will need to have someone verify for me or I won't be pushing them.]
>>
>> This bug turns up with a closed test so the CR is marked 
>> confidential. I'll try to provided the necessary details here. Sorry 
>> the explanation is lengthy, but it is a complex bug (with a 
>> relatively simple fix).
>>
>> If you are on the s390, ppc, or aarch64 teams and just want to help 
>> verify the fix, I would say it's not necessary to understand the 
>> details of the bug. What's most important is you test the changes. 
>> Unfortunately you won't be able to reproduce the issue since the test 
>> is closed, but you should at least verify the fix builds and 
>> introduces no new issues. I did test the open aarch64 port since I 
>> have the hardware, and someone was kind enough to provide me with 
>> binaries both without the patch (which reproduced the bug) and with 
>> the patch (which passed). No other aarch64 testing was done.
>>
>> Now on to an explanation of the bug and the fix:
>>
>> The test is testing the JDI/JVMTI forceEarlyReturn support. After the 
>> debugger side spawns the debugee process and the debugee thread is 
>> created, a vm.suspend() is issued to suspend all threads on the 
>> debugee side. The debugger side then issues a JDI forceEarlyReturn, 
>> which will cause the debugee thread to exit the currently executing 
>> method. Although the intent is to force exit of the run() method, the 
>> run() method does make some method calls to some very small methods 
>> with "inline" in their name, such as inlineMethodReturningObject(). 
>> If forceEarlyReturn is done while in one of them, JDI throws 
>> InvalidTargetException, which the test correctly handles and will 
>> pass if that happens.
>>
>> If the run() method is not yet compiled, what *normally* happens to 
>> force the thread to suspend is to swap in the interpreter safepoint 
>> dispatch table. This will result in a call_VM out to the interpreter 
>> to handle the safepointing. When later the thread is resumed (after 
>> initiating the forceEarlyReturn), there is code on the return side of 
>> call_VM to make sure the earlyret is handled immediately before any 
>> other bytecodes are exectuted. Thus the method that was executing at 
>> the time of the forceEarlyReturn should be the same as the method 
>> when the earlyret is handled. In fact it should be the same 
>> bytetcode(bci). Like I said, this is what *normally* happens, but is 
>> not the way the bug is exposed.
>>
>> The bug turns up when the run() method is not yet compiled, and when 
>> the suspend is attempted we are in one of the small "inline" methods, 
>> and it is compiled. In this case thread suspension is achieved in a 
>> different manner. When a compiled method executes its return code, it 
>> first pops its frame, and then the " poll_return" safepoint polling 
>> instruction is executed. This allows the thread to be suspended. We 
>> are technically in the run() method at this point since the frame of 
>> the "inline"  method has been popped. Therefore the forceEarlyReturn 
>> is allowed. When the thread is resumed, the return from the "inline" 
>> method finally happens, which puts us in code generated by 
>> TemplateInterpreterGenerator::generate_return_entry_for(). However, 
>> there are no checks for an earlyret here, so we start executing 
>> bytecodes until we finally hit an earlyret check. This happens when 
>> the next invokespecial is done. There is a call_VM after the frame is 
>> pushed, and the call_VM always does an earlyret check. The problem is 
>> now the topmost frame (in the case where it asserts) is 
>> inlineMethodReturningObject(). Since it returns an object type, the 
>> earlyret code expects there to be an oop in the earlyret slot on the 
>> stack. But there isn't because the forceEarlyReturn was done on the 
>> void run() method. So the earlyret code fetches garbage, does an 
>> validity test on it, and gets the assert you see in the bug title.
>>
>> So what is needed here is an earlyret check after the compiled 
>> "inline" method returns, and before any new bytecodes are executed. 
>> Since TemplateInterpreterGenerator::generate_return_entry_for() is 
>> always executed when the compiled "inline" method returns, the check 
>> was added here. For completeness, the popframe check was also added. 
>> The header file changes are just to make these methods public so they 
>> can be called.
>>
>> I tested the failed test at least 100 times on each supported 
>> platform. The assert usually showed up at least 1 in 50 times. I also 
>> ran a large set of tests on all supported platforms, including 
>> everything we do for nightly testing except for some of the longest 
>> running tests.
>>
>> thanks,
>>
>> Chris
>>
>



More information about the aarch64-port-dev mailing list