RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls

Mon Jul 8 10:07:52 UTC 2019

Hi Dean and Vladimir,

The callee->is_method() check in the guarantee is probably there to catch
corrupt memory.

So the problem occurs specifically when performing upcalls from JNI. The
call wrapper tries to "quack like an interpreter" and performs i2c
calls, which fail because the nmethod is not entrant. The subsequent
c2i attempt then fails again due to the clinit barrier. For calls made
from the template interpreter, the clinit barrier has already been
passed, but the JNI upcall path does not perform that barrier.
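To make the double failure concrete, here is a heavily simplified model of the handshake. It is a sketch, not HotSpot code: MethodModel, JavaThreadModel, handle_wrong_method_model and call_through_i2c are hypothetical names, and the model ignores the is_method() part of the real guarantee. It only shows that a handler which clears callee_target survives one "bad" call but not a second one in a row.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of the i2c -> handle_wrong_method
// -> c2i retry path. These are not HotSpot's real classes.
struct MethodModel {
  const char* name;
};

struct JavaThreadModel {
  MethodModel* callee_target = nullptr;  // published by the modelled i2c adapter
};

// Models SharedRuntime::handle_wrong_method: it needs callee_target to know
// where to forward execution. Returning false corresponds to the real VM
// failing guarantee(callee != NULL && callee->is_method(), "bad handshake").
bool handle_wrong_method_model(JavaThreadModel& thread, bool clear_after_read) {
  MethodModel* callee = thread.callee_target;
  if (callee == nullptr) {
    return false;  // "bad handshake"
  }
  if (clear_after_read) {
    thread.callee_target = nullptr;  // behaviour before the proposed fix
  }
  return true;  // forward execution to the callee's c2i entry
}

// Simulates one interpreter-style call that is "bad" `failures` times in a
// row (e.g. not-entrant nmethod, then the c2i clinit barrier) before it
// finally goes through. Returns false if the modelled guarantee fails.
bool call_through_i2c(MethodModel* m, int failures, bool clear_after_read) {
  assert(failures >= 0);
  JavaThreadModel thread;
  thread.callee_target = m;  // the i2c adapter sets the thread-local target
  for (int i = 0; i < failures; i++) {
    if (!handle_wrong_method_model(thread, clear_after_read)) {
      return false;
    }
  }
  return true;
}
```

With clearing, call_through_i2c(&m, 1, true) succeeds but call_through_i2c(&m, 2, true) fails on the second handler invocation; leaving the target in place (clear_after_read == false) lets the two-failure case go through.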

Since our current i2c calls can't actually deal with blocking at all
(nor with safepoints), the right solution seems to be to add clinit
barriers to the JavaCalls API, so that by the time the call is
performed, we know the clinit barrier won't be hit.
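A minimal sketch of that direction, under loudly stated assumptions: KlassModel, InitState, initialize_model and java_call_model are hypothetical names; the barrier's pass condition (class fully initialized, or being initialized by the calling thread) is a simplification; and waiting for another thread's <clinit> is modelled as that thread simply finishing. The point is only that doing the initialization step in the JavaCalls wrapper, presumably where blocking is still allowed, means the later c2i barrier check cannot fail.

```cpp
#include <cassert>

// Hypothetical model of the suggested fix: let the JavaCalls-style entry
// point run class initialization *before* dispatching through the i2c
// adapter, so the c2i clinit barrier cannot fail afterwards.
// Names and states are illustrative, not HotSpot's real API.
enum class InitState { allocated, being_initialized, fully_initialized };

struct KlassModel {
  InitState state = InitState::allocated;
  int init_thread = -1;  // id of the thread running <clinit>, or -1
};

// Simplified fast-path check of the c2i clinit barrier: proceed only if the
// class is fully initialized, or is being initialized by this very thread.
bool clinit_barrier_passes(const KlassModel& k, int thread_id) {
  return k.state == InitState::fully_initialized ||
         (k.state == InitState::being_initialized && k.init_thread == thread_id);
}

// Models initializing the holder class from the calling thread. This is the
// step that may need to block, which is why it happens here rather than on
// the i2c/c2i path.
void initialize_model(KlassModel& k, int thread_id) {
  if (k.state == InitState::fully_initialized) {
    return;
  }
  if (k.state == InitState::being_initialized) {
    if (k.init_thread == thread_id) {
      return;  // recursive request from the initializing thread: no wait
    }
    // The real VM would wait for the other thread to finish <clinit>;
    // this single-threaded model treats that wait as having completed.
    k.state = InitState::fully_initialized;
    k.init_thread = -1;
    return;
  }
  k.state = InitState::being_initialized;
  k.init_thread = thread_id;
  // ... static initializers would run here ...
  k.state = InitState::fully_initialized;
  k.init_thread = -1;
}

// Hypothetical JavaCalls-style upcall. Returns whether the c2i clinit
// barrier will pass once the call reaches the c2i entry.
bool java_call_model(KlassModel& holder, int thread_id, bool barrier_in_java_calls) {
  assert(thread_id >= 0);  // -1 is reserved for "no initializing thread"
  if (barrier_in_java_calls) {
    initialize_model(holder, thread_id);
  }
  return clinit_barrier_passes(holder, thread_id);
}
```

Without the extra step, an upcall into a class that is not yet initialized reaches the c2i barrier and fails; with it, the barrier check succeeds by construction.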

I still think that allowing only one thing to go wrong across an i2c2i 
call is pretty scary, and I'd love to remove that restriction.

Anyway, Vladimir offered to find the right place to put the clinit 
barrier, so I'm handing this one over. :)

Thanks,
/Erik

On 2019-07-05 23:46, dean.long at oracle.com wrote:
> What is callee->is_method() doing?  Like Vladimir, I'm concerned about 
> pointers to stale metadata.
>
> dl
>
> On 7/4/19 8:02 AM, Erik Österlund wrote:
>> Hi,
>>
>> The i2c adapter sets a thread-local "callee_target" Method*, which is
>> read (and cleared) by SharedRuntime::handle_wrong_method if the i2c
>> call is "bad" (e.g. not_entrant). This error handler forwards
>> execution to the callee's c2i entry. If
>> SharedRuntime::handle_wrong_method is called again because the i2c2i
>> call is still bad, the VM crashes in the following guarantee in
>> SharedRuntime::handle_wrong_method:
>>
>> Method* callee = thread->callee_target();
>> guarantee(callee != NULL && callee->is_method(), "bad handshake");
>>
>> Unfortunately, the c2i entry can indeed fail again if it, e.g., hits
>> the new class initialization entry barrier of the c2i adapter.
>> The solution is simply to not clear the thread-local "callee_target"
>> after handling the first failure, as we can't know that there won't
>> be another one. There is no reason to clear this value, as nobody
>> other than the SharedRuntime::handle_wrong_method handler reads it
>> (and we really do want the handler to be able to read the value as
>> many times as it takes for the call to go through). I also found some
>> confused clearing of callee_target in JavaThread::oops_do(), with a
>> comment saying that this is a methodOop that needs to be cleared to
>> keep GC happy, or something along those lines. That looks like a
>> leftover from the PermGen days, so I deleted it too.
>>
>> I caught this with ZGC, where the timing window for hitting this
>> issue seems to be wider due to concurrent code cache unloading. But
>> the problem affects all GCs equally.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8227260
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/
>>
>> Thanks,
>> /Erik
>