RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls

Mon Jul 8 19:46:48 UTC 2019

On 7/8/19 6:07 AM, Erik Österlund wrote:
> Hi Dean and Vladimir,
>
> the callee->is_method() check in the guarantee is probably there to 
> detect corrupt memory.
>
> So the problem is specifically when performing upcalls from JNI. The 
> call wrapper tries to "quack like an interpreter" and performs i2c 
> calls, failing due to the nmethod being not entrant. Then the 
> subsequent c2i attempt fails again due to clinit barriers. In the 
> template interpreter calls, the clinit barriers have already been 
> taken, but in the JNI upcall path, we don't perform that barrier.
>
> Since our current i2c calls can't deal with blocking at all (or with 
> safepoints), the right solution seems to be to add clinit barriers 
> to the JavaCalls API, so that by the time the call is performed, we 
> know the clinit barrier won't be hit.

Ok, you *cannot* block with callee_target set in JavaThread.  Ignore my last 
mail!  That comment in oops_do was a leftover from permgen.

Thanks,
Coleen

>
> I still think that allowing only one thing to go wrong across an i2c2i 
> call is pretty scary, and I'd love to remove that restriction.
>
> Anyway, Vladimir offered to find the right place to put the clinit 
> barrier, so I'm handing this one over. :)
>
> Thanks,
> /Erik
>
> On 2019-07-05 23:46, dean.long at oracle.com wrote:
>> What is callee->is_method() doing? Like Vladimir, I'm concerned about 
>> pointers to stale metadata.
>>
>> dl
>>
>> On 7/4/19 8:02 AM, Erik Österlund wrote:
>>> Hi,
>>>
>>> The i2c adapter sets a thread-local "callee_target" Method*, which 
>>> is caught (and cleared) by SharedRuntime::handle_wrong_method if the 
>>> i2c call is "bad" (e.g. not_entrant). This error handler forwards 
>>> execution to the callee c2i entry. If the 
>>> SharedRuntime::handle_wrong_method method is called again due to the 
>>> i2c2i call being still bad, then we will crash the VM in the 
>>> following guarantee in SharedRuntime::handle_wrong_method:
>>>
>>> Method* callee = thread->callee_target();
>>> guarantee(callee != NULL && callee->is_method(), "bad handshake");
>>>
>>> Unfortunately, the c2i entry can indeed fail again if it, e.g., hits 
>>> the new class initialization entry barrier of the c2i adapter.
>>> The solution is to simply not clear the thread-local "callee_target" 
>>> after handling the first failure, as we can't really know there 
>>> won't be another one. There is no reason to clear this value, as 
>>> nobody reads it other than the SharedRuntime::handle_wrong_method 
>>> handler (and we really do want it to be able to read the value as 
>>> many times as it takes until the call goes through). I found some 
>>> confused clearing of this callee_target in JavaThread::oops_do(), 
>>> with a comment saying this is a methodOop that we need to clear to 
>>> make GC happy or something. Seems like old traces of perm gen. So I 
>>> deleted that too.
>>>
>>> I caught this in ZGC where the timing window for hitting this issue 
>>> seems to be wider due to concurrent code cache unloading. But it is 
>>> equally problematic for all GCs.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8227260
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/
>>>
>>> Thanks,
>>> /Erik
>>
>