RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls

Thu Jul 18 10:54:52 UTC 2019

Hi Vladimir,

I'm not intimately familiar with the code details but I get the gist of 
the fix and the avoidance of the barrier for the JNI call to restore the 
existing behaviour. So looks good in that sense.

Thanks,
David

On 18/07/2019 7:35 am, Vladimir Ivanov wrote:
> Thanks, Martin and Dmitrij for reviews.
> 
> ...
>>> If you have upcalls from JVM code in mind, then there's already a
>>> barrier on caller side: JavaCalls::call_static() calls into
>>> LinkResolver::resolve_static_call() which has initialization barrier.
>>> So, there's no need to repeat the check.
> 
> As an afterthought, I decided to update the comment in 
> SharedRuntime::handle_wrong_method() to clarify the difference in 
> behavior between upcalls coming from JVM & JNI.
> 
> Best regards,
> Vladimir Ivanov
> 
>>>>> -----Original Message-----
>>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>>>> Sent: Mittwoch, 17. Juli 2019 15:07
>>>>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-
>>>>> dev at openjdk.java.net; Dmitrij Pochepko <dmitrij.pochepko at bell-
>>> sw.com>
>>>>> Subject: Re: RFR[13]: 8227260: Can't deal with
>>>>> SharedRuntime::handle_wrong_method triggering more than once for
>>>>> interpreter calls
>>>>>
>>>>> Thanks, Erik.
>>>>>
>>>>> Also, since I touch platform-specific code, I'd like Martin and 
>>>>> Dmitrij
>>>>> (implementors of support for s390, ppc, and aarch64) to take a look at
>>>>> the patch as well.
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>> On 17/07/2019 15:25, Erik Österlund wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> Looks good. Thanks for fixing.
>>>>>>
>>>>>> /Erik
>>>>>>
>>>>>> On 2019-07-17 12:26, Vladimir Ivanov wrote:
>>>>>>> Revised fix:
>>>>>>>      http://cr.openjdk.java.net/~vlivanov/8227260/webrev.00/
>>>>>>>
>>>>>>> It turned out the problem is not specific to i2c2i: fast class
>>>>>>> initialization barriers on nmethod entry trigger the assert as well.
>>>>>>>
>>>>>>> JNI upcalls (CallStatic<type>Method) don't have class initialization
>>>>>>> checks, so it's possible to initiate a JNI upcall from a
>>>>>>> non-initializing thread and JVM should let it complete.
>>>>>>>
>>>>>>> It leads to a busy loop (asserts in debug) between nmethod entry
>>>>>>> barrier & SharedRuntime::handle_wrong_method until holder class is
>>>>>>> initialized (possibly infinite if it blocks class initialization).
>>>>>>>
>>>>>>> Proposed fix is to keep using c2i, but jump over class 
>>>>>>> initialization
>>>>>>> barrier right to the argument shuffling logic on verified entry when
>>>>>>> coming from SharedRuntime::handle_wrong_method.
>>>>>>>
>>>>>>> Improved regression test reliably reproduces the problem.
>>>>>>>
>>>>>>> Testing: regression test, hs-precheckin-comp, tier1-6
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vladimir Ivanov
>>>>>>>
>>>>>>> On 04/07/2019 18:02, Erik Österlund wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The i2c adapter sets a thread-local "callee_target" Method*, 
>>>>>>>> which is
>>>>>>>> caught (and cleared) by SharedRuntime::handle_wrong_method if
>>> the
>>>>> i2c
>>>>>>>> call is "bad" (e.g. not_entrant). This error handler forwards
>>>>>>>> execution to the callee c2i entry. If the
>>>>>>>> SharedRuntime::handle_wrong_method method is called again due
>>> to
>>>>> the
>>>>>>>> i2c2i call being still bad, then we will crash the VM in the
>>>>>>>> following guarantee in SharedRuntime::handle_wrong_method:
>>>>>>>>
>>>>>>>> Method* callee = thread->callee_target();
>>>>>>>> guarantee(callee != NULL && callee->is_method(), "bad handshake");
>>>>>>>>
>>>>>>>> Unfortunately, the c2i entry can indeed fail again if it, e.g., 
>>>>>>>> hits
>>>>>>>> the new class initialization entry barrier of the c2i adapter.
>>>>>>>> The solution is to simply not clear the thread-local 
>>>>>>>> "callee_target"
>>>>>>>> after handling the first failure, as we can't really know there 
>>>>>>>> won't
>>>>>>>> be another one. There is no reason to clear this value as nobody 
>>>>>>>> else
>>>>>>>> reads it than the SharedRuntime::handle_wrong_method handler
>>> (and
>>>>> we
>>>>>>>> really do want it to be able to read the value as many times as it
>>>>>>>> takes until the call goes through). I found some confused 
>>>>>>>> clearing of
>>>>>>>> this callee_target in JavaThread::oops_do(), with a comment saying
>>>>>>>> this is a methodOop that we need to clear to make GC happy or
>>>>>>>> something. Seems like old traces of perm gen. So I deleted that 
>>>>>>>> too.
>>>>>>>>
>>>>>>>> I caught this in ZGC where the timing window for hitting this issue
>>>>>>>> seems to be wider due to concurrent code cache unloading. But it is
>>>>>>>> equally problematic for all GCs.
>>>>>>>>
>>>>>>>> Bug:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227260
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> /Erik