RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls

Thu Jul 18 11:13:43 UTC 2019

Thanks for the review, David.

Best regards,
Vladimir Ivanov

On 18/07/2019 13:54, David Holmes wrote:
> Hi Vladimir,
> 
> I'm not intimately familiar with the code details but I get the gist of 
> the fix and the avoidance of the barrier for the JNI call to restore the 
> existing behaviour. So looks good in that sense.
> 
> Thanks,
> David
> 
> On 18/07/2019 7:35 am, Vladimir Ivanov wrote:
>> Thanks, Martin and Dmitrij, for the reviews.
>>
>> ...
>>>> If you have upcalls from JVM code in mind, then there's already a
>>>> barrier on caller side: JavaCalls::call_static() calls into
>>>> LinkResolver::resolve_static_call() which has initialization barrier.
>>>> So, there's no need to repeat the check.
>>
>> As an afterthought, I decided to update the comment in 
>> SharedRuntime::handle_wrong_method() to clarify the difference in 
>> behavior between upcalls coming from JVM & JNI.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>>>>> Sent: Wednesday, 17 July 2019 15:07
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>>>>> hotspot-dev at openjdk.java.net;
>>>>>> Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>
>>>>>> Subject: Re: RFR[13]: 8227260: Can't deal with
>>>>>> SharedRuntime::handle_wrong_method triggering more than once for
>>>>>> interpreter calls
>>>>>>
>>>>>> Thanks, Erik.
>>>>>>
>>>>>> Also, since I touch platform-specific code, I'd like Martin and
>>>>>> Dmitrij (implementors of support for s390, ppc, and aarch64) to take
>>>>>> a look at the patch as well.
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>> On 17/07/2019 15:25, Erik Österlund wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> Looks good. Thanks for fixing.
>>>>>>>
>>>>>>> /Erik
>>>>>>>
>>>>>>> On 2019-07-17 12:26, Vladimir Ivanov wrote:
>>>>>>>> Revised fix:
>>>>>>>>      http://cr.openjdk.java.net/~vlivanov/8227260/webrev.00/
>>>>>>>>
>>>>>>>> It turned out the problem is not specific to i2c2i: fast class
>>>>>>>> initialization barriers on nmethod entry trigger the assert as well.
>>>>>>>>
>>>>>>>> JNI upcalls (CallStatic<type>Method) don't have class initialization
>>>>>>>> checks, so it's possible to initiate a JNI upcall from a
>>>>>>>> non-initializing thread, and the JVM should let it complete.
>>>>>>>>
>>>>>>>> This leads to a busy loop (asserts in debug builds) between the
>>>>>>>> nmethod entry barrier and SharedRuntime::handle_wrong_method until
>>>>>>>> the holder class is initialized (possibly infinite if it blocks
>>>>>>>> class initialization).
>>>>>>>>
>>>>>>>> The proposed fix is to keep using c2i, but jump over the class
>>>>>>>> initialization barrier right to the argument shuffling logic on the
>>>>>>>> verified entry when coming from SharedRuntime::handle_wrong_method.
>>>>>>>>
>>>>>>>> The improved regression test reliably reproduces the problem.
>>>>>>>>
>>>>>>>> Testing: regression test, hs-precheckin-comp, tier1-6
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Vladimir Ivanov
>>>>>>>>
>>>>>>>> On 04/07/2019 18:02, Erik Österlund wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> The i2c adapter sets a thread-local "callee_target" Method*, which
>>>>>>>>> is caught (and cleared) by SharedRuntime::handle_wrong_method if
>>>>>>>>> the i2c call is "bad" (e.g. not_entrant). This error handler
>>>>>>>>> forwards execution to the callee c2i entry. If
>>>>>>>>> SharedRuntime::handle_wrong_method is called again because the
>>>>>>>>> i2c2i call is still bad, then we will crash the VM in the
>>>>>>>>> following guarantee in SharedRuntime::handle_wrong_method:
>>>>>>>>>
>>>>>>>>> Method* callee = thread->callee_target();
>>>>>>>>> guarantee(callee != NULL && callee->is_method(), "bad handshake");
>>>>>>>>>
>>>>>>>>> Unfortunately, the c2i entry can indeed fail again if it, e.g.,
>>>>>>>>> hits the new class initialization entry barrier of the c2i adapter.
>>>>>>>>> The solution is to simply not clear the thread-local
>>>>>>>>> "callee_target" after handling the first failure, as we can't
>>>>>>>>> really know there won't be another one. There is no reason to
>>>>>>>>> clear this value, as nobody reads it other than the
>>>>>>>>> SharedRuntime::handle_wrong_method handler (and we really do want
>>>>>>>>> it to be able to read the value as many times as it takes until
>>>>>>>>> the call goes through). I found some confused clearing of this
>>>>>>>>> callee_target in JavaThread::oops_do(), with a comment saying this
>>>>>>>>> is a methodOop that we need to clear to make GC happy or
>>>>>>>>> something. Seems like old traces of perm gen. So I deleted that
>>>>>>>>> too.
>>>>>>>>>
>>>>>>>>> I caught this in ZGC where the timing window for hitting this
>>>>>>>>> issue seems to be wider due to concurrent code cache unloading.
>>>>>>>>> But it is equally problematic for all GCs.
>>>>>>>>>
>>>>>>>>> Bug:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227260
>>>>>>>>>
>>>>>>>>> Webrev:
>>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> /Erik
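
Editorial illustration (not part of the thread): the handshake described in the original message can be modeled with a small standalone sketch. Everything here is a hypothetical stand-in rather than HotSpot code: the `Method` and `Thread` structs, the `clear_after_read` flag, and the `successful_retries` helper are invented for illustration, and a `nullptr` return stands where the real handler would hit the "bad handshake" guarantee. It only shows why clearing `callee_target` after the first read breaks a second trigger, assuming nothing re-sets the field in between.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not HotSpot's real types.
struct Method {
    const char* name;
};

struct Thread {
    // In the thread's description this is set by the i2c adapter.
    Method* callee_target = nullptr;
};

// Models the handler: returns the callee to forward to, or nullptr where
// the real code would fail the "bad handshake" guarantee.
// clear_after_read == true models the old behaviour (read, then clear);
// clear_after_read == false models the proposed behaviour (keep the value).
Method* handle_wrong_method(Thread& thread, bool clear_after_read) {
    Method* callee = thread.callee_target;
    if (callee == nullptr) {
        return nullptr;  // stands in for the failed guarantee
    }
    if (clear_after_read) {
        thread.callee_target = nullptr;
    }
    return callee;
}

// Sets callee_target once, then triggers the handler up to `attempts`
// times, as if the forwarded call kept failing. Returns how many
// triggers were handled before the modeled guarantee failed.
int successful_retries(bool clear_after_read, int attempts) {
    Method method{"callee"};
    Thread thread;
    thread.callee_target = &method;

    int handled = 0;
    for (int i = 0; i < attempts; ++i) {
        if (handle_wrong_method(thread, clear_after_read) == nullptr) {
            break;
        }
        ++handled;
    }
    return handled;
}
```

Under this model, `successful_retries(true, 3)` handles only the first trigger, while `successful_retries(false, 3)` handles all three, which matches the reasoning that the handler must be able to read the value as many times as it takes.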