RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls

Wed Jul 17 21:35:08 UTC 2019

Thanks, Martin and Dmitrij, for the reviews.

...
>> If you have upcalls from JVM code in mind, then there's already a
>> barrier on caller side: JavaCalls::call_static() calls into
>> LinkResolver::resolve_static_call() which has initialization barrier.
>> So, there's no need to repeat the check.

As an afterthought, I decided to update the comment in
SharedRuntime::handle_wrong_method() to clarify the difference in
behavior between upcalls coming from the JVM and from JNI.
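
To illustrate that difference: a JNI upcall is not checked on the caller side, so it can reach the callee's entry barrier while the holder class is still uninitialized. Below is a toy model of the resulting bounce loop and of the revised fix quoted further down. It is only a sketch, not HotSpot code. ToyClass, Entry, enter and bounces_until_entered are hypothetical names invented for this example, and the max_bounces cap exists only so that the model terminates.

```cpp
#include <cassert>

// Toy model only: none of these names exist in HotSpot.
enum class Entry { WithInitBarrier, SkipInitBarrier };

struct ToyClass {
    bool initialized = false;  // stands in for "holder class is initialized"
};

// Returns true if the call reaches the method body, false if the
// (modelled) class initialization barrier bounces it to the handler.
bool enter(const ToyClass& holder, Entry entry) {
    if (entry == Entry::WithInitBarrier && !holder.initialized) {
        return false;
    }
    return true;
}

// Models the handler loop: the first attempt goes through the barriered
// entry; after each bounce the handler retries through retry_entry.
// Returns the number of bounces, capped at max_bounces (must be >= 1).
int bounces_until_entered(const ToyClass& holder, Entry retry_entry,
                          int max_bounces) {
    assert(max_bounces >= 1);
    int bounces = 0;
    Entry entry = Entry::WithInitBarrier;
    while (!enter(holder, entry)) {
        if (++bounces == max_bounces) {
            return bounces;  // the unbounded version would keep spinning here
        }
        entry = retry_entry;
    }
    return bounces;
}
```

In this model, retrying through the barriered entry never makes progress while the class stays uninitialized, whereas retrying through the entry that skips the barrier gets through after a single bounce.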

Best regards,
Vladimir Ivanov

>>>> -----Original Message-----
>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>>> Sent: Wednesday, July 17, 2019 15:07
>>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>>> hotspot-dev at openjdk.java.net;
>>>> Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>
>>>> Subject: Re: RFR[13]: 8227260: Can't deal with
>>>> SharedRuntime::handle_wrong_method triggering more than once for
>>>> interpreter calls
>>>>
>>>> Thanks, Erik.
>>>>
>>>> Also, since I touch platform-specific code, I'd like Martin and Dmitrij
>>>> (implementors of support for s390, ppc, and aarch64) to take a look at
>>>> the patch as well.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> On 17/07/2019 15:25, Erik Österlund wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Looks good. Thanks for fixing.
>>>>>
>>>>> /Erik
>>>>>
>>>>> On 2019-07-17 12:26, Vladimir Ivanov wrote:
>>>>>> Revised fix:
>>>>>>      http://cr.openjdk.java.net/~vlivanov/8227260/webrev.00/
>>>>>>
>>>>>> It turned out the problem is not specific to i2c2i: fast class
>>>>>> initialization barriers on nmethod entry trigger the assert as well.
>>>>>>
>>>>>> JNI upcalls (CallStatic<type>Method) don't have class initialization
>>>>>> checks, so it's possible to initiate a JNI upcall from a
>>>>>> non-initializing thread and JVM should let it complete.
>>>>>>
>>>>>> It leads to a busy loop (asserts in debug) between nmethod entry
>>>>>> barrier & SharedRuntime::handle_wrong_method until holder class is
>>>>>> initialized (possibly infinite if it blocks class initialization).
>>>>>>
>>>>>> Proposed fix is to keep using c2i, but jump over class initialization
>>>>>> barrier right to the argument shuffling logic on verified entry when
>>>>>> coming from SharedRuntime::handle_wrong_method.
>>>>>>
>>>>>> Improved regression test reliably reproduces the problem.
>>>>>>
>>>>>> Testing: regression test, hs-precheckin-comp, tier1-6
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>> On 04/07/2019 18:02, Erik Österlund wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> The i2c adapter sets a thread-local "callee_target" Method*, which is
>>>>>>> caught (and cleared) by SharedRuntime::handle_wrong_method if the i2c
>>>>>>> call is "bad" (e.g. not_entrant). This error handler forwards
>>>>>>> execution to the callee c2i entry. If the
>>>>>>> SharedRuntime::handle_wrong_method method is called again due to the
>>>>>>> i2c2i call being still bad, then we will crash the VM in the
>>>>>>> following guarantee in SharedRuntime::handle_wrong_method:
>>>>>>>
>>>>>>> Method* callee = thread->callee_target();
>>>>>>> guarantee(callee != NULL && callee->is_method(), "bad handshake");
>>>>>>>
>>>>>>> Unfortunately, the c2i entry can indeed fail again if it, e.g., hits
>>>>>>> the new class initialization entry barrier of the c2i adapter.
>>>>>>> The solution is to simply not clear the thread-local "callee_target"
>>>>>>> after handling the first failure, as we can't really know there won't
>>>>>>> be another one. There is no reason to clear this value as nobody else
>>>>>>> reads it than the SharedRuntime::handle_wrong_method handler (and we
>>>>>>> really do want it to be able to read the value as many times as it
>>>>>>> takes until the call goes through). I found some confused clearing of
>>>>>>> this callee_target in JavaThread::oops_do(), with a comment saying
>>>>>>> this is a methodOop that we need to clear to make GC happy or
>>>>>>> something. Seems like old traces of perm gen. So I deleted that too.
>>>>>>>
>>>>>>> I caught this in ZGC where the timing window for hitting this issue
>>>>>>> seems to be wider due to concurrent code cache unloading. But it is
>>>>>>> equally problematic for all GCs.
>>>>>>>
>>>>>>> Bug:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227260
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/
>>>>>>>
>>>>>>> Thanks,
>>>>>>> /Erik
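
The callee_target handshake described in the original message above can be sketched as a small stand-alone model. This is not the HotSpot implementation. Method and ToyThread here are toy stand-ins, the two handler functions are hypothetical, and a nullptr return marks the point where the real code would hit the "bad handshake" guarantee.

```cpp
#include <cassert>

// Toy stand-ins; these are NOT the HotSpot types of the same name.
struct Method {
    const char* name;
};

struct ToyThread {
    const Method* callee_target = nullptr;  // set by the (modelled) i2c adapter
};

// Models the old behaviour: read the target, then clear it.
// A nullptr result corresponds to the failing guarantee in the real code.
const Method* handle_wrong_method_clearing(ToyThread& t) {
    const Method* callee = t.callee_target;
    t.callee_target = nullptr;
    return callee;
}

// Models the behaviour proposed in the original message: read the target
// and leave it in place, so repeated invocations still find it.
const Method* handle_wrong_method_keeping(ToyThread& t) {
    return t.callee_target;
}

// Sets the target once, then invokes the handler `attempts` times in a row
// and counts how many invocations saw a valid target.
template <typename Handler>
int successful_lookups(Handler handler, int attempts) {
    static const Method m{"callee"};
    ToyThread t;
    t.callee_target = &m;
    int ok = 0;
    for (int i = 0; i < attempts; ++i) {
        if (handler(t) != nullptr) {
            ++ok;
        }
    }
    return ok;
}
```

With three consecutive handler invocations, the clearing variant finds a valid target only on the first one, while the keeping variant finds it every time.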