RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls

Wed Jul 17 15:32:00 UTC 2019

> thank you for taking care of these platforms. PPC64 and s390 parts look good and the test passes.

Thanks, Martin.

> sharedRuntime.cpp:
> Is caller_frame.is_entry_frame() precise enough?
> Can it not also be true when calling normally from VM?
> If so,  don't we need the clinit check in this case?

If you have upcalls from JVM code in mind, then there's already a 
barrier on caller side: JavaCalls::call_static() calls into 
LinkResolver::resolve_static_call() which has initialization barrier. 
So, there's no need to repeat the check.

Best regards,
Vladimir Ivanov

>> -----Original Message-----
>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> Sent: Mittwoch, 17. Juli 2019 15:07
>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-
>> dev at openjdk.java.net; Dmitrij Pochepko <dmitrij.pochepko at bell-sw.com>
>> Subject: Re: RFR[13]: 8227260: Can't deal with
>> SharedRuntime::handle_wrong_method triggering more than once for
>> interpreter calls
>>
>> Thanks, Erik.
>>
>> Also, since I touch platform-specific code, I'd like Martin and Dmitrij
>> (implementors of support for s390, ppc, and aarch64) to take a look at
>> the patch as well.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 17/07/2019 15:25, Erik Österlund wrote:
>>> Hi Vladimir,
>>>
>>> Looks good. Thanks for fixing.
>>>
>>> /Erik
>>>
>>> On 2019-07-17 12:26, Vladimir Ivanov wrote:
>>>> Revised fix:
>>>>     http://cr.openjdk.java.net/~vlivanov/8227260/webrev.00/
>>>>
>>>> It turned out the problem is not specific to i2c2i: fast class
>>>> initialization barriers on nmethod entry trigger the assert as well.
>>>>
>>>> JNI upcalls (CallStatic<type>Method) don't have class initialization
>>>> checks, so it's possible to initiate a JNI upcall from a
>>>> non-initializing thread and JVM should let it complete.
>>>>
>>>> It leads to a busy loop (asserts in debug) between nmethod entry
>>>> barrier & SharedRuntime::handle_wrong_method until holder class is
>>>> initialized (possibly infinite if it blocks class initialization).
>>>>
>>>> Proposed fix is to keep using c2i, but jump over class initialization
>>>> barrier right to the argument shuffling logic on verified entry when
>>>> coming from SharedRuntime::handle_wrong_method.
>>>>
>>>> Improved regression test reliably reproduces the problem.
>>>>
>>>> Testing: regression test, hs-precheckin-comp, tier1-6
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> On 04/07/2019 18:02, Erik Österlund wrote:
>>>>> Hi,
>>>>>
>>>>> The i2c adapter sets a thread-local "callee_target" Method*, which is
>>>>> caught (and cleared) by SharedRuntime::handle_wrong_method if the
>> i2c
>>>>> call is "bad" (e.g. not_entrant). This error handler forwards
>>>>> execution to the callee c2i entry. If the
>>>>> SharedRuntime::handle_wrong_method method is called again due to
>> the
>>>>> i2c2i call being still bad, then we will crash the VM in the
>>>>> following guarantee in SharedRuntime::handle_wrong_method:
>>>>>
>>>>> Method* callee = thread->callee_target();
>>>>> guarantee(callee != NULL && callee->is_method(), "bad handshake");
>>>>>
>>>>> Unfortunately, the c2i entry can indeed fail again if it, e.g., hits
>>>>> the new class initialization entry barrier of the c2i adapter.
>>>>> The solution is to simply not clear the thread-local "callee_target"
>>>>> after handling the first failure, as we can't really know there won't
>>>>> be another one. There is no reason to clear this value as nobody else
>>>>> reads it than the SharedRuntime::handle_wrong_method handler (and
>> we
>>>>> really do want it to be able to read the value as many times as it
>>>>> takes until the call goes through). I found some confused clearing of
>>>>> this callee_target in JavaThread::oops_do(), with a comment saying
>>>>> this is a methodOop that we need to clear to make GC happy or
>>>>> something. Seems like old traces of perm gen. So I deleted that too.
>>>>>
>>>>> I caught this in ZGC where the timing window for hitting this issue
>>>>> seems to be wider due to concurrent code cache unloading. But it is
>>>>> equally problematic for all GCs.
>>>>>
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8227260
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/
>>>>>
>>>>> Thanks,
>>>>> /Erik