RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls

Wed Jul 17 15:53:19 UTC 2019

Hi Vladimir,

thanks for explaining. So I think it's correct.

Best regards,
Martin

> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Mittwoch, 17. Juli 2019 17:32
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-dev at openjdk.java.net
> Subject: Re: RFR[13]: 8227260: Can't deal with
> SharedRuntime::handle_wrong_method triggering more than once for
> interpreter calls
> 
> 
> 
> > thank you for taking care of these platforms. PPC64 and s390 parts look
> good and the test passes.
> 
> Thanks, Martin.
> 
> > sharedRuntime.cpp:
> > Is caller_frame.is_entry_frame() precise enough?
> > Can it not also be true when calling normally from VM?
> > If so,  don't we need the clinit check in this case?
> 
> If you have upcalls from JVM code in mind, then there's already a
> barrier on caller side: JavaCalls::call_static() calls into
> LinkResolver::resolve_static_call() which has initialization barrier.
> So, there's no need to repeat the check.
> 
> Best regards,
> Vladimir Ivanov
> 
> >> -----Original Message-----
> >> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> >> Sent: Mittwoch, 17. Juli 2019 15:07
> >> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-
> >> dev at openjdk.java.net; Dmitrij Pochepko <dmitrij.pochepko at bell-
> sw.com>
> >> Subject: Re: RFR[13]: 8227260: Can't deal with
> >> SharedRuntime::handle_wrong_method triggering more than once for
> >> interpreter calls
> >>
> >> Thanks, Erik.
> >>
> >> Also, since I touch platform-specific code, I'd like Martin and Dmitrij
> >> (implementors of support for s390, ppc, and aarch64) to take a look at
> >> the patch as well.
> >>
> >> Best regards,
> >> Vladimir Ivanov
> >>
> >> On 17/07/2019 15:25, Erik Österlund wrote:
> >>> Hi Vladimir,
> >>>
> >>> Looks good. Thanks for fixing.
> >>>
> >>> /Erik
> >>>
> >>> On 2019-07-17 12:26, Vladimir Ivanov wrote:
> >>>> Revised fix:
> >>>>     http://cr.openjdk.java.net/~vlivanov/8227260/webrev.00/
> >>>>
> >>>> It turned out the problem is not specific to i2c2i: fast class
> >>>> initialization barriers on nmethod entry trigger the assert as well.
> >>>>
> >>>> JNI upcalls (CallStatic<type>Method) don't have class initialization
> >>>> checks, so it's possible to initiate a JNI upcall from a
> >>>> non-initializing thread and JVM should let it complete.
> >>>>
> >>>> It leads to a busy loop (asserts in debug) between nmethod entry
> >>>> barrier & SharedRuntime::handle_wrong_method until holder class is
> >>>> initialized (possibly infinite if it blocks class initialization).
> >>>>
> >>>> Proposed fix is to keep using c2i, but jump over class initialization
> >>>> barrier right to the argument shuffling logic on verified entry when
> >>>> coming from SharedRuntime::handle_wrong_method.
> >>>>
> >>>> Improved regression test reliably reproduces the problem.
> >>>>
> >>>> Testing: regression test, hs-precheckin-comp, tier1-6
> >>>>
> >>>> Best regards,
> >>>> Vladimir Ivanov
> >>>>
> >>>> On 04/07/2019 18:02, Erik Österlund wrote:
> >>>>> Hi,
> >>>>>
> >>>>> The i2c adapter sets a thread-local "callee_target" Method*, which is
> >>>>> caught (and cleared) by SharedRuntime::handle_wrong_method if
> the
> >> i2c
> >>>>> call is "bad" (e.g. not_entrant). This error handler forwards
> >>>>> execution to the callee c2i entry. If the
> >>>>> SharedRuntime::handle_wrong_method method is called again due
> to
> >> the
> >>>>> i2c2i call being still bad, then we will crash the VM in the
> >>>>> following guarantee in SharedRuntime::handle_wrong_method:
> >>>>>
> >>>>> Method* callee = thread->callee_target();
> >>>>> guarantee(callee != NULL && callee->is_method(), "bad handshake");
> >>>>>
> >>>>> Unfortunately, the c2i entry can indeed fail again if it, e.g., hits
> >>>>> the new class initialization entry barrier of the c2i adapter.
> >>>>> The solution is to simply not clear the thread-local "callee_target"
> >>>>> after handling the first failure, as we can't really know there won't
> >>>>> be another one. There is no reason to clear this value as nobody else
> >>>>> reads it than the SharedRuntime::handle_wrong_method handler
> (and
> >> we
> >>>>> really do want it to be able to read the value as many times as it
> >>>>> takes until the call goes through). I found some confused clearing of
> >>>>> this callee_target in JavaThread::oops_do(), with a comment saying
> >>>>> this is a methodOop that we need to clear to make GC happy or
> >>>>> something. Seems like old traces of perm gen. So I deleted that too.
> >>>>>
> >>>>> I caught this in ZGC where the timing window for hitting this issue
> >>>>> seems to be wider due to concurrent code cache unloading. But it is
> >>>>> equally problematic for all GCs.
> >>>>>
> >>>>> Bug:
> >>>>> https://bugs.openjdk.java.net/browse/JDK-8227260
> >>>>>
> >>>>> Webrev:
> >>>>> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/
> >>>>>
> >>>>> Thanks,
> >>>>> /Erik