RFR (XS): 8023037 : Race between ciEnv::register_method and nmethod::make_not_entrant_or_zombie

Igor Veresov igor.veresov at oracle.com
Thu Nov 7 18:27:10 PST 2013


Yup, looks good.

igor

On Nov 7, 2013, at 1:24 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:

> Thank you, Vladimir.
> 
> I need 1 more Reviewer to look at it. Any volunteers? :-)
> 
> Best regards,
> Vladimir Ivanov
> 
> On 11/8/13 12:24 AM, Vladimir Kozlov wrote:
>> Good.
>> 
>> Vladimir
>> 
>> On 11/7/13 12:04 PM, Vladimir Ivanov wrote:
>>> Oops, sorry about that... Fixed.
>>> 
>>> This time I built it and ran the test to be 100% sure :-)
>>> 
>>> Best regards,
>>> Vladimir Ivanov
>>> 
>>> On 11/7/13 11:26 PM, Vladimir Kozlov wrote:
>>>> No, should be method() and 'this' and add parenthesis:
>>>> 
>>>> +   bool is_installed = (method()->code() == this) // nmethod is in
>>>> state 'alive' and installed
>>>> +                       || !this->is_in_use();     // nmethod is
>>>> installed, but not in 'alive' state
>>>> 
>>>> Vladimir K
>>>> 
>>>> On 11/7/13 10:35 AM, Vladimir Ivanov wrote:
>>>>> Sorry, blindly copy-pasted the code.
>>>>> Thanks for catching that.
>>>>> Fixed (same link, hit refresh).
>>>>> 
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>> 
>>>>> On 11/7/13 10:00 PM, Vladimir Kozlov wrote:
>>>>>> It is better.
>>>>>> 
>>>>>> About next 2 lines. Where you get method? The method() is used to get
>>>>>> pointer to java Method. And is_in_use() is nmethod's method. In
>>>>>> comments
>>>>>> you need to say nmethod (and not method):
>>>>>> 
>>>>>> +   bool is_installed = method->code() == this || // method is in
>>>>>> state
>>>>>> 'alive' and installed
>>>>>> +                       !method->is_in_use();     // method is
>>>>>> installed
>>>>>> and in state: 'not-entrant', 'zombie', or, 'unloaded'
>>>>>> 
>>>>>> Second comment too big, you can simple say: // or nmethod is not in
>>>>>> 'alive' state.
>>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>> On 11/7/13 6:50 AM, Vladimir Ivanov wrote:
>>>>>>> Vladimir, thank you for the feedback.
>>>>>>> 
>>>>>>> What do you think about the following version?
>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.03/
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Vladimir Ivanov
>>>>>>> 
>>>>>>> On 11/7/13 2:52 AM, Vladimir Kozlov wrote:
>>>>>>>> Vladimir,
>>>>>>>> 
>>>>>>>> Thank you for finding the real cause.
>>>>>>>> 
>>>>>>>> I think you need the comment before nm->verify() in new_nmethod()
>>>>>>>> explaining why we should avoid safepoint as you explained here.
>>>>>>>> 
>>>>>>>> I am not comfortable about passing new allow_safepoints parameter
>>>>>>>> through all verify calls because it is the only one case. We still
>>>>>>>> have
>>>>>>>> not installed nmethod at this point. There should be some check
>>>>>>>> (nmethod's flags/state) which we can use to skip CompiledIC_lock.
>>>>>>>> 
>>>>>>>> I thought about the need to verify scopes in
>>>>>>>> verify_interrupt_point() in
>>>>>>>> all cases. But the verification code in ScopeDesc::verify() is
>>>>>>>> commented. What? In general it is good to call it anyway. And you
>>>>>>>> don't
>>>>>>>> need IC to get the end_of_call() - I think we can inline what
>>>>>>>> CompiledIC_at() does: nativeCall_at(call_site)->end_of_call().
>>>>>>>> 
>>>>>>>> This code calls CompiledIC_at() because it does IC verification
>>>>>>>> which we
>>>>>>>> can skip in our case.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>> 
>>>>>>>> On 11/6/13 6:57 AM, Vladimir Ivanov wrote:
>>>>>>>>> Very good point, Vladimir!
>>>>>>>>> 
>>>>>>>>> I should have looked for the bug in a different place :-)
>>>>>>>>> 
>>>>>>>>> The problem is triggered by class redefinition and it can be
>>>>>>>>> performed
>>>>>>>>> under Compile_lock or during safepoint. So, in
>>>>>>>>> ciEnv::register_method
>>>>>>>>> it's not enough to grab Compile_lock. In addition, we need to
>>>>>>>>> ensure
>>>>>>>>> that there are no safepoints possible during method installation to
>>>>>>>>> guarantee no dependency is invalidated.
>>>>>>>>> 
>>>>>>>>> There's one place in nmethod verification code (see MutexLocker
>>>>>>>>> ml_verify (CompiledIC_lock) in nmethod::verify_interrupt_point)
>>>>>>>>> where
>>>>>>>>> safepoint is possible and it causes the failure.
>>>>>>>>> 
>>>>>>>>> The bug is present only in non-product builds since nmethod
>>>>>>>>> verification
>>>>>>>>> is absent in product bits.
>>>>>>>>> 
>>>>>>>>> Previous fix also works, but I decided to go another route and
>>>>>>>>> forbid
>>>>>>>>> safepoints at all while holding Compile_lock.
>>>>>>>>> 
>>>>>>>>> Updated webrev (will appear once cr.ojn is up):
>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.02/
>>>>>>>>> 
>>>>>>>>> Best regards,
>>>>>>>>> Vladimir Ivanov
>>>>>>>>> 
>>>>>>>>> On 11/6/13 2:36 AM, Vladimir Kozlov wrote:
>>>>>>>>>> You need to be careful to avoid deadlock since there is call:
>>>>>>>>>> 
>>>>>>>>>> 1035               old->make_not_entrant();
>>>>>>>>>> 
>>>>>>>>>> Any way there is comment in register_method():
>>>>>>>>>> 
>>>>>>>>>>  936     // Prevent SystemDictionary::add_to_hierarchy from
>>>>>>>>>> running
>>>>>>>>>>  937     // and invalidating our dependencies until we install
>>>>>>>>>> this
>>>>>>>>>> method.
>>>>>>>>>>  938     MutexLocker ml(Compile_lock);
>>>>>>>>>> 
>>>>>>>>>> So how it happens that the method is invalidated by dependencies?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>> 
>>>>>>>>>> On 11/5/13 1:52 PM, Vladimir Ivanov wrote:
>>>>>>>>>>> Ouch, I was misled by it's name. So, I just lowered the
>>>>>>>>>>> likelihood of
>>>>>>>>>>> the failure then.
>>>>>>>>>>> 
>>>>>>>>>>> make_not_entrant_or_zombie holds Patching_lock when it patches &
>>>>>>>>>>> unlinks
>>>>>>>>>>> a nmethod.
>>>>>>>>>>> 
>>>>>>>>>>> Do you see any problems with using it to guard method
>>>>>>>>>>> installation
>>>>>>>>>>> (method->set_code(method, nm)) in register_method() as well?
>>>>>>>>>>> 
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>> 
>>>>>>>>>>> On 11/6/13 1:11 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>> Note nmethodLocker is not lock:
>>>>>>>>>>>> 
>>>>>>>>>>>> void nmethodLocker::lock_nmethod(nmethod* nm, bool zombie_ok) {
>>>>>>>>>>>>   if (nm == NULL)  return;
>>>>>>>>>>>>   Atomic::inc(&nm->_lock_count);
>>>>>>>>>>>>   guarantee(zombie_ok || !nm->is_zombie(), "cannot lock a
>>>>>>>>>>>> zombie
>>>>>>>>>>>> method");
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> The guard is nm->is_locked_by_vm() but neither
>>>>>>>>>>>> make_not_entrant_or_zombie () or register_method() use it.
>>>>>>>>>>>> 
>>>>>>>>>>>> So how nmethodLocker helps here?
>>>>>>>>>>>> 
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>> 
>>>>>>>>>>>> On 11/5/13 12:42 PM, Vladimir Ivanov wrote:
>>>>>>>>>>>>> Thanks! Missed that one. Fixed.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 11/5/13 11:25 PM, Vladimir Kozlov wrote:
>>>>>>>>>>>>>> Should be
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01/
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What about this place:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> src/cpu/sparc/vm/sharedRuntime_sparc.cpp:  if
>>>>>>>>>>>>>> (StressNonEntrant) {
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 11/5/13 11:10 AM, Vladimir Ivanov wrote:
>>>>>>>>>>>>>>> Updated fix:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Removed broken StressNonEntrant code.
>>>>>>>>>>>>>>> Updated comments.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 11/5/13 3:39 PM, Vladimir Ivanov wrote:
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> There's a race between compiler thread during method
>>>>>>>>>>>>>>>> registration
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> sweeper: sweeper can invalidate a nmethod which hasn't been
>>>>>>>>>>>>>>>> installed
>>>>>>>>>>>>>>>> yet.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Consider the following scenario:
>>>>>>>>>>>>>>>>   ciEnv::register_method:
>>>>>>>>>>>>>>>>     - new nmethod(...)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>   sweeper:
>>>>>>>>>>>>>>>>     - invalidates newly allocated nmethod and patches
>>>>>>>>>>>>>>>> VEP to
>>>>>>>>>>>>>>>> call
>>>>>>>>>>>>>>>> handle_wrong_method
>>>>>>>>>>>>>>>>     - tries to unlink it, but
>>>>>>>>>>>>>>>> method()->from_compiled_entry() !=
>>>>>>>>>>>>>>>> verified_entry_point() since nmethod hasn't been installed
>>>>>>>>>>>>>>>> yet
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>   ciEnv::register_method:
>>>>>>>>>>>>>>>>     - installs already invalidated nmethod
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Calling corresponding Java method will lead to infinite
>>>>>>>>>>>>>>>> loop:
>>>>>>>>>>>>>>>> VEP of
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> compiled method calls handle_wrong_method and call site
>>>>>>>>>>>>>>>> resolution
>>>>>>>>>>>>>>>> returns the very same compiled method.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The fix is to grab a lock after nmethod is allocated in the
>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>> cache
>>>>>>>>>>>>>>>> and check that it hasn't been already invalidated by the
>>>>>>>>>>>>>>>> sweeper
>>>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>>> proceeding. Sweeper already synchronizes on a nmethod before
>>>>>>>>>>>>>>>> invalidating it.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Testing: failing test w/ diagnostic output.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Vladimir Ivanov



More information about the hotspot-compiler-dev mailing list