RFR (XS): 8023037 : Race between ciEnv::register_method and nmethod::make_not_entrant_or_zombie

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Nov 8 00:52:22 PST 2013


Thank you, Igor.

Best regards,
Vladimir Ivanov

On 11/8/13 6:27 AM, Igor Veresov wrote:
> Yup, looks good.
>
> igor
>
> On Nov 7, 2013, at 1:24 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>
>> Thank you, Vladimir.
>>
>> I need 1 more Reviewer to look at it. Any volunteers? :-)
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/8/13 12:24 AM, Vladimir Kozlov wrote:
>>> Good.
>>>
>>> Vladimir
>>>
>>> On 11/7/13 12:04 PM, Vladimir Ivanov wrote:
>>>> Oops, sorry about that... Fixed.
>>>>
>>>> This time I built it and ran the test to be 100% sure :-)
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> On 11/7/13 11:26 PM, Vladimir Kozlov wrote:
>>>>> No, should be method() and 'this' and add parenthesis:
>>>>>
>>>>> +   bool is_installed = (method()->code() == this) // nmethod is in
>>>>> state 'alive' and installed
>>>>> +                       || !this->is_in_use();     // nmethod is
>>>>> installed, but not in 'alive' state
>>>>>
>>>>> Vladimir K
>>>>>
>>>>> On 11/7/13 10:35 AM, Vladimir Ivanov wrote:
>>>>>> Sorry, blindly copy-pasted the code.
>>>>>> Thanks for catching that.
>>>>>> Fixed (same link, hit refresh).
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>> On 11/7/13 10:00 PM, Vladimir Kozlov wrote:
>>>>>>> It is better.
>>>>>>>
>>>>>>> About next 2 lines. Where you get method? The method() is used to get
>>>>>>> pointer to java Method. And is_in_use() is nmethod's method. In
>>>>>>> comments
>>>>>>> you need to say nmethod (and not method):
>>>>>>>
>>>>>>> +   bool is_installed = method->code() == this || // method is in
>>>>>>> state
>>>>>>> 'alive' and installed
>>>>>>> +                       !method->is_in_use();     // method is
>>>>>>> installed
>>>>>>> and in state: 'not-entrant', 'zombie', or, 'unloaded'
>>>>>>>
>>>>>>> Second comment too big, you can simple say: // or nmethod is not in
>>>>>>> 'alive' state.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 11/7/13 6:50 AM, Vladimir Ivanov wrote:
>>>>>>>> Vladimir, thank you for the feedback.
>>>>>>>>
>>>>>>>> What do you think about the following version?
>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.03/
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Vladimir Ivanov
>>>>>>>>
>>>>>>>> On 11/7/13 2:52 AM, Vladimir Kozlov wrote:
>>>>>>>>> Vladimir,
>>>>>>>>>
>>>>>>>>> Thank you for finding the real cause.
>>>>>>>>>
>>>>>>>>> I think you need the comment before nm->verify() in new_nmethod()
>>>>>>>>> explaining why we should avoid safepoint as you explained here.
>>>>>>>>>
>>>>>>>>> I am not comfortable about passing new allow_safepoints parameter
>>>>>>>>> through all verify calls because it is the only one case. We still
>>>>>>>>> have
>>>>>>>>> not installed nmethod at this point. There should be some check
>>>>>>>>> (nmethod's flags/state) which we can use to skip CompiledIC_lock.
>>>>>>>>>
>>>>>>>>> I thought about the need to verify scopes in
>>>>>>>>> verify_interrupt_point() in
>>>>>>>>> all cases. But the verification code in ScopeDesc::verify() is
>>>>>>>>> commented. What? In general it is good to call it anyway. And you
>>>>>>>>> don't
>>>>>>>>> need IC to get the end_of_call() - I think we can inline what
>>>>>>>>> CompiledIC_at() does: nativeCall_at(call_site)->end_of_call().
>>>>>>>>>
>>>>>>>>> This code calls CompiledIC_at() because it does IC verification
>>>>>>>>> which we
>>>>>>>>> can skip in our case.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 11/6/13 6:57 AM, Vladimir Ivanov wrote:
>>>>>>>>>> Very good point, Vladimir!
>>>>>>>>>>
>>>>>>>>>> I should have looked for the bug in a different place :-)
>>>>>>>>>>
>>>>>>>>>> The problem is triggered by class redefinition and it can be
>>>>>>>>>> performed
>>>>>>>>>> under Compile_lock or during safepoint. So, in
>>>>>>>>>> ciEnv::register_method
>>>>>>>>>> it's not enough to grab Compile_lock. In addition, we need to
>>>>>>>>>> ensure
>>>>>>>>>> that there are no safepoints possible during method installation to
>>>>>>>>>> guarantee no dependency is invalidated.
>>>>>>>>>>
>>>>>>>>>> There's one place in nmethod verification code (see MutexLocker
>>>>>>>>>> ml_verify (CompiledIC_lock) in nmethod::verify_interrupt_point)
>>>>>>>>>> where
>>>>>>>>>> safepoint is possible and it causes the failure.
>>>>>>>>>>
>>>>>>>>>> The bug is present only in non-product builds since nmethod
>>>>>>>>>> verification
>>>>>>>>>> is absent in product bits.
>>>>>>>>>>
>>>>>>>>>> Previous fix also works, but I decided to go another route and
>>>>>>>>>> forbid
>>>>>>>>>> safepoints at all while holding Compile_lock.
>>>>>>>>>>
>>>>>>>>>> Updated webrev (will appear once cr.ojn is up):
>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.02/
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>
>>>>>>>>>> On 11/6/13 2:36 AM, Vladimir Kozlov wrote:
>>>>>>>>>>> You need to be careful to avoid deadlock since there is call:
>>>>>>>>>>>
>>>>>>>>>>> 1035               old->make_not_entrant();
>>>>>>>>>>>
>>>>>>>>>>> Any way there is comment in register_method():
>>>>>>>>>>>
>>>>>>>>>>>   936     // Prevent SystemDictionary::add_to_hierarchy from
>>>>>>>>>>> running
>>>>>>>>>>>   937     // and invalidating our dependencies until we install
>>>>>>>>>>> this
>>>>>>>>>>> method.
>>>>>>>>>>>   938     MutexLocker ml(Compile_lock);
>>>>>>>>>>>
>>>>>>>>>>> So how it happens that the method is invalidated by dependencies?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 11/5/13 1:52 PM, Vladimir Ivanov wrote:
>>>>>>>>>>>> Ouch, I was misled by it's name. So, I just lowered the
>>>>>>>>>>>> likelihood of
>>>>>>>>>>>> the failure then.
>>>>>>>>>>>>
>>>>>>>>>>>> make_not_entrant_or_zombie holds Patching_lock when it patches &
>>>>>>>>>>>> unlinks
>>>>>>>>>>>> a nmethod.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you see any problems with using it to guard method
>>>>>>>>>>>> installation
>>>>>>>>>>>> (method->set_code(method, nm)) in register_method() as well?
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/6/13 1:11 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>> Note nmethodLocker is not lock:
>>>>>>>>>>>>>
>>>>>>>>>>>>> void nmethodLocker::lock_nmethod(nmethod* nm, bool zombie_ok) {
>>>>>>>>>>>>>    if (nm == NULL)  return;
>>>>>>>>>>>>>    Atomic::inc(&nm->_lock_count);
>>>>>>>>>>>>>    guarantee(zombie_ok || !nm->is_zombie(), "cannot lock a
>>>>>>>>>>>>> zombie
>>>>>>>>>>>>> method");
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> The guard is nm->is_locked_by_vm() but neither
>>>>>>>>>>>>> make_not_entrant_or_zombie () or register_method() use it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So how nmethodLocker helps here?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/5/13 12:42 PM, Vladimir Ivanov wrote:
>>>>>>>>>>>>>> Thanks! Missed that one. Fixed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/5/13 11:25 PM, Vladimir Kozlov wrote:
>>>>>>>>>>>>>>> Should be
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What about this place:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> src/cpu/sparc/vm/sharedRuntime_sparc.cpp:  if
>>>>>>>>>>>>>>> (StressNonEntrant) {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 11/5/13 11:10 AM, Vladimir Ivanov wrote:
>>>>>>>>>>>>>>>> Updated fix:
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Removed broken StressNonEntrant code.
>>>>>>>>>>>>>>>> Updated comments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 11/5/13 3:39 PM, Vladimir Ivanov wrote:
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There's a race between compiler thread during method
>>>>>>>>>>>>>>>>> registration
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> sweeper: sweeper can invalidate a nmethod which hasn't been
>>>>>>>>>>>>>>>>> installed
>>>>>>>>>>>>>>>>> yet.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Consider the following scenario:
>>>>>>>>>>>>>>>>>    ciEnv::register_method:
>>>>>>>>>>>>>>>>>      - new nmethod(...)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    sweeper:
>>>>>>>>>>>>>>>>>      - invalidates newly allocated nmethod and patches
>>>>>>>>>>>>>>>>> VEP to
>>>>>>>>>>>>>>>>> call
>>>>>>>>>>>>>>>>> handle_wrong_method
>>>>>>>>>>>>>>>>>      - tries to unlink it, but
>>>>>>>>>>>>>>>>> method()->from_compiled_entry() !=
>>>>>>>>>>>>>>>>> verified_entry_point() since nmethod hasn't been installed
>>>>>>>>>>>>>>>>> yet
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    ciEnv::register_method:
>>>>>>>>>>>>>>>>>      - installs already invalidated nmethod
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Calling corresponding Java method will lead to infinite
>>>>>>>>>>>>>>>>> loop:
>>>>>>>>>>>>>>>>> VEP of
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> compiled method calls handle_wrong_method and call site
>>>>>>>>>>>>>>>>> resolution
>>>>>>>>>>>>>>>>> returns the very same compiled method.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The fix is to grab a lock after nmethod is allocated in the
>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>> cache
>>>>>>>>>>>>>>>>> and check that it hasn't been already invalidated by the
>>>>>>>>>>>>>>>>> sweeper
>>>>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>>>> proceeding. Sweeper already synchronizes on a nmethod before
>>>>>>>>>>>>>>>>> invalidating it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Testing: failing test w/ diagnostic output.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>> Vladimir Ivanov
>


More information about the hotspot-compiler-dev mailing list