RFR (XS): 8023037 : Race between ciEnv::register_method and nmethod::make_not_entrant_or_zombie
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Thu Nov 7 09:04:25 PST 2013
Albert,
>
> thanks for the explanation. So it is equivalent to the following statement:
>
> bool is_installed = method->code() == this || // method is in state
> alive and installed
> !method->is_in_use() // method is
> installed and in state: 'not-entrant', 'zombie', or, 'unloaded'
>
Yes, it's equivalent. And it's much more readable :-)
Thanks for suggestion. I integrated your version.
>
> Here are some thoughs that came to my mind when I was working with the
> sweeper...
>
> It is confusing that nm->is_in_use() returns true although the method
> has not yet been installed.
> A cleaner solution (but I assume it is not time time to do that in RDP2)
> would probably be to introduce
> a new state 'created', to which a nmethod is initialized when it is
> created. The nmethod would change
> the state to 'in_use' when it is installed.
>
Agree. I miss 'created' nmethod state and considered adding it, but
refrained. I created RFE [1] to track that.
Best regards,
Vladimir Ivanov
[1] https://bugs.openjdk.java.net/browse/JDK-8028001
> Best,
> Albert
>
> On 11/07/2013 04:50 PM, Vladimir Ivanov wrote:
>> Albert,
>>
>> I want to exclude only the case when a newly allocated method hasn't
>> been installed yet. If (_state == alive) is missing, IC verification
>> for non-entrant methods is skipped as well.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/7/13 7:32 PM, Albert Noll wrote:
>>> Hi Vladimir,
>>>
>>> I have a question concerning the following lines:
>>>
>>> *+ // Verify IC only when nmethod installation is finished.*
>>> *+ bool is_installed = !(_state == alive && method()->code() !=
>>> this);*
>>> *+ if (is_installed)
>>>
>>>
>>> *
>>>
>>> Wouldn't it be also sufficient to check if a method is installed with
>>> the following lines?
>>>
>>> bool is_installed = method()->code() == this;
>>>
>>>
>>> Best,
>>> Albert
>>>
>>>
>>> On 11/07/2013 03:50 PM, Vladimir Ivanov wrote:
>>>> Vladimir, thank you for the feedback.
>>>>
>>>> What do you think about the following version?
>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.03/
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> On 11/7/13 2:52 AM, Vladimir Kozlov wrote:
>>>>> Vladimir,
>>>>>
>>>>> Thank you for finding the real cause.
>>>>>
>>>>> I think you need the comment before nm->verify() in new_nmethod()
>>>>> explaining why we should avoid safepoint as you explained here.
>>>>>
>>>>> I am not comfortable about passing new allow_safepoints parameter
>>>>> through all verify calls because it is the only one case. We still
>>>>> have
>>>>> not installed nmethod at this point. There should be some check
>>>>> (nmethod's flags/state) which we can use to skip CompiledIC_lock.
>>>>>
>>>>> I thought about the need to verify scopes in
>>>>> verify_interrupt_point() in
>>>>> all cases. But the verification code in ScopeDesc::verify() is
>>>>> commented. What? In general it is good to call it anyway. And you
>>>>> don't
>>>>> need IC to get the end_of_call() - I think we can inline what
>>>>> CompiledIC_at() does: nativeCall_at(call_site)->end_of_call().
>>>>>
>>>>> This code calls CompiledIC_at() because it does IC verification
>>>>> which we
>>>>> can skip in our case.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 11/6/13 6:57 AM, Vladimir Ivanov wrote:
>>>>>> Very good point, Vladimir!
>>>>>>
>>>>>> I should have looked for the bug in a different place :-)
>>>>>>
>>>>>> The problem is triggered by class redefinition and it can be
>>>>>> performed
>>>>>> under Compile_lock or during safepoint. So, in ciEnv::register_method
>>>>>> it's not enough to grab Compile_lock. In addition, we need to ensure
>>>>>> that there are no safepoints possible during method installation to
>>>>>> guarantee no dependency is invalidated.
>>>>>>
>>>>>> There's one place in nmethod verification code (see MutexLocker
>>>>>> ml_verify (CompiledIC_lock) in nmethod::verify_interrupt_point) where
>>>>>> safepoint is possible and it causes the failure.
>>>>>>
>>>>>> The bug is present only in non-product builds since nmethod
>>>>>> verification
>>>>>> is absent in product bits.
>>>>>>
>>>>>> Previous fix also works, but I decided to go another route and forbid
>>>>>> safepoints at all while holding Compile_lock.
>>>>>>
>>>>>> Updated webrev (will appear once cr.ojn is up):
>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.02/
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>> On 11/6/13 2:36 AM, Vladimir Kozlov wrote:
>>>>>>> You need to be careful to avoid deadlock since there is call:
>>>>>>>
>>>>>>> 1035 old->make_not_entrant();
>>>>>>>
>>>>>>> Any way there is comment in register_method():
>>>>>>>
>>>>>>> 936 // Prevent SystemDictionary::add_to_hierarchy from running
>>>>>>> 937 // and invalidating our dependencies until we install this
>>>>>>> method.
>>>>>>> 938 MutexLocker ml(Compile_lock);
>>>>>>>
>>>>>>> So how it happens that the method is invalidated by dependencies?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 11/5/13 1:52 PM, Vladimir Ivanov wrote:
>>>>>>>> Ouch, I was misled by it's name. So, I just lowered the
>>>>>>>> likelihood of
>>>>>>>> the failure then.
>>>>>>>>
>>>>>>>> make_not_entrant_or_zombie holds Patching_lock when it patches &
>>>>>>>> unlinks
>>>>>>>> a nmethod.
>>>>>>>>
>>>>>>>> Do you see any problems with using it to guard method installation
>>>>>>>> (method->set_code(method, nm)) in register_method() as well?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Vladimir Ivanov
>>>>>>>>
>>>>>>>> On 11/6/13 1:11 AM, Vladimir Kozlov wrote:
>>>>>>>>> Note nmethodLocker is not lock:
>>>>>>>>>
>>>>>>>>> void nmethodLocker::lock_nmethod(nmethod* nm, bool zombie_ok) {
>>>>>>>>> if (nm == NULL) return;
>>>>>>>>> Atomic::inc(&nm->_lock_count);
>>>>>>>>> guarantee(zombie_ok || !nm->is_zombie(), "cannot lock a zombie
>>>>>>>>> method");
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> The guard is nm->is_locked_by_vm() but neither
>>>>>>>>> make_not_entrant_or_zombie () or register_method() use it.
>>>>>>>>>
>>>>>>>>> So how nmethodLocker helps here?
>>>>>>>>>
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 11/5/13 12:42 PM, Vladimir Ivanov wrote:
>>>>>>>>>> Thanks! Missed that one. Fixed.
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>
>>>>>>>>>> On 11/5/13 11:25 PM, Vladimir Kozlov wrote:
>>>>>>>>>>> Should be
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01/
>>>>>>>>>>>
>>>>>>>>>>> What about this place:
>>>>>>>>>>>
>>>>>>>>>>> src/cpu/sparc/vm/sharedRuntime_sparc.cpp: if
>>>>>>>>>>> (StressNonEntrant) {
>>>>>>>>>>>
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 11/5/13 11:10 AM, Vladimir Ivanov wrote:
>>>>>>>>>>>> Updated fix:
>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>>>>>
>>>>>>>>>>>> Removed broken StressNonEntrant code.
>>>>>>>>>>>> Updated comments.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/5/13 3:39 PM, Vladimir Ivanov wrote:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>>>>>>
>>>>>>>>>>>>> There's a race between compiler thread during method
>>>>>>>>>>>>> registration
>>>>>>>>>>>>> and
>>>>>>>>>>>>> sweeper: sweeper can invalidate a nmethod which hasn't been
>>>>>>>>>>>>> installed
>>>>>>>>>>>>> yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consider the following scenario:
>>>>>>>>>>>>> ciEnv::register_method:
>>>>>>>>>>>>> - new nmethod(...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> sweeper:
>>>>>>>>>>>>> - invalidates newly allocated nmethod and patches VEP to
>>>>>>>>>>>>> call
>>>>>>>>>>>>> handle_wrong_method
>>>>>>>>>>>>> - tries to unlink it, but
>>>>>>>>>>>>> method()->from_compiled_entry() !=
>>>>>>>>>>>>> verified_entry_point() since nmethod hasn't been installed yet
>>>>>>>>>>>>>
>>>>>>>>>>>>> ciEnv::register_method:
>>>>>>>>>>>>> - installs already invalidated nmethod
>>>>>>>>>>>>>
>>>>>>>>>>>>> Calling corresponding Java method will lead to infinite loop:
>>>>>>>>>>>>> VEP of
>>>>>>>>>>>>> the
>>>>>>>>>>>>> compiled method calls handle_wrong_method and call site
>>>>>>>>>>>>> resolution
>>>>>>>>>>>>> returns the very same compiled method.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The fix is to grab a lock after nmethod is allocated in the
>>>>>>>>>>>>> code
>>>>>>>>>>>>> cache
>>>>>>>>>>>>> and check that it hasn't been already invalidated by the
>>>>>>>>>>>>> sweeper
>>>>>>>>>>>>> before
>>>>>>>>>>>>> proceeding. Sweeper already synchronizes on a nmethod before
>>>>>>>>>>>>> invalidating it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Testing: failing test w/ diagnostic output.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Vladimir Ivanov
>>>
>
More information about the hotspot-compiler-dev
mailing list