RFR (XS): 8023037 : Race between ciEnv::register_method and nmethod::make_not_entrant_or_zombie

Albert Noll albert.noll at oracle.com
Thu Nov 7 08:12:15 PST 2013


Hi Vladimir,

thanks for the explanation. So it is equivalent to the following statement:

bool is_installed = method->code() == this || // method is in state 
alive and installed
                              !method->is_in_use()          // method is 
installed and in state: 'not-entrant', 'zombie', or, 'unloaded'


Here are some thoughs that came to my mind when I was working with the 
sweeper...

It is confusing that nm->is_in_use() returns true although the method 
has not yet been installed.
A cleaner solution (but I assume it is not time time to do that in RDP2) 
would probably be to introduce
a new state 'created', to which a nmethod is initialized when it is 
created. The nmethod would change
the state to 'in_use' when it is installed.

Best,
Albert

On 11/07/2013 04:50 PM, Vladimir Ivanov wrote:
> Albert,
>
> I want to exclude only the case when a newly allocated method hasn't 
> been installed yet. If (_state == alive) is missing, IC verification 
> for non-entrant methods is skipped as well.
>
> Best regards,
> Vladimir Ivanov
>
> On 11/7/13 7:32 PM, Albert Noll wrote:
>> Hi Vladimir,
>>
>> I have a question concerning the following lines:
>>
>> *+    // Verify IC only when nmethod installation is finished.*
>> *+    bool is_installed = !(_state == alive && method()->code() != 
>> this);*
>> *+    if (is_installed)
>>
>>
>> *
>>
>> Wouldn't it be also sufficient to check if a method is installed with
>> the following lines?
>>
>> bool is_installed = method()->code() == this;
>>
>>
>> Best,
>> Albert
>>
>>
>> On 11/07/2013 03:50 PM, Vladimir Ivanov wrote:
>>> Vladimir, thank you for the feedback.
>>>
>>> What do you think about the following version?
>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.03/
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 11/7/13 2:52 AM, Vladimir Kozlov wrote:
>>>> Vladimir,
>>>>
>>>> Thank you for finding the real cause.
>>>>
>>>> I think you need the comment before nm->verify() in new_nmethod()
>>>> explaining why we should avoid safepoint as you explained here.
>>>>
>>>> I am not comfortable about passing new allow_safepoints parameter
>>>> through all verify calls because it is the only one case. We still 
>>>> have
>>>> not installed nmethod at this point. There should be some check
>>>> (nmethod's flags/state) which we can use to skip CompiledIC_lock.
>>>>
>>>> I thought about the need to verify scopes in 
>>>> verify_interrupt_point() in
>>>> all cases. But the verification code in ScopeDesc::verify() is
>>>> commented. What? In general it is good to call it anyway. And you 
>>>> don't
>>>> need IC to get the end_of_call() - I think we can inline what
>>>> CompiledIC_at() does: nativeCall_at(call_site)->end_of_call().
>>>>
>>>> This code calls CompiledIC_at() because it does IC verification 
>>>> which we
>>>> can skip in our case.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 11/6/13 6:57 AM, Vladimir Ivanov wrote:
>>>>> Very good point, Vladimir!
>>>>>
>>>>> I should have looked for the bug in a different place :-)
>>>>>
>>>>> The problem is triggered by class redefinition and it can be 
>>>>> performed
>>>>> under Compile_lock or during safepoint. So, in ciEnv::register_method
>>>>> it's not enough to grab Compile_lock. In addition, we need to ensure
>>>>> that there are no safepoints possible during method installation to
>>>>> guarantee no dependency is invalidated.
>>>>>
>>>>> There's one place in nmethod verification code (see MutexLocker
>>>>> ml_verify (CompiledIC_lock) in nmethod::verify_interrupt_point) where
>>>>> safepoint is possible and it causes the failure.
>>>>>
>>>>> The bug is present only in non-product builds since nmethod
>>>>> verification
>>>>> is absent in product bits.
>>>>>
>>>>> Previous fix also works, but I decided to go another route and forbid
>>>>> safepoints at all while holding Compile_lock.
>>>>>
>>>>> Updated webrev (will appear once cr.ojn is up):
>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.02/
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>> On 11/6/13 2:36 AM, Vladimir Kozlov wrote:
>>>>>> You need to be careful to avoid deadlock since there is call:
>>>>>>
>>>>>> 1035               old->make_not_entrant();
>>>>>>
>>>>>> Any way there is comment in register_method():
>>>>>>
>>>>>>   936     // Prevent SystemDictionary::add_to_hierarchy from running
>>>>>>   937     // and invalidating our dependencies until we install this
>>>>>> method.
>>>>>>   938     MutexLocker ml(Compile_lock);
>>>>>>
>>>>>> So how it happens that the method is invalidated by dependencies?
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 11/5/13 1:52 PM, Vladimir Ivanov wrote:
>>>>>>> Ouch, I was misled by it's name. So, I just lowered the 
>>>>>>> likelihood of
>>>>>>> the failure then.
>>>>>>>
>>>>>>> make_not_entrant_or_zombie holds Patching_lock when it patches &
>>>>>>> unlinks
>>>>>>> a nmethod.
>>>>>>>
>>>>>>> Do you see any problems with using it to guard method installation
>>>>>>> (method->set_code(method, nm)) in register_method() as well?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vladimir Ivanov
>>>>>>>
>>>>>>> On 11/6/13 1:11 AM, Vladimir Kozlov wrote:
>>>>>>>> Note nmethodLocker is not lock:
>>>>>>>>
>>>>>>>> void nmethodLocker::lock_nmethod(nmethod* nm, bool zombie_ok) {
>>>>>>>>    if (nm == NULL)  return;
>>>>>>>>    Atomic::inc(&nm->_lock_count);
>>>>>>>>    guarantee(zombie_ok || !nm->is_zombie(), "cannot lock a zombie
>>>>>>>> method");
>>>>>>>> }
>>>>>>>>
>>>>>>>> The guard is nm->is_locked_by_vm() but neither
>>>>>>>> make_not_entrant_or_zombie () or register_method() use it.
>>>>>>>>
>>>>>>>> So how nmethodLocker helps here?
>>>>>>>>
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 11/5/13 12:42 PM, Vladimir Ivanov wrote:
>>>>>>>>> Thanks! Missed that one. Fixed.
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Vladimir Ivanov
>>>>>>>>>
>>>>>>>>> On 11/5/13 11:25 PM, Vladimir Kozlov wrote:
>>>>>>>>>> Should be
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01/
>>>>>>>>>>
>>>>>>>>>> What about this place:
>>>>>>>>>>
>>>>>>>>>> src/cpu/sparc/vm/sharedRuntime_sparc.cpp:  if 
>>>>>>>>>> (StressNonEntrant) {
>>>>>>>>>>
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> On 11/5/13 11:10 AM, Vladimir Ivanov wrote:
>>>>>>>>>>> Updated fix:
>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> Removed broken StressNonEntrant code.
>>>>>>>>>>> Updated comments.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>
>>>>>>>>>>> On 11/5/13 3:39 PM, Vladimir Ivanov wrote:
>>>>>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>>>>>
>>>>>>>>>>>> There's a race between compiler thread during method
>>>>>>>>>>>> registration
>>>>>>>>>>>> and
>>>>>>>>>>>> sweeper: sweeper can invalidate a nmethod which hasn't been
>>>>>>>>>>>> installed
>>>>>>>>>>>> yet.
>>>>>>>>>>>>
>>>>>>>>>>>> Consider the following scenario:
>>>>>>>>>>>>    ciEnv::register_method:
>>>>>>>>>>>>      - new nmethod(...)
>>>>>>>>>>>>
>>>>>>>>>>>>    sweeper:
>>>>>>>>>>>>      - invalidates newly allocated nmethod and patches VEP to
>>>>>>>>>>>> call
>>>>>>>>>>>> handle_wrong_method
>>>>>>>>>>>>      - tries to unlink it, but
>>>>>>>>>>>> method()->from_compiled_entry() !=
>>>>>>>>>>>> verified_entry_point() since nmethod hasn't been installed yet
>>>>>>>>>>>>
>>>>>>>>>>>>    ciEnv::register_method:
>>>>>>>>>>>>      - installs already invalidated nmethod
>>>>>>>>>>>>
>>>>>>>>>>>> Calling corresponding Java method will lead to infinite loop:
>>>>>>>>>>>> VEP of
>>>>>>>>>>>> the
>>>>>>>>>>>> compiled method calls handle_wrong_method and call site
>>>>>>>>>>>> resolution
>>>>>>>>>>>> returns the very same compiled method.
>>>>>>>>>>>>
>>>>>>>>>>>> The fix is to grab a lock after nmethod is allocated in the 
>>>>>>>>>>>> code
>>>>>>>>>>>> cache
>>>>>>>>>>>> and check that it hasn't been already invalidated by the 
>>>>>>>>>>>> sweeper
>>>>>>>>>>>> before
>>>>>>>>>>>> proceeding. Sweeper already synchronizes on a nmethod before
>>>>>>>>>>>> invalidating it.
>>>>>>>>>>>>
>>>>>>>>>>>> Testing: failing test w/ diagnostic output.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Vladimir Ivanov
>>



More information about the hotspot-compiler-dev mailing list