RFR (XS): 8023037 : Race between ciEnv::register_method and nmethod::make_not_entrant_or_zombie

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Wed Nov 6 06:57:44 PST 2013


Very good point, Vladimir!

I should have looked for the bug in a different place :-)

The problem is triggered by class redefinition and it can be performed 
under Compile_lock or during safepoint. So, in ciEnv::register_method 
it's not enough to grab Compile_lock. In addition, we need to ensure 
that there are no safepoints possible during method installation to 
guarantee no dependency is invalidated.

There's one place in nmethod verification code (see MutexLocker 
ml_verify (CompiledIC_lock) in nmethod::verify_interrupt_point) where 
safepoint is possible and it causes the failure.

The bug is present only in non-product builds since nmethod verification 
is absent in product bits.

Previous fix also works, but I decided to go another route and forbid 
safepoints at all while holding Compile_lock.

Updated webrev (will appear once cr.ojn is up):
http://cr.openjdk.java.net/~vlivanov/8023037/webrev.02/

Best regards,
Vladimir Ivanov

On 11/6/13 2:36 AM, Vladimir Kozlov wrote:
> You need to be careful to avoid deadlock since there is call:
>
> 1035               old->make_not_entrant();
>
> Any way there is comment in register_method():
>
>   936     // Prevent SystemDictionary::add_to_hierarchy from running
>   937     // and invalidating our dependencies until we install this
> method.
>   938     MutexLocker ml(Compile_lock);
>
> So how it happens that the method is invalidated by dependencies?
>
> Thanks,
> Vladimir
>
> On 11/5/13 1:52 PM, Vladimir Ivanov wrote:
>> Ouch, I was misled by it's name. So, I just lowered the likelihood of
>> the failure then.
>>
>> make_not_entrant_or_zombie holds Patching_lock when it patches & unlinks
>> a nmethod.
>>
>> Do you see any problems with using it to guard method installation
>> (method->set_code(method, nm)) in register_method() as well?
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/6/13 1:11 AM, Vladimir Kozlov wrote:
>>> Note nmethodLocker is not lock:
>>>
>>> void nmethodLocker::lock_nmethod(nmethod* nm, bool zombie_ok) {
>>>    if (nm == NULL)  return;
>>>    Atomic::inc(&nm->_lock_count);
>>>    guarantee(zombie_ok || !nm->is_zombie(), "cannot lock a zombie
>>> method");
>>> }
>>>
>>> The guard is nm->is_locked_by_vm() but neither
>>> make_not_entrant_or_zombie () or register_method() use it.
>>>
>>> So how nmethodLocker helps here?
>>>
>>> Vladimir
>>>
>>> On 11/5/13 12:42 PM, Vladimir Ivanov wrote:
>>>> Thanks! Missed that one. Fixed.
>>>>
>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> On 11/5/13 11:25 PM, Vladimir Kozlov wrote:
>>>>> Should be
>>>>>
>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01/
>>>>>
>>>>> What about this place:
>>>>>
>>>>> src/cpu/sparc/vm/sharedRuntime_sparc.cpp:  if (StressNonEntrant) {
>>>>>
>>>>> Vladimir
>>>>>
>>>>> On 11/5/13 11:10 AM, Vladimir Ivanov wrote:
>>>>>> Updated fix:
>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>
>>>>>> Removed broken StressNonEntrant code.
>>>>>> Updated comments.
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>> On 11/5/13 3:39 PM, Vladimir Ivanov wrote:
>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>
>>>>>>> There's a race between compiler thread during method registration
>>>>>>> and
>>>>>>> sweeper: sweeper can invalidate a nmethod which hasn't been
>>>>>>> installed
>>>>>>> yet.
>>>>>>>
>>>>>>> Consider the following scenario:
>>>>>>>    ciEnv::register_method:
>>>>>>>      - new nmethod(...)
>>>>>>>
>>>>>>>    sweeper:
>>>>>>>      - invalidates newly allocated nmethod and patches VEP to call
>>>>>>> handle_wrong_method
>>>>>>>      - tries to unlink it, but method()->from_compiled_entry() !=
>>>>>>> verified_entry_point() since nmethod hasn't been installed yet
>>>>>>>
>>>>>>>    ciEnv::register_method:
>>>>>>>      - installs already invalidated nmethod
>>>>>>>
>>>>>>> Calling corresponding Java method will lead to infinite loop: VEP of
>>>>>>> the
>>>>>>> compiled method calls handle_wrong_method and call site resolution
>>>>>>> returns the very same compiled method.
>>>>>>>
>>>>>>> The fix is to grab a lock after nmethod is allocated in the code
>>>>>>> cache
>>>>>>> and check that it hasn't been already invalidated by the sweeper
>>>>>>> before
>>>>>>> proceeding. Sweeper already synchronizes on a nmethod before
>>>>>>> invalidating it.
>>>>>>>
>>>>>>> Testing: failing test w/ diagnostic output.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vladimir Ivanov


More information about the hotspot-compiler-dev mailing list