RFR (XS): 8023037 : Race between ciEnv::register_method and nmethod::make_not_entrant_or_zombie

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Nov 6 14:52:24 PST 2013


Vladimir,

Thank you for finding the real cause.

I think you need the comment before nm->verify() in new_nmethod() 
explaining why we should avoid safepoint as you explained here.

I am not comfortable about passing new allow_safepoints parameter 
through all verify calls because it is the only one case. We still have 
not installed nmethod at this point. There should be some check 
(nmethod's flags/state) which we can use to skip CompiledIC_lock.

I thought about the need to verify scopes in verify_interrupt_point() in 
all cases. But the verification code in ScopeDesc::verify() is 
commented. What? In general it is good to call it anyway. And you don't 
need IC to get the end_of_call() - I think we can inline what 
CompiledIC_at() does: nativeCall_at(call_site)->end_of_call().

This code calls CompiledIC_at() because it does IC verification which we 
can skip in our case.

Thanks,
Vladimir

On 11/6/13 6:57 AM, Vladimir Ivanov wrote:
> Very good point, Vladimir!
>
> I should have looked for the bug in a different place :-)
>
> The problem is triggered by class redefinition and it can be performed
> under Compile_lock or during safepoint. So, in ciEnv::register_method
> it's not enough to grab Compile_lock. In addition, we need to ensure
> that there are no safepoints possible during method installation to
> guarantee no dependency is invalidated.
>
> There's one place in nmethod verification code (see MutexLocker
> ml_verify (CompiledIC_lock) in nmethod::verify_interrupt_point) where
> safepoint is possible and it causes the failure.
>
> The bug is present only in non-product builds since nmethod verification
> is absent in product bits.
>
> Previous fix also works, but I decided to go another route and forbid
> safepoints at all while holding Compile_lock.
>
> Updated webrev (will appear once cr.ojn is up):
> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.02/
>
> Best regards,
> Vladimir Ivanov
>
> On 11/6/13 2:36 AM, Vladimir Kozlov wrote:
>> You need to be careful to avoid deadlock since there is call:
>>
>> 1035               old->make_not_entrant();
>>
>> Any way there is comment in register_method():
>>
>>   936     // Prevent SystemDictionary::add_to_hierarchy from running
>>   937     // and invalidating our dependencies until we install this
>> method.
>>   938     MutexLocker ml(Compile_lock);
>>
>> So how it happens that the method is invalidated by dependencies?
>>
>> Thanks,
>> Vladimir
>>
>> On 11/5/13 1:52 PM, Vladimir Ivanov wrote:
>>> Ouch, I was misled by it's name. So, I just lowered the likelihood of
>>> the failure then.
>>>
>>> make_not_entrant_or_zombie holds Patching_lock when it patches & unlinks
>>> a nmethod.
>>>
>>> Do you see any problems with using it to guard method installation
>>> (method->set_code(method, nm)) in register_method() as well?
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 11/6/13 1:11 AM, Vladimir Kozlov wrote:
>>>> Note nmethodLocker is not lock:
>>>>
>>>> void nmethodLocker::lock_nmethod(nmethod* nm, bool zombie_ok) {
>>>>    if (nm == NULL)  return;
>>>>    Atomic::inc(&nm->_lock_count);
>>>>    guarantee(zombie_ok || !nm->is_zombie(), "cannot lock a zombie
>>>> method");
>>>> }
>>>>
>>>> The guard is nm->is_locked_by_vm() but neither
>>>> make_not_entrant_or_zombie () or register_method() use it.
>>>>
>>>> So how nmethodLocker helps here?
>>>>
>>>> Vladimir
>>>>
>>>> On 11/5/13 12:42 PM, Vladimir Ivanov wrote:
>>>>> Thanks! Missed that one. Fixed.
>>>>>
>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>> On 11/5/13 11:25 PM, Vladimir Kozlov wrote:
>>>>>> Should be
>>>>>>
>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.01/
>>>>>>
>>>>>> What about this place:
>>>>>>
>>>>>> src/cpu/sparc/vm/sharedRuntime_sparc.cpp:  if (StressNonEntrant) {
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>> On 11/5/13 11:10 AM, Vladimir Ivanov wrote:
>>>>>>> Updated fix:
>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>
>>>>>>> Removed broken StressNonEntrant code.
>>>>>>> Updated comments.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vladimir Ivanov
>>>>>>>
>>>>>>> On 11/5/13 3:39 PM, Vladimir Ivanov wrote:
>>>>>>>> http://cr.openjdk.java.net/~vlivanov/8023037/webrev.00/
>>>>>>>>
>>>>>>>> There's a race between compiler thread during method registration
>>>>>>>> and
>>>>>>>> sweeper: sweeper can invalidate a nmethod which hasn't been
>>>>>>>> installed
>>>>>>>> yet.
>>>>>>>>
>>>>>>>> Consider the following scenario:
>>>>>>>>    ciEnv::register_method:
>>>>>>>>      - new nmethod(...)
>>>>>>>>
>>>>>>>>    sweeper:
>>>>>>>>      - invalidates newly allocated nmethod and patches VEP to call
>>>>>>>> handle_wrong_method
>>>>>>>>      - tries to unlink it, but method()->from_compiled_entry() !=
>>>>>>>> verified_entry_point() since nmethod hasn't been installed yet
>>>>>>>>
>>>>>>>>    ciEnv::register_method:
>>>>>>>>      - installs already invalidated nmethod
>>>>>>>>
>>>>>>>> Calling corresponding Java method will lead to infinite loop:
>>>>>>>> VEP of
>>>>>>>> the
>>>>>>>> compiled method calls handle_wrong_method and call site resolution
>>>>>>>> returns the very same compiled method.
>>>>>>>>
>>>>>>>> The fix is to grab a lock after nmethod is allocated in the code
>>>>>>>> cache
>>>>>>>> and check that it hasn't been already invalidated by the sweeper
>>>>>>>> before
>>>>>>>> proceeding. Sweeper already synchronizes on a nmethod before
>>>>>>>> invalidating it.
>>>>>>>>
>>>>>>>> Testing: failing test w/ diagnostic output.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Vladimir Ivanov


More information about the hotspot-compiler-dev mailing list