[9] RFR(S): 8058737: CodeCache::find_blob fails with 'unsafe access to zombie method'

Fri Sep 26 14:26:02 UTC 2014

Hi Vladimir,

On 25.09.2014 18:15, Vladimir Kozlov wrote:
> Thank you for explanation, now I got it.
> I am not sure that we should stop cleaning call sites even if we did 
> not see nmethod on call stacks.
> Your last webrev.01 solution is more robust, I think.

Yes, I think so too.

Best,
Tobias

>
> Thanks,
> Vladimir
>
> On 9/25/14 5:22 AM, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> thanks for the feedback!
>>
>> On 24.09.2014 19:13, Vladimir Kozlov wrote:
>>> On 9/24/14 5:24 AM, Tobias Hartmann wrote:
>>>> Hi Vladimir,
>>>>
>>>> thanks for the review.
>>>>
>>>> On 23.09.2014 23:05, Vladimir Kozlov wrote:
>>>>> Why it is on confidential list?
>>>>
>>>> I originally set the bug to confidential because some of the 
>>>> failing tests are closed. Should I change it to open?
>>>
>>> Changes are in open code and they should be discussed in open. I 
>>> changed mailing list.
>>
>> Okay, thanks, I changed the bug to open.
>>
>>> About ICStub.
>>> new_ic_stub() creates empty ICStub, the initialized with IC data is 
>>> previous stub:
>>> ic_stub->set_stub(ic, cached_value, entry);
>>>
>>> So when first safepoint happens interesting ICStub is not empty. My 
>>> question is why it is not cleared during first
>>> safepoint? Is nmethod::cleanup_inline_caches() called during first 
>>> safepoint again or not? I mean from
>>> CodeCache::gc_epilogue(). If it is not called again then it explains 
>>> the situation.
>>
>> The first safepoint does indeed clear all ICStubs but the point is 
>> that after the safepoint finishes we are still in
>> 'nmethod::cleanup_inline_caches()' of the same nmethod. Because the 
>> nmethod has many ICs, we continue to call
>> 'CompiledIC::set_to_clean()' that will then create new ICStubs.
>>
>> To answer your question: The first safepoint does not clear the 
>> problematic ICStub because the stub is created right
>> after the safepoint returns.
>>
>> The sweeper then changes the state of the nmethod to zombie (because 
>> the first safepoint did not find it on the stack).
>> At the second safepoint the assert is hit.
>>
>> Another solution would be to bail out of the IC cleanup if the 
>> nmethod may now be converted to zombie:
>>
>> --- a/src/share/vm/code/nmethod.cpp    Sun Sep 21 16:13:39 2014 +0200
>> +++ b/src/share/vm/code/nmethod.cpp    Thu Sep 25 14:11:33 2014 +0200
>> @@ -1152,6 +1152,9 @@
>>     ResourceMark rm;
>>     RelocIterator iter(this, low_boundary);
>>     while(iter.next()) {
>> +   if (is_not_entrant() && can_not_entrant_be_converted()) {
>> +     break;
>> +   }
>>       switch(iter.type()) {
>>         case relocInfo::virtual_call_type:
>>         case relocInfo::opt_virtual_call_type: {
>>
>> What do you think?
>>
>> Thanks,
>> Tobias
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>>> On 9/23/14 6:37 AM, Tobias Hartmann wrote:
>>>>>> Igor, Vladimir, thanks for the reviews!
>>>>>>
>>>>>> Please see comments inline.
>>>>>>
>>>>>> On 22.09.2014 23:47, Igor Veresov wrote:
>>>>>>> Could in theory the blob be deallocated by the time we look at 
>>>>>>> it in
>>>>>>> ICStub::finalize?
>>>>>>
>>>>>> Yes, that could happen. As Vladimir mentioned, more than one 
>>>>>> traversal
>>>>>> of the sweeper may happen, changing the state of the nmethod from 
>>>>>> zombie
>>>>>> to unloaded and deallocating it.
>>>>>
>>>>> Tobias, just to be clear. The state 'zombie' does not converted to 
>>>>> state 'unloaded'. The 'unloaded' state is set only
>>>>> in nmethod::do_unloading() when we unload classes. And 'unloaded' 
>>>>> state is converted to 'zombie'. This is the only
>>>>> direction of a state change.
>>>>
>>>> Yes, thanks. I meant to say that the zombie method may be unloaded 
>>>> during another traversal of the sweeper.
>>>>
>>>>> On 23.09.2014 03:12, Vladimir Kozlov wrote:
>>>>>>> I think safer solution will be to cleanup corresponding ICStub when
>>>>>>> nmethod is converted to zombie. Otherwise you will get incorrect
>>>>>>> result as Igor pointed when nmethod is removed - we can get several
>>>>>>> sweeps before second 6) sefapoint.
>>>>>>
>>>>>> Yes, I agree. I changed the code to clear ICStubs when the 
>>>>>> nmethod is
>>>>>> converted to zombie. It should be sufficient to do this for the
>>>>>> non-entrant->zombie transition in the sweeper. The other calls to
>>>>>> 'nmethod::make_zombie' either happen at a safepoint (see
>>>>>> 'CodeCache::make_marked_nmethods_zombies') or for the 
>>>>>> unloaded->zombie
>>>>>> transition in the sweeper that should be safe.
>>>>>>
>>>>>> New webrev: http://cr.openjdk.java.net/~thartmann/8058737/webrev.01/
>>>>>
>>>>> One thing is still not clear for me. Please, explain why during 
>>>>> the first safepoint which should clean StubQueue that
>>>>> ICStub was not cleaned? We at safepoint and it is safe to cleanr it:
>>>>>
>>>>> nmethod::cleanup_inline_caches() {
>>>>> ...
>>>>>           if (!nm->is_in_use() || (nm->method()->code() != nm)) 
>>>>> ic->set_to_clean();
>>>>>
>>>>> CompiledIC::set_to_clean() {
>>>>> ...
>>>>>   if (safe_transition) {
>>>>>     // Kill any leftover stub we might have too
>>>>>     if (is_in_transition_state()) {
>>>>>       ICStub* old_stub = 
>>>>> ICStub_from_destination_address(stub_address());
>>>>>       old_stub->clear();
>>>>>     }
>>>>
>>>> The problem is that the first safepoint happens while processing a 
>>>> non-entrant nmethod in the sweeper: The sweeper tries
>>>> to cleanup the inline caches of this nmethod but fails while 
>>>> creating the transition stub because there is no space in
>>>> the StubQueue. A safepoint is forced to free up space by 
>>>> back-patching and removing ICStubs.
>>>>
>>>> VM_ForceSafepoint vfs;
>>>> InlineCacheBuffer::new_ic_stub()
>>>> InlineCacheBuffer::create_transition_stub()
>>>> CompiledIC::set_to_clean()
>>>> nmethod::cleanup_inline_caches()
>>>> NMethodSweeper::process_nmethod() line 557
>>>>
>>>> That means right after the safepoint completes, a new ICStub is 
>>>> created and the sweeper continues. As a side effect, the
>>>> safe point also performs stack scanning and the nmethod is not 
>>>> found on the stack. During the next iteration of the
>>>> sweeper the nmethod is therefore converted to zombie. Note that 
>>>> there is no safepoint between the actual creation of the
>>>> ICStub and the conversion to zombie!
>>>>
>>>> At the next safepoint, back-patching of ICStubs is performed again 
>>>> and fails because the nmethod is in zombie state:
>>>>
>>>> CodeCache::find_nmethod(void*)
>>>> ICStub::finalize()
>>>> StubQueue::remove_all()
>>>> InlineCacheBuffer::update_inline_caches()
>>>> SafepointSynchronize::do_cleanup_tasks()
>>>>
>>>> Does that answer your question?
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Tobias
>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 9/22/14 6:08 AM, Tobias Hartmann wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> please review the following patch.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8058737
>>>>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8058737/webrev.00/
>>>>>>>>
>>>>>>>> == Problem ==
>>>>>>>> The test closed/compiler/6440479/VirtualDispatch.java contains 
>>>>>>>> a method
>>>>>>>> 'test' with many virtual calls that change the target during 
>>>>>>>> execution.
>>>>>>>> Compilation of this method results in a high number of inline 
>>>>>>>> caches
>>>>>>>> being created for the virtual calls.
>>>>>>>>
>>>>>>>> List of events that cause the bug:
>>>>>>>> 1) nmethod [1] is converted to non-entrant.
>>>>>>>> 2) The nmethod is processed by the sweeper. It is still active 
>>>>>>>> on the
>>>>>>>> stack and can therefore not yet be converted to zombie. 
>>>>>>>> However, the
>>>>>>>> sweeper cleans the inline caches of [1] (see line 553 of
>>>>>>>> 'NMethodSweeper::process_nmethod'). CompiledIC::set_to_clean() 
>>>>>>>> creates a
>>>>>>>> transition stub because a mt safe transition to the clean state 
>>>>>>>> is not
>>>>>>>> possible (see 'InlineCacheBuffer::create_transition_stub'). 
>>>>>>>> Because the
>>>>>>>> StubQueue is full, 'InlineCacheBuffer::new_ic_stub' forces a 
>>>>>>>> safepoint
>>>>>>>> to back patch IC stubs and free space in the StubQueue.
>>>>>>>> 3) The safepoint also performs stack scanning and [1] is not 
>>>>>>>> found on
>>>>>>>> the stack.
>>>>>>>> 4) 'nmethod::cleanup_inline_caches' continues and finishes.
>>>>>>>> 5) The next traversal of the sweeper converts [1] to zombie (it 
>>>>>>>> was not
>>>>>>>> found on the stack during the forced safepoint).
>>>>>>>> 6) A safepoint occurs updating the inline caches of [1].
>>>>>>>> 'ICStub::finalize' tries to patch the IC but fails because the 
>>>>>>>> nmethod
>>>>>>>> is a zombie (guarantee in 'CodeCache::find_blob'). See [2].
>>>>>>>>
>>>>>>>> Summary:
>>>>>>>> Usually, zombie nmethods do _not_ have an IC stub because the 
>>>>>>>> transition
>>>>>>>> from non-entrant to zombie only happens after a safepoint that 
>>>>>>>> removes
>>>>>>>> all IC stubs by back patching the corresponding ICs. In this 
>>>>>>>> case, a
>>>>>>>> safepoint is triggered _while_ creating an IC stub and this 
>>>>>>>> safepoint
>>>>>>>> also performs stack scanning allowing the nmethod to transition to
>>>>>>>> zombie while having an IC stub.
>>>>>>>>
>>>>>>>> == Solution ==
>>>>>>>> Use CodeCache::find_blob_unsafe in ICStub::finalize and only 
>>>>>>>> back patch
>>>>>>>> the IC if the nmethod is not a zombie.
>>>>>>>>
>>>>>>>> Maybe we should also think about increasing the size of the 
>>>>>>>> StubQueue.
>>>>>>>> Currently it is set to 10*K (see 'InlineCacheBuffer::initialize').
>>>>>>>>
>>>>>>>> == Testing ==
>>>>>>>> - Failing test
>>>>>>>> - JPRT
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tobias
>>>>>>>>
>>>>>>>> [1] 'VirtualDispatch::test'
>>>>>>>> [2] Stack trace:
>>>>>>>> CodeCache::find_nmethod(void*)
>>>>>>>> ICStub::finalize()
>>>>>>>> StubQueue::remove_all()
>>>>>>>> InlineCacheBuffer::update_inline_caches()
>>>>>>>> SafepointSynchronize::do_cleanup_tasks()
>>>>>>>>
>>>>>>
>>>>
>>