[9] RFR(S): 8058737: CodeCache::find_blob fails with 'unsafe access to zombie method'

Thu Sep 25 16:15:24 UTC 2014

Thank you for explanation, now I got it.
I am not sure that we should stop cleaning call sites even if we did not see nmethod on call stacks.
Your last webrev.01 solution is more robust, I think.

Thanks,
Vladimir

On 9/25/14 5:22 AM, Tobias Hartmann wrote:
> Hi Vladimir,
>
> thanks for the feedback!
>
> On 24.09.2014 19:13, Vladimir Kozlov wrote:
>> On 9/24/14 5:24 AM, Tobias Hartmann wrote:
>>> Hi Vladimir,
>>>
>>> thanks for the review.
>>>
>>> On 23.09.2014 23:05, Vladimir Kozlov wrote:
>>>> Why it is on confidential list?
>>>
>>> I originally set the bug to confidential because some of the failing tests are closed. Should I change it to open?
>>
>> Changes are in open code and they should be discussed in open. I changed mailing list.
>
> Okay, thanks, I changed the bug to open.
>
>> About ICStub.
>> new_ic_stub() creates empty ICStub, the initialized with IC data is previous stub:
>> ic_stub->set_stub(ic, cached_value, entry);
>>
>> So when first safepoint happens interesting ICStub is not empty. My question is why it is not cleared during first
>> safepoint? Is nmethod::cleanup_inline_caches() called during first safepoint again or not? I mean from
>> CodeCache::gc_epilogue(). If it is not called again then it explains the situation.
>
> The first safepoint does indeed clear all ICStubs but the point is that after the safepoint finishes we are still in
> 'nmethod::cleanup_inline_caches()' of the same nmethod. Because the nmethod has many ICs, we continue to call
> 'CompiledIC::set_to_clean()' that will then create new ICStubs.
>
> To answer your question: The first safepoint does not clear the problematic ICStub because the stub is created right
> after the safepoint returns.
>
> The sweeper then changes the state of the nmethod to zombie (because the first safepoint did not find it on the stack).
> At the second safepoint the assert is hit.
>
> Another solution would be to bail out of the IC cleanup if the nmethod may now be converted to zombie:
>
> --- a/src/share/vm/code/nmethod.cpp    Sun Sep 21 16:13:39 2014 +0200
> +++ b/src/share/vm/code/nmethod.cpp    Thu Sep 25 14:11:33 2014 +0200
> @@ -1152,6 +1152,9 @@
>     ResourceMark rm;
>     RelocIterator iter(this, low_boundary);
>     while(iter.next()) {
> +   if (is_not_entrant() && can_not_entrant_be_converted()) {
> +     break;
> +   }
>       switch(iter.type()) {
>         case relocInfo::virtual_call_type:
>         case relocInfo::opt_virtual_call_type: {
>
> What do you think?
>
> Thanks,
> Tobias
>
>
>> Thanks,
>> Vladimir
>>
>>>
>>>> On 9/23/14 6:37 AM, Tobias Hartmann wrote:
>>>>> Igor, Vladimir, thanks for the reviews!
>>>>>
>>>>> Please see comments inline.
>>>>>
>>>>> On 22.09.2014 23:47, Igor Veresov wrote:
>>>>>> Could in theory the blob be deallocated by the time we look at it in
>>>>>> ICStub::finalize?
>>>>>
>>>>> Yes, that could happen. As Vladimir mentioned, more than one traversal
>>>>> of the sweeper may happen, changing the state of the nmethod from zombie
>>>>> to unloaded and deallocating it.
>>>>
>>>> Tobias, just to be clear. The state 'zombie' does not converted to state 'unloaded'. The 'unloaded' state is set only
>>>> in nmethod::do_unloading() when we unload classes. And 'unloaded' state is converted to 'zombie'. This is the only
>>>> direction of a state change.
>>>
>>> Yes, thanks. I meant to say that the zombie method may be unloaded during another traversal of the sweeper.
>>>
>>>> On 23.09.2014 03:12, Vladimir Kozlov wrote:
>>>>>> I think safer solution will be to cleanup corresponding ICStub when
>>>>>> nmethod is converted to zombie. Otherwise you will get incorrect
>>>>>> result as Igor pointed when nmethod is removed - we can get several
>>>>>> sweeps before second 6) sefapoint.
>>>>>
>>>>> Yes, I agree. I changed the code to clear ICStubs when the nmethod is
>>>>> converted to zombie. It should be sufficient to do this for the
>>>>> non-entrant->zombie transition in the sweeper. The other calls to
>>>>> 'nmethod::make_zombie' either happen at a safepoint (see
>>>>> 'CodeCache::make_marked_nmethods_zombies') or for the unloaded->zombie
>>>>> transition in the sweeper that should be safe.
>>>>>
>>>>> New webrev: http://cr.openjdk.java.net/~thartmann/8058737/webrev.01/
>>>>
>>>> One thing is still not clear for me. Please, explain why during the first safepoint which should clean StubQueue that
>>>> ICStub was not cleaned? We at safepoint and it is safe to cleanr it:
>>>>
>>>> nmethod::cleanup_inline_caches() {
>>>> ...
>>>>           if (!nm->is_in_use() || (nm->method()->code() != nm)) ic->set_to_clean();
>>>>
>>>> CompiledIC::set_to_clean() {
>>>> ...
>>>>   if (safe_transition) {
>>>>     // Kill any leftover stub we might have too
>>>>     if (is_in_transition_state()) {
>>>>       ICStub* old_stub = ICStub_from_destination_address(stub_address());
>>>>       old_stub->clear();
>>>>     }
>>>
>>> The problem is that the first safepoint happens while processing a non-entrant nmethod in the sweeper: The sweeper tries
>>> to cleanup the inline caches of this nmethod but fails while creating the transition stub because there is no space in
>>> the StubQueue. A safepoint is forced to free up space by back-patching and removing ICStubs.
>>>
>>> VM_ForceSafepoint vfs;
>>> InlineCacheBuffer::new_ic_stub()
>>> InlineCacheBuffer::create_transition_stub()
>>> CompiledIC::set_to_clean()
>>> nmethod::cleanup_inline_caches()
>>> NMethodSweeper::process_nmethod() line 557
>>>
>>> That means right after the safepoint completes, a new ICStub is created and the sweeper continues. As a side effect, the
>>> safe point also performs stack scanning and the nmethod is not found on the stack. During the next iteration of the
>>> sweeper the nmethod is therefore converted to zombie. Note that there is no safepoint between the actual creation of the
>>> ICStub and the conversion to zombie!
>>>
>>> At the next safepoint, back-patching of ICStubs is performed again and fails because the nmethod is in zombie state:
>>>
>>> CodeCache::find_nmethod(void*)
>>> ICStub::finalize()
>>> StubQueue::remove_all()
>>> InlineCacheBuffer::update_inline_caches()
>>> SafepointSynchronize::do_cleanup_tasks()
>>>
>>> Does that answer your question?
>>>
>>> Thanks,
>>> Tobias
>>>
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Thanks,
>>>>> Tobias
>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 9/22/14 6:08 AM, Tobias Hartmann wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> please review the following patch.
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8058737
>>>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8058737/webrev.00/
>>>>>>>
>>>>>>> == Problem ==
>>>>>>> The test closed/compiler/6440479/VirtualDispatch.java contains a method
>>>>>>> 'test' with many virtual calls that change the target during execution.
>>>>>>> Compilation of this method results in a high number of inline caches
>>>>>>> being created for the virtual calls.
>>>>>>>
>>>>>>> List of events that cause the bug:
>>>>>>> 1) nmethod [1] is converted to non-entrant.
>>>>>>> 2) The nmethod is processed by the sweeper. It is still active on the
>>>>>>> stack and can therefore not yet be converted to zombie. However, the
>>>>>>> sweeper cleans the inline caches of [1] (see line 553 of
>>>>>>> 'NMethodSweeper::process_nmethod'). CompiledIC::set_to_clean() creates a
>>>>>>> transition stub because a mt safe transition to the clean state is not
>>>>>>> possible (see 'InlineCacheBuffer::create_transition_stub'). Because the
>>>>>>> StubQueue is full, 'InlineCacheBuffer::new_ic_stub' forces a safepoint
>>>>>>> to back patch IC stubs and free space in the StubQueue.
>>>>>>> 3) The safepoint also performs stack scanning and [1] is not found on
>>>>>>> the stack.
>>>>>>> 4) 'nmethod::cleanup_inline_caches' continues and finishes.
>>>>>>> 5) The next traversal of the sweeper converts [1] to zombie (it was not
>>>>>>> found on the stack during the forced safepoint).
>>>>>>> 6) A safepoint occurs updating the inline caches of [1].
>>>>>>> 'ICStub::finalize' tries to patch the IC but fails because the nmethod
>>>>>>> is a zombie (guarantee in 'CodeCache::find_blob'). See [2].
>>>>>>>
>>>>>>> Summary:
>>>>>>> Usually, zombie nmethods do _not_ have an IC stub because the transition
>>>>>>> from non-entrant to zombie only happens after a safepoint that removes
>>>>>>> all IC stubs by back patching the corresponding ICs. In this case, a
>>>>>>> safepoint is triggered _while_ creating an IC stub and this safepoint
>>>>>>> also performs stack scanning allowing the nmethod to transition to
>>>>>>> zombie while having an IC stub.
>>>>>>>
>>>>>>> == Solution ==
>>>>>>> Use CodeCache::find_blob_unsafe in ICStub::finalize and only back patch
>>>>>>> the IC if the nmethod is not a zombie.
>>>>>>>
>>>>>>> Maybe we should also think about increasing the size of the StubQueue.
>>>>>>> Currently it is set to 10*K (see 'InlineCacheBuffer::initialize').
>>>>>>>
>>>>>>> == Testing ==
>>>>>>> - Failing test
>>>>>>> - JPRT
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tobias
>>>>>>>
>>>>>>> [1] 'VirtualDispatch::test'
>>>>>>> [2] Stack trace:
>>>>>>> CodeCache::find_nmethod(void*)
>>>>>>> ICStub::finalize()
>>>>>>> StubQueue::remove_all()
>>>>>>> InlineCacheBuffer::update_inline_caches()
>>>>>>> SafepointSynchronize::do_cleanup_tasks()
>>>>>>>
>>>>>
>>>
>