[9] RFR(S): 8075805: Crash while trying to release CompiledICHolder

Tue Aug 25 10:34:30 UTC 2015

Hi all,

we appreciate that this code gets cleaned up.

Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration?
We have also fixed this problem. We use the other approach and added following code to nmethod::make_not_entrant_or_zombie:

  if (state == zombie) {
    MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock);
    address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches.
    RelocIterator iter(this, low_boundary);
    while (iter.next()) {
      if (iter.type() == relocInfo::virtual_call_type) {
        CompiledIC *ic = CompiledIC_at(&iter);
        ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL);
      }
    }
  }

(Note: set_ic_destination_and_value is currently private.)

As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration.
Not sure which approach is the better one.

Best regards,
Martin

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann
Sent: Dienstag, 25. August 2015 07:43
To: Igor Veresov
Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net
Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder

On 24.08.2015 22:10, Igor Veresov wrote:
> Seems good to me.

Thanks, Igor.

> Btw, did you find why there is a need for “marked for reclamation” state?

No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition.

Best,
Tobias 

> 
> igor
> 
>> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>
>> Thanks, Vladimir! Please see comments inline.
>>
>> On 21.08.2015 19:28, Vladimir Kozlov wrote:
>>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to an other nmethods. You think we don't need  it?
>>
>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cylce in-between.
>>
>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods.
>>
>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie."
>>> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded.
>>
>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods.
>>
>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie.
>>
>>> What spacing you changed in compiledIC.cpp because webrev does not show them?
>>
>> I fixed the wrong indentation in CompiledIC::set_to_clean(). You can see it in the webrev if you click on "Patch":
>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch
>>
>>> Please, fix comment in vm_operations.cpp
>>>
>>>   // Make the dependent methods zombies
>>> -  CodeCache::make_marked_nmethods_zombies();
>>> +  CodeCache::make_marked_nmethods_not_entrant();
>>
>> Fixed.
>>
>> New webrev:
>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/
>>
>> Thanks,
>> Tobias
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> please review the following patch.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8075805
>>>> http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/
>>>>
>>>> Problem:
>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused.
>>>>
>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed.
>>>>
>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper.
>>>>
>>>>     state of A       state of B
>>>> -----------------------------------------
>>>>     non-entrant      non-entrant
>>>> S   [not on stack]   [not on stack]
>>>> S   zombie           zombie
>>>> S   marked           marked
>>>> S   flushed          flushed/re-allocated
>>>>
>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie.
>>>>
>>>> Let's look at the following setting:
>>>>
>>>>     state of A       state of B
>>>> -----------------------------------------
>>>>     non-entrant
>>>> S   [not on stack]
>>>>                     non-entrant
>>>> S   zombie          [not on stack]
>>>>                     zombie
>>>> S   marked          marked
>>>> S   flushed         flushed/re-allocated
>>>>
>>>> There are two problems here:
>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself,
>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.
>>>>
>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by an C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].
>>>>
>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie:
>>>>
>>>>     state of A       state of B
>>>> -----------------------------------------
>>>>     unloaded         unloaded
>>>> S   zombie           zombie
>>>> S   marked           marked
>>>> S   flushed          flushed/re-allocated
>>>>
>>>> Again, we crash while flushing A.
>>>>
>>>> Solution:
>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie.
>>>>
>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well.
>>>>
>>>> Testing:
>>>> - Executed failing tests for a week (still running)
>>>> - JPRT
>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>>
>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552):
>>>>
>>>>   Inline cache at 0xffff80ffad89b017, calling  0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18
>>>> IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap':
>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/
>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie
>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies()
>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation
>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation
>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed
>>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb
>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated
>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed
>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017
>>>> enqueueing icholder 0x0000000800034a18 to be freed
>>>> *flushing nmethod 1178/0xffff80ffad89ac50. Live blobs:2346/Free CodeCache:235955Kb
>>>> deleting icholder 0x0000000800034a18
>>>> ## nof_mallocs = 211209, nof_frees = 105760
>>>> ## memory stomp:
>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18
>>>>   Header guard @0x00000008000349f8 is BROKEN
>>>>
>