request for review (m): 4957990: Perm heap bloat in JVM

Wed Aug 5 19:40:34 PDT 2009

Hi Tom -- Thanks for the explanation. I looked at the code some
more (since the problem is difficult to reproduce at will).

On 08/05/09 18:15, Tom Rodriguez wrote:
> My understanding was that OSR nmethods could/should only be reclaimed 
> once the klass of their holder was unloaded.  The reason for this is 
> that the normal not_entrant sweeping logic isn't implemented safely for 
> OSR methods so the sweeper can't safely reclaim them.  If the klass 
> itself is being reclaimed then it should be safe to reclaim the OSR 
> nmethods.  It's possible that we have a hole in the logic which doesn't 
> account for dealing with unloaded OSR methods properly and you changes 
> just make it more likely to happen.  We either need to close the 
> not_entrant hole for OSR nmethods, which isn't that hard, and then 
> correct nmethod::make_unloaded to unlink the OSR nmethod or we have to 
> make do_unloading disallow unloading for OSR nmethods is the methodOop 
> is strongly reachable.

Well, here's how the code currently looks to me:

... roughly speaking, if an nmethod (in the weak roots scan)
     is found to contain an unmarked oop, it seems to become
     immediately eligible for unloading
     (the reachability of its _method notwithstanding):-

// This is called at the end of the strong tracing/marking phase of a
// GC to unload an nmethod if it contains otherwise unreachable
// oops.

void nmethod::do_unloading(BoolObjectClosure* is_alive,
                            OopClosure* keep_alive, bool unloading_occurred) {
   // Make sure the oop's ready to receive visitors
   assert(!is_zombie() && !is_unloaded(),
          "should not call follow on zombie or unloaded nmethod");

...
   // Follow methodOop
   if (can_unload(is_alive, keep_alive, (oop*)&_method, unloading_occurred)) {
     return;
   }
...
   // Compiled code
   RelocIterator iter(this, low_boundary);
   while (iter.next()) {
     if (iter.type() == relocInfo::oop_type) {
       oop_Relocation* r = iter.oop_reloc();
       // In this loop, we must only traverse those oops directly embedded in
       // the code.  Other oops (oop_index>0) are seen as part of scopes_oops.
       assert(1 == (r->oop_is_immediate()) +
                   (r->oop_addr() >= oops_begin() && r->oop_addr() < oops_end()),
              "oop must be found in exactly one place");
       if (r->oop_is_immediate() && r->oop_value() != NULL) {
         if (can_unload(is_alive, keep_alive, r->oop_addr(), unloading_occurred)) {
           return;
         }
       }
...
   // Scopes
   for (oop* p = oops_begin(); p < oops_end(); p++) {
     if (*p == Universe::non_oop_word())  continue;  // skip non-oops
     if (can_unload(is_alive, keep_alive, p, unloading_occurred)) {
       return;
     }
   }
...
}

Then:-
// If this oop is not live, the nmethod can be unloaded.
bool nmethod::can_unload(BoolObjectClosure* is_alive,
                          OopClosure* keep_alive,
                          oop* root, bool unloading_occurred) {
   assert(root != NULL, "just checking");
   oop obj = *root;
   if (obj == NULL || is_alive->do_object_b(obj)) {
       return false;
   }
   if (obj->is_compiledICHolder()) {
     compiledICHolderOop cichk_oop = compiledICHolderOop(obj);
     if (is_alive->do_object_b(
           cichk_oop->holder_method()->method_holder()) &&
         is_alive->do_object_b(cichk_oop->holder_klass())) {
       // The oop should be kept alive
       keep_alive->do_oop(root);
       return false;
     }
   }
   assert(unloading_occurred, "Inconsistency in unloading");
   make_unloaded(is_alive, obj);
   return true;
}

Here we seem to have ended up with a scope oop that is not a CICHolder,
so we call make_unloaded on the nmethod:-

     nmethod::CodeBlob::_name                  = 0xfed2f3e4 "nmethod"
     nmethod::CodeBlob::_size                  = 34300
     nmethod::CodeBlob::_header_size           = 184
     nmethod::CodeBlob::_relocation_size       = 2544
     nmethod::CodeBlob::_instructions_offset   = 2744
     nmethod::CodeBlob::_frame_complete_offset = 16
     nmethod::CodeBlob::_data_offset           = 18188
     nmethod::CodeBlob::_oops_offset           = 33896
     nmethod::CodeBlob::_oops_length           = 101
     nmethod::CodeBlob::_frame_size            = 48
     nmethod::CodeBlob::_oop_maps              = 0x9f3188
     nmethod::CodeBlob::_comments              = {
         nmethod::CodeBlob::CodeComments::_comments = (nil)
     }
     nmethod::_method                                                  = 0xf5df2240
     nmethod::_entry_bci                                               = 221
     nmethod::_link                                                    = (nil)
     nmethod::_compiler                                                = 0x105030
     nmethod::_exception_offset                                        = 18160
     nmethod::_deoptimize_offset                                       = 18172
     nmethod::_trap_offset                                             = 0
     nmethod::_stub_offset                                             = 17240
     nmethod::_consts_offset                                           = 18188
     nmethod::_scopes_data_offset                                      = 18188
     nmethod::_scopes_pcs_offset                                       = 24836
     nmethod::_dependencies_offset                                     = 31724
     nmethod::_handler_table_offset                                    = 31748
     nmethod::_nul_chk_table_offset                                    = 33668
     nmethod::_nmethod_end_offset                                      = 33896
     nmethod::_orig_pc_offset                                          = 188
     nmethod::_compile_id                                              = 3
     nmethod::_comp_level                                              = 2
     nmethod::_entry_point                                             = 0xfb93c740 "\x91\xd0 ^P^G?\xff\xe0\xc0#\x80^C\x9d\xe3\xbf@\xea^F ^H\xec^F ^L\x90^P"
     nmethod::_verified_entry_point                                    = 0xfb93c740 "\x91\xd0 ^P^G?\xff\xe0\xc0#\x80^C\x9d\xe3\xbf@\xea^F ^H\xec^F ^L\x90^P"
     nmethod::_osr_entry_point                                         = 0xfb93c744 "^G?\xff\xe0\xc0#\x80^C\x9d\xe3\xbf@\xea^F ^H\xec^F ^L\x90^P"
     nmethod::flags                                                    = {
         nmethod::nmFlags::version                = 0
         nmethod::nmFlags::level                  = 0
         nmethod::nmFlags::age                    = 0
         nmethod::nmFlags::state                  = 0
         nmethod::nmFlags::isUncommonRecompiled   = 0
         nmethod::nmFlags::isToBeRecompiled       = 0
         nmethod::nmFlags::hasFlushedDependencies = 0
         nmethod::nmFlags::markedForReclamation   = 0
         nmethod::nmFlags::has_unsafe_access      = 0
     }
     nmethod::_markedForDeoptimization                                 = false
     nmethod::_unload_reported                                         = false
     nmethod::_has_debug_info                                          = false
     nmethod::_lock_count                                              = 0
     nmethod::_stack_traversal_mark                                    = 0
     nmethod::_exception_cache                                         = (nil)
     nmethod::_pc_desc_cache                                           = {
         nmethod::PcDescCache::_last_pc_desc = 0xfb943268
         nmethod::PcDescCache::_pc_descs     = (0xfb942e00, 0xfb942cd4, 0xfb942cb0, 0xfb942c74)
     }
     nmethod::_compiled_synchronized_native_basic_lock_owner_sp_offset = {
         nmethod::ByteSize::_size = -1
     }
     nmethod::_compiled_synchronized_native_basic_lock_sp_offset       = {
         nmethod::ByteSize::_size = -1
     }
     nmethod::_zombie_instruction_size                                 = 12

We then go flush_dependencies as part of unloading it, but
this does not seem to involve removing it from the osr_nmethod_head.

... then, once it's eligible for unloading, it's marked
     "unloaded", and the sweeper summarily flushes it without
     removing it from the osr_method_head:-

...
   } else if (nm->is_unloaded()) {
     // Unloaded code, just make it a zombie
     if (PrintMethodFlushing && Verbose)
       tty->print_cr("### Nmethod 0x%x (unloaded) being made zombie", nm);
     if (nm->is_osr_only_method()) {
       // No inline caches will ever point to osr methods, so we can just remove it
       nm->flush();
     } else {
       nm->make_zombie();
       _rescan = true;
     }
...

Do you believe the hole here may be in the way the
nmethod is being unloaded (perhaps flush_dependencies
or related should unlink the nmethod from the osr method list)
or that the criteria for unloading are here too weak and
allowing a premature unload?

thanks for all your help so far, and talk to you tomorrow.
-- ramki

PS: Given the nitty-gritty of this exchange, i am tempted to pull
this from the O.J.N. list and take it off-line 1-on-1 with Tom.
Let me know if you have read this far and want to be included in
the conversation or want the lists to continue being copied.

> 
> tom
> 
> On Aug 5, 2009, at 4:57 PM, Y.S.Ramakrishna at Sun.COM wrote:
> 
>>
>> Can there be a situation where an osr nmethod associated
>> with a method oop gets unloaded while the method and its
>> class are still alive? I see that happening with my workspace
>> with the changes for 4957990 (yet to figure out why the osr
>> nmethod was unloaded).
>>
>> Shortly after said unload, the sweeper flushes the nmethod,
>> but the nmethod is still left linked off of the klass's _osr_nmethod_head
>> list, and a subsequent invocation counter overflow at a bci does
>> an osr nmethod lookup and falls foul of the flushed nmethod left
>> in the instanceKlass's osr nmethod head.
>>
>> So I have two questions:
>>
>> (a) can an osr nmethod be unloaded while its klass is still alive ?
>> (b) if the answer to (a) is yes, then we need to take special steps
>>    (not present in current code) to unlink said nmethod from the
>>    list of osr nmethods in its klass.
>>
>> Am I making sense? Otherwise i can provide more direct detail.
>> (I am still digging to find out why we decided to unload the
>> nmethod and will have more follow-up info in a subsequent email.)
>>
>> thanks.
>> -- ramki
>