Preliminary review for 8028107: Kitchensink crashed with EAV

Thu Nov 28 01:39:32 PST 2013

Vladimir,

On Wednesday 27 November 2013 18.49.14 Vladimir Kozlov wrote:
> Additional note. The code does "lock" nmethod (which is actually the field
> nm:_lock_count increment):
> 
>    nmethod* callee_nm = callee_method->code();
>    nmethodLocker nl_callee(callee_nm);
> 
> but unloading code does not check it (is_locked_by_vm()) because it is too
> late anyway.
> 
> John Rose suggested to modify nmethodLocker to put related metadata (method,
> klass) or JavaMirror into thread's handles area (or other reachable from GC
> place) so that such nmethods will be reachable during GC and be strong
> roots. It will prevent GC modify nmethod state when a java thread works
> with nmethod.

I read through your original explanation below and from a GC perspective your 
explanation seems plausible. I don't really know how compiledICs work, but it 
is possible that an nmethod can be unloaded at a safepoint if any of the 
objects it references are not live. I've had problems with this when working 
on class unloading for G1 previously.

I don't think it would be enough to create handles for the nmethod's metadata 
since the GC does not actually follow any pointers to find nmethods.
As far as I understand from the sources nmethod unloading is _forced_ if any 
java object which is embedded in an nmethod is dead. 
If you want to make sure that the nmethod is not unloaded then you would need 
to register your nmethod to a global list and change all the gc code to call 
oops_do on the nmethods on this list with the correct closures, basically some 
sort of nmethod handle stack. I'd prefer to avoid this approach since it will 
add more complexity to GC root scanning and would further complicate class 
unloading for G1.

I would also like to add that since this nmethod is the only referent of the 
java object that caused the unloading it is highly likely that it will be 
unloaded at the next full gc unless it's on a thread stack. Keeping the 
nmethod alive here seems to only delay the inevitable.

/Mikael

> 
> Thanks,
> Vladimir
> 
> On 11/27/13 5:51 PM, Vladimir Kozlov wrote:
> > https://bugs.openjdk.java.net/browse/JDK-8028107
> > 
> > It is very rare case. Last time it was 1 year ago:
> > 
> > https://bugs.openjdk.java.net/browse/JDK-7199505
> > 
> > nmethod unloading (nmethod::make_unloaded()) happens only during safepoint
> > when it contains pointers (embedded or in oop map, relocation info) to
> > dead java objects (see nmethod::can_unload()).
> > 
> > It can only happens when a nmethod is NOT on call stack. GCs have path
> > over nmethods on stack and they are treated as strong roots. Igor and I
> > looked on GC code and verified it.
> > 
> > I instrumented nmethod::make_unloaded() to see if we may unloading
> > nmethods on stack and ran kitchensink for few day. No failures.
> > 
> > The only place left is compiled inline caches. We DO clean up them at the 
end of safepoint if unloading happens:
> >    if (is_in_use()) {
> >    
> >      // Transitioning directly from live to unloaded -- so
> >      // we need to force a cache clean-up; remember this
> >      // for later on.
> >      CodeCache::set_needs_cache_clean(true);
> >    
> >    }
> > 
> > But there is only place where we may create new compiledIC after safepoint
> > when we are resolving call from compiled code:
> > SharedRuntime::resolve_sub_helper().
> > 
> > In this method we record nmethod in local var:
> >    CompiledICInfo virtual_call_info;
> >    
> >      CompiledIC::compute_monomorphic_entry(callee_method, h_klass,
> >      
> >                       is_optimized, static_bound, virtual_call_info,
> >                       CHECK_(methodHandle()));
> > 
> > then we take a lock (which is safepoint):
> >    // grab lock, check for deoptimization and potentially patch caller
> >    {
> >    
> >      MutexLocker ml_patch(CompiledIC_lock);
> > 
> > and then create compiledIC based on info in CompiledICInfo:
> >          CompiledIC* inline_cache = CompiledIC_before(caller_nm,
> >          caller_frame.pc());
> >          if (inline_cache->is_clean()) {
> >          
> >            inline_cache->set_to_monomorphic(virtual_call_info);
> >          
> >          }
> > 
> > at this point nmethod could be unloaded already during above
> > lock/safepoint. But call site is patched and will be used to call this
> > nmethod.
> > 
> > Note, we can't compute entry point under the same lock because:
> >    // Compute entry points. This might require generation of C2I converter
> >    // frames, so we cannot be holding any locks here. Furthermore, the
> >    // computation of the entry points is independent of patching the call.
> > 
> > Please, comment or find any flaws in my logic. The bug is impossible to
> > reproduce.
> > 
> > I am suggesting next fix.
> > 
> > src/share/vm/runtime/sharedRuntime.cpp
> > @@ -1258,12 +1258,14 @@
> > 
> >       // Now that we are ready to patch if the Method* was redefined then
> >       // don't update call site and let the caller retry.
> > 
> > -
> > -    if (!callee_method->is_old()) {
> > +    // Don't update call site if nmethod was unloaded during safepoint
> > +    // which may happened during locking above.
> > +    if (!callee_method->is_old() && (callee_method->code() == callee_nm))
> > {> 
> >   #ifdef ASSERT
> >   
> >         // We must not try to patch to jump to an already unloaded method.
> >         if (dest_entry_point != 0) {
> > 
> > -        assert(CodeCache::find_blob(dest_entry_point) != NULL,
> > +        CodeBlob* cb = CodeCache::find_blob(dest_entry_point);
> > +        assert((cb != NULL) && cb->is_nmethod() &&
> > ((nmethod*)cb)->is_in_use(),> 
> >                  "should not unload nmethod while locked");
> >         
> >         }
> >   
> >   #endif
> > 
> > Thanks,
> > Vladimir