RFR: 8256020: Don't resurrect objects on argument-dependency access

Tue Nov 10 17:59:59 UTC 2020

On Tue, 10 Nov 2020 17:17:13 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> > > > So your theory is that someone calls Dependencies::DepStream::recorded_oop_at on an nmethod, after marking terminated, leaking out a dead object. For your theory to be true, you would have acquired a is_unloading() nmethod from somewhere, and called Dependencies::DepStream::recorded_oop_at on it. That immediately excludes e.g. all on-stack nmethods, all nmethods handed out through dependency contexts, and all nmethods handed out through only_alive_and_not_unloading CodeCache iterators, which is almost all of them. There are very few code cache iterations that expose is_unloading() nmethods, and what they have in common is that they are _not_ poking around at oops.
> 
> > > > So I suppose I really don't understand what path you could possibly track this to happen, where you have an is_unloading() nmethod, and start poking around at its oops. Would you mind elaborating a bit more, from what context you think Dependencies::DepStream::recorded_oop_at() is being called on an is_unloading() nmethod?
> 
> > > 
> 
> > > 
> 
> > > I am digging a little deeper. Planting an assert in relevant place barrier shows that we are exposing an unmarked object in this code path in nmethod::flush_dependencies():
> 
> > > ```
> 
> > >         oop call_site = deps.argument_oop(0);
> 
> > >         if (delete_immediately) {
> 
> > >           assert_locked_or_safepoint(CodeCache_lock);
> 
> > >           MethodHandles::remove_dependent_nmethod(call_site, this);
> 
> > >         } else {
> 
> > >           MethodHandles::clean_dependency_context(call_site);
> 
> > >         }
> 
> > > ```
> 
> > > 
> 
> > > 
> 
> > > ```
> 
> > > #  Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp:112), pid=2396340, tid=2396367
> 
> > > #  assert(obj == __null || !_heap->is_concurrent_weak_root_in_progress() || _heap->marking_context()->is_marked(obj)) failed: only expose marked objects
> 
> > > 
> 
> > > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> 
> > > V  [libjvm.so+0x626fa0]  AccessInternal::PostRuntimeDispatch<ShenandoahBarrierSet::AccessBarrier<544868ul, ShenandoahBarrierSet>, (AccessInternal::BarrierType)2, 544868ul>::oop_access_barrier(void*)+0x290
> 
> > > V  [libjvm.so+0x132b3ea]  nmethod::oop_at(int) const+0x4a
> 
> > > V  [libjvm.so+0x9e862e]  Dependencies::DepStream::argument_oop(int)+0x7e
> 
> > > V  [libjvm.so+0x132fe21]  nmethod::flush_dependencies(bool)+0x1f1
> 
> > > V  [libjvm.so+0x1580e79]  ShenandoahNMethodUnlinkClosure::do_nmethod(nmethod*)+0x3d9
> 
> > > V  [libjvm.so+0x16113fc]  ShenandoahNMethodTableSnapshot::concurrent_nmethods_do(NMethodClosure*)+0x8c
> 
> > > V  [libjvm.so+0x157f19b]  ShenandoahUnlinkTask::work(unsigned int)+0x2b
> 
> > > V  [libjvm.so+0x18faa54]  GangWorker::run_task(WorkData)+0x84
> 
> > > V  [libjvm.so+0x18fabb3]  GangWorker::loop()+0x63
> 
> > > V  [libjvm.so+0x17b6008]  Thread::call_run()+0xf8
> 
> > > V  [libjvm.so+0x13b6d2e]  thread_native_entry(Thread*)+0x10e
> 
> > > ```
> 
> > > 
> 
> > > 
> 
> > > In other words, it gets the unmarked call_site, then does access it afterwards. TBH, I am not totally sure that we aren't doing something wrong somewhere. ?
> 
> > 
> 
> > This is all very intentional. This is called by the class unloading code, and is the very reason why it is strong. The class unloading code, concurrent or STW doesn't matter, peeks and walks a chain of dead objects to find and clean up native dependency contexts of CallSites. This means that we have to expose the dead objects to the class unloading code, so that we can walk the dead objects, find the dependency context, and clean it. So if we would perform your requested change, we would probably crash here with SIGSEGV, trying to dereference NULL, as it is expected that the call_site is not NULL.
> 
> 
> 
> Right. No, it doesn't crash there. It only ever fails our verification.

Okay.

> I see that a couple of IN_NATIVE accesses are also decorated with AS_NO_KEEPALIVE. We interpret that as 'skip SATB barrier' during marking, I wonder if that is even the correct interpretation.

That is the right interpretation.

> The original assert seems to imply that a new MutableCallSite refers to an old but unmarked DirectMethodHandle.

Sounds like the old MutableCallSite escaped the snapshot at the beginning somehow. Maybe it is related to your new reference processor?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1113