RFR: 8310194: Generational ZGC: Lock-order asserts in JVMTI IterateThroughHeap

Stefan Karlsson stefank at openjdk.org
Fri Jun 16 12:13:29 UTC 2023


On Fri, 16 Jun 2023 11:52:51 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

> Generational ZGC changed the location where the heap object iterator called the visit function. It used to be called when the objects were popped and then followed, but it was changed so that the function where called immediately when the objects are marked. This gave the benefit that the verification code code show exactly what root had a broken object. Unfortunately, this change started to apply non-GC closure during the root iteration. This code lead to lock order issues, just like the one described in the JBS entry.
> 
> This fix moves the visit calls for non-GC closures back to the follow calls.

It was quite hard to provoke this problem, and this bug was found in one of Oracle's internal stress tests. The stack trace looked like this:

#  assert(false) failed: Attempting to acquire lock ObjectSynchronizer Weak alloc lock/oopstorage out of order with lock Service_lock/service -- possible deadlock
...
V  [libjvm.so+0x13fcc3c]  Mutex::check_rank(Thread*)+0x36c  (mutex.cpp:447)
V  [libjvm.so+0x13fdee0]  Mutex::lock_without_safepoint_check()+0x50  (mutex.cpp:145)
V  [libjvm.so+0x14819b9]  OopStorage::allocate()+0x29  (mutexLocker.hpp:199)
V  [libjvm.so+0x18a8bd0]  WeakHandle::WeakHandle(OopStorage*, oop)+0x20  (weakHandle.cpp:37)
V  [libjvm.so+0x144e4f2]  ObjectMonitor::ObjectMonitor(oop)+0x62  (objectMonitor.cpp:261)
V  [libjvm.so+0x17337f5]  ObjectSynchronizer::inflate(Thread*, oop, ObjectSynchronizer::InflateCause)+0x675  (synchronizer.cpp:1425)
V  [libjvm.so+0x1736036]  ObjectSynchronizer::FastHashCode(Thread*, oop)+0x146  (synchronizer.cpp:1001)
V  [libjvm.so+0x146491f]  oopDesc::slow_identity_hash()+0x6f  (oop.cpp:112)
V  [libjvm.so+0x11bf2c5]  JvmtiTagMapTable::remove(oop)+0x265  (oop.inline.hpp:366)
V  [libjvm.so+0x11b6e0f]  CallbackWrapper::~CallbackWrapper()+0xef  (jvmtiTagMap.cpp:248)
V  [libjvm.so+0x11b290c]  IterateThroughHeapObjectClosure::do_object(oop)+0x59c  (jvmtiTagMap.cpp:1089)
V  [libjvm.so+0x1999518]  ZHeapIteratorContext::mark_and_push(oop) const [clone .part.0]+0x58  (zHeapIterator.cpp:81)
V  [libjvm.so+0x199c6d4]  ZHeapIteratorUncoloredRootOopClosure::do_oop(oop*)+0x154  (zHeapIterator.cpp:161)
V  [libjvm.so+0x19bda57]  ZNMethod::nmethod_oops_do_inner(nmethod*, OopClosure*)+0x47  (zNMethod.cpp:264)
V  [libjvm.so+0x19bdc1e]  ZNMethod::nmethod_oops_do(nmethod*, OopClosure*)+0x8e  (zNMethod.cpp:254)
V  [libjvm.so+0x1196195]  JvmtiDeferredEventQueue::oops_do(OopClosure*, CodeBlobClosure*)+0x45  (jvmtiImpl.cpp:1027)
V  [libjvm.so+0x15e5dd9]  ServiceThread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x59  (serviceThread.cpp:220)
V  [libjvm.so+0x1796ac6]  Thread::oops_do(OopClosure*, CodeBlobClosure*)+0x76  (thread.cpp:431)
V  [libjvm.so+0x19fad72]  ZJavaThreadsIterator::apply(ThreadClosure*)+0x102  (zRootsIterator.cpp:136)
V  [libjvm.so+0x19fb135]  ZRootsIteratorStrongUncolored::apply(ThreadClosure*, NMethodClosure*)+0x35  (zRootsIterator.cpp:96)
V  [libjvm.so+0x1997bed]  ZHeapIterator::push_strong_roots(ZHeapIteratorContext const&)+0x10d  (zHeapIterator.cpp:376)
V  [libjvm.so+0x199a37c]  ZHeapIterator::object_iterate(ObjectClosure*, unsigned int)+0x7c  (zHeapIterator.cpp:388)
V  [libjvm.so+0x199476e]  ZHeap::object_iterate(ObjectClosure*, bool)+0x5e  (zHeap.cpp:297)
V  [libjvm.so+0x11b787c]  VM_HeapIterateOperation::doit()+0x5c  (jvmtiTagMap.cpp:877)

You had to provoke the ObjectMonitor inflate code path via the identify_hash call from the JVMTI tagmap code, while the ServiceThread had a registered deferred event.
Note that the Service_lock is taken inside ServiceThread::oops_do_no_frames.

To more easily provoke the bug I temporarily added a oopstorage-ranked lock to IterateThroughHeapObjectClosure:

diff --git a/src/hotspot/share/prims/jvmtiTagMap.cpp b/src/hotspot/share/prims/jvmtiTagMap.cpp
index 6908e1994ba..a0e031eec31 100644
--- a/src/hotspot/share/prims/jvmtiTagMap.cpp
+++ b/src/hotspot/share/prims/jvmtiTagMap.cpp
@@ -967,6 +967,8 @@ class IterateThroughHeapObjectClosure: public ObjectClosure {
   const jvmtiHeapCallbacks* _callbacks;
   const void* _user_data;
 
+  Mutex _lock;
+
   // accessor functions
   JvmtiTagMap* tag_map() const                     { return _tag_map; }
   int heap_filter() const                          { return _heap_filter; }
@@ -1000,6 +1002,7 @@ class IterateThroughHeapObjectClosure: public ObjectClosure {
     _heap_filter(heap_filter),
     _callbacks(heap_callbacks),
     _user_data(user_data),
+    _lock(Mutex::oopstorage, "Test lock"),
     _iteration_aborted(false)
   {
   }
@@ -1009,6 +1012,10 @@ class IterateThroughHeapObjectClosure: public ObjectClosure {
 
 // invoked for each object in the heap
 void IterateThroughHeapObjectClosure::do_object(oop obj) {
+  {
+    MutexLocker locker(&_lock, Mutex::_no_safepoint_check_flag);
+  }
+
   // check if iteration has been halted
   if (is_iteration_aborted()) return;
 

With this I could consistently get the problem to reproduce within a few minutes. After moving the call to the visitor (IterateThroughHeapObjectClosure) to outside the root iterator I've been able to run the reproducer for a few hours without any asserts.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14518#issuecomment-1594566234


More information about the serviceability-dev mailing list