VM deadlock between VM shutdown and G1

David Holmes david.holmes at oracle.com
Thu Sep 13 09:57:28 UTC 2018


Actually this seems to have some similarities with JDK-4953323.

And if cbVMDeath is anything like what is depicted here:

http://www.oracle.com/technetwork/articles/javase/index-140680.html

then deadlock seems quite likely.

I'll try to look into this deeper tomorrow.

Cheers,
David

On 13/09/2018 7:51 PM, David Holmes wrote:
> Hi Kris,
> 
> I didn't quite follow the analysis (see below)
> 
> On 13/09/2018 6:57 PM, Krystal Mok wrote:
>> Hi HotSpot devs,
>>
>> I'd like to report a VM deadlock situation we've experienced in 
>> production
>> a few weeks ago on JDK8. I checked the latest jdk/jdk code and the same
>> race condition is still there.
>>
>> I can file a JBS ticket if this looks like a valid problem.
>>
>> tl;dr: there's a race condition between the VM shutdown and G1, in 
>> that the
>> VM shutdown sequence may have just terminated all the concurrent GC
>> threads, and at the same time G1 has just started an incremental 
>> collection
>> cycle and is waiting for the concurrent marker to finish. Since 
>> there's no
>> concurrent GC thread alive to update the status anymore, G1 waits
>> indefinitely while holding a safepoint, causing the VM to deadlock.
>>
>> Details:
>>
>> 3 threads are involved in the actual deadlock. Detailed stack traces 
>> are at
>> the end of this email.
>>
>> 1. A Java application thread at an allocation site triggering a G1
>> incremental collection
>> 2. A thread that called System.exit(), initiating the VM shutdown 
>> sequence.
>> It's in the VM's native code so it doesn't block a safepoint.
> 
> VM code is not "native" in the sense of being safepoint-safe. If it's 
> still in the System.c code trying to call the VM then it is native but 
> as soon as it tries to enter the VM it will block if a safepoint is in 
> progress. In addition the exit requires that the VM go to a safepoint 
> before terminating.
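> 
> To illustrate the mechanism (purely a simplified, hypothetical sketch -- 
> the names below are illustrative, not the actual HotSpot code): when a 
> thread returns from native and tries to re-enter the VM, the state 
> transition wrapper notices an in-progress safepoint and parks the thread 
> until the safepoint completes:
> 
>    // Hypothetical sketch only; enter_vm_from_native() and
>    // block_thread_at_safepoint() are illustrative names, not HotSpot's.
>    void enter_vm_from_native(JavaThread* thread) {
>      thread->set_thread_state(_thread_in_native_trans);
>      if (safepoint_in_progress()) {
>        // Parks until the VM thread ends the safepoint, so a thread that
>        // was "in native" when the safepoint started cannot proceed into
>        // VM code until the safepoint is over.
>        block_thread_at_safepoint(thread);
>      }
>      thread->set_thread_state(_thread_in_vm);
>    }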
> 
>> 3. VM thread, already inside of a safepoint and started running G1's
>> incremental collection.
>> (4. "the world" is at a safepoint so all other Java threads are just
>> waiting)
>>
>> The problem is, Thread 2 has already run halfway into before_exit(), and
> 
> The problem seems to be an event callback, cbVMDeath, which seems to 
> have taken the thread from _thread_in_vm (which is not a safepoint-safe 
> state) to presumably _thread_in_native, which is safepoint-safe. The 
> callback then blocks on a RawMonitorWait for something, and that would 
> seem to be where the problem arises. What is the callback trying to do?
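> 
> For reference, the usual shape of such a callback (a hypothetical agent 
> sketch using the standard JVMTI raw-monitor calls; the JDWP back-end's 
> actual cbVMDeath in eventHandler.c is more involved) is something like:
> 
>    #include <jvmti.h>
> 
>    // Hypothetical agent code; deathLock would have been created earlier
>    // via CreateRawMonitor.
>    static jrawMonitorID deathLock;
> 
>    static void JNICALL cbVMDeath(jvmtiEnv* jvmti, JNIEnv* env) {
>      jvmti->RawMonitorEnter(deathLock);
>      // Wait (forever) for some other thread to signal that in-flight
>      // event handling has drained. While waiting here the thread is in
>      // a safepoint-safe state, so a safepoint can start and complete
>      // without it -- but before_exit() makes no further progress until
>      // this wait returns.
>      jvmti->RawMonitorWait(deathLock, 0);
>      jvmti->RawMonitorExit(deathLock);
>    }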
> 
> Cheers,
> David
> -----
> 
> 
>> one of the steps in there is:
>>
>> hotspot/src/share/vm/runtime/java.cpp
>>
>> 503   // Stop concurrent GC threads
>> 504   Universe::heap()->stop();
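>>
>> (For context: in the 8u sources, heap()->stop() for G1 terminates the
>> concurrent refinement threads and the concurrent mark thread. Roughly --
>> this is a paraphrase from memory, not the exact source:)
>>
>>    // Rough paraphrase of G1CollectedHeap::stop(); details may differ.
>>    void G1CollectedHeap::stop() {
>>      _cg1r->stop();      // concurrent refinement threads
>>      _cmThread->stop();  // the concurrent mark thread
>>    }
>>
>> So once this call returns, no concurrent marker is left alive to finish
>> a root region scan.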
>>
>> But G1 is waiting for the concurrent marker to finish scanning the root
>> regions:
>>
>> hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp
>>
>>   506 bool CMRootRegions::wait_until_scan_finished() {
>>   507   if (!scan_in_progress()) return false;
>>   508
>>   509   {
>>   510     MutexLockerEx x(RootRegionScan_lock,
>> Mutex::_no_safepoint_check_flag);
>>   511     while (scan_in_progress()) {
>>   512       RootRegionScan_lock->wait(Mutex::_no_safepoint_check_flag);
>>   513     }
>>   514   }
>>   515   return true;
>>   516 }
>>
>> But _scan_in_progress is only updated in a few places:
>>
>> hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp
>>
>>   449 void CMRootRegions::prepare_for_scan() {
>>   450   assert(!scan_in_progress(), "pre-condition");
>>   451
>>   452   // Currently, only survivors can be root regions.
>>   453   assert(_next_survivor == NULL, "pre-condition");
>>   454   _next_survivor = _young_list->first_survivor_region();
>>   455   _scan_in_progress = (_next_survivor != NULL);
>>   456   _should_abort = false;
>>   457 }
>>
>>   490 void CMRootRegions::scan_finished() {
>>   491   assert(scan_in_progress(), "pre-condition");
>>   492
>>   493   // Currently, only survivors can be root regions.
>>   494   if (!_should_abort) {
>>   495     assert(_next_survivor == NULL, "we should have claimed all
>> survivors");
>>   496   }
>>   497   _next_survivor = NULL;
>>   498
>>   499   {
>>   500     MutexLockerEx x(RootRegionScan_lock,
>> Mutex::_no_safepoint_check_flag);
>>   501     _scan_in_progress = false;
>>   502     RootRegionScan_lock->notify_all();
>>   503   }
>>   504 }
>>
>> And with the concurrent GC threads gone, nobody ever gets to set
>> _scan_in_progress back to false, which leads to an infinite wait.
>>
>> One way I'd imagine fixing this is by adding a check in G1's
>> CMRootRegions::wait_until_scan_finished() to see if the concurrent marker
>> thread is still there, and if it isn't, there's no point in waiting 
>> anymore.
>> Obviously similar fixes would have to be done separately for other
>> concurrent GCs in HotSpot.
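>>
>> Concretely, a minimal sketch of what I have in mind (the
>> _cm->cmThread()->has_terminated() check is an assumption, not an
>> existing accessor, and the timed wait is just one way to avoid missing a
>> termination that races with the check):
>>
>>    bool CMRootRegions::wait_until_scan_finished() {
>>      if (!scan_in_progress()) return false;
>>
>>      {
>>        MutexLockerEx x(RootRegionScan_lock, Mutex::_no_safepoint_check_flag);
>>        while (scan_in_progress()) {
>>          // If the shutdown sequence has already stopped the concurrent
>>          // mark thread, scan_finished() will never run, so give up
>>          // instead of hanging the safepoint forever.
>>          if (_cm->cmThread()->has_terminated()) {  // assumed accessor
>>            _scan_in_progress = false;
>>            break;
>>          }
>>          // Timed wait so a stop that races with the check above is still
>>          // noticed on a later iteration.
>>          RootRegionScan_lock->wait(Mutex::_no_safepoint_check_flag, 100);
>>        }
>>      }
>>      return true;
>>    }
>>
>> Alternatively, the shutdown path could notify RootRegionScan_lock after
>> stopping the marker, but that spreads the knowledge across more places.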
>>
>> What do you think?
>>
>> Thanks,
>> Kris
>>
>> Detailed thread stacks for the example:
>>
>> 1. A Java application thread at an allocation site triggering a G1
>> incremental collection:
>>
>> Thread 2164 (Thread 0x7fcfc4e11700 (LWP 25218)):
>> #0  0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1  0x00007fd2931e9a1b in os::PlatformEvent::park()
>> (this=this at entry=0x7fd05c029c00)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842 
>>
>> #2  0x00007fd2931a256f in Monitor::IWait(Thread*, long) (timo=0,
>> ev=0x7fd05c029c00) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:424 
>>
>> #3  0x00007fd2931a256f in Monitor::IWait(Thread*, long)
>> (this=this at entry=0x7fd28c012800,
>> Self=Self at entry=0x7fd1ac0f2000, timo=timo at entry=0) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:802 
>>
>> #4  0x00007fd2931a382e in Monitor::wait(bool, long, bool)
>> (this=0x7fd28c012800, no_safepoint_check=<optimized out>,
>> timeout=timeout at entry=0,
>> as_suspend_equivalent=as_suspend_equivalent at entry=false)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:1126 
>>
>> #5  0x00007fd29339d5fc in VMThread::execute(VM_Operation*)
>> (op=op at entry=0x7fcfc4e0c430)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:642 
>>
>> #6  0x00007fd292e99fec in 
>> G1CollectedHeap::attempt_allocation_slow(unsigned
>> long, unsigned char, unsigned int*, unsigned int*)
>> (gc_cause=GCCause::_g1_inc_collection_pause, succeeded=<synthetic 
>> pointer>,
>> gc_count_before=<optimized out>, word_size=1026, this=0x7fd28c033c40) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:3628 
>>
>> #7  0x00007fd292e99fec in 
>> G1CollectedHeap::attempt_allocation_slow(unsigned
>> long, unsigned char, unsigned int*, unsigned int*)
>> (this=this at entry=0x7fd28c033c40,
>> word_size=word_size at entry=1026, context=context at entry=0 '\000',
>> gc_count_before_ret=gc_count_before_ret at entry=0x7fcfc4e0c50c,
>> gclocker_retry_count_ret=gclocker_retry_count_ret at entry=0x7fcfc4e0c508) at 
>>
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:956 
>>
>> #8  0x00007fd292e9b03b in G1CollectedHeap::mem_allocate(unsigned long,
>> bool*) (gclocker_retry_count_ret=0x7fcfc4e0c508,
>> gc_count_before_ret=0x7fcfc4e0c50c, word_size=1026, 
>> this=0x7fd28c033c40) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.inline.hpp:147 
>>
>> #9  0x00007fd292e9b03b in G1CollectedHeap::mem_allocate(unsigned long,
>> bool*) (this=0x7fd28c033c40, word_size=1026,
>> gc_overhead_limit_was_exceeded=<optimized out>) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:846 
>>
>> #10 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
>> Thread*) (__the_thread__=0x7fd1ac0f2000, size=1026, klass=...) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:135 
>>
>> #11 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
>> Thread*) (__the_thread__=0x7fd1ac0f2000, size=1026, klass=...) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:175 
>>
>> #12 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
>> Thread*) (__the_thread__=0x7fd1ac0f2000, length=8192, size=1026, 
>> klass=...)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:216 
>>
>> #13 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
>> Thread*) (this=0x7c0000768, length=length at entry=8192,
>> do_zero=do_zero at entry=true,
>> __the_thread__=__the_thread__ at entry=0x7fd1ac0f2000) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/oops/typeArrayKlass.cpp:107 
>>
>> #14 0x00007fd2931d3550 in oopFactory::new_typeArray(BasicType, int,
>> Thread*) (__the_thread__=__the_thread__ at entry=0x7fd1ac0f2000,
>> length=length at entry=8192, this=<optimized out>) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/oops/typeArrayKlass.hpp:67 
>>
>> #15 0x00007fd2931d3550 in oopFactory::new_typeArray(BasicType, int,
>> Thread*) (type=<optimized out>, length=length at entry=8192,
>> __the_thread__=__the_thread__ at entry=0x7fd1ac0f2000) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/memory/oopFactory.cpp:56 
>>
>> #16 0x00007fd293283457 in OptoRuntime::new_array_C(Klass*, int,
>> JavaThread*) (array_type=<optimized out>, len=8192, 
>> thread=0x7fd1ac0f2000)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/opto/runtime.cpp:279 
>>
>> #17 0x00007fd27d0871c7 in  ()
>>
>> 2. A thread that called System.exit(), initiating the VM shutdown 
>> sequence.
>> It's in native code so it doesn't block a safepoint:
>>
>> Thread 563 (Thread 0x7fd017980700 (LWP 7959)):
>> #0  0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1  0x00007fd2931e9a1b in os::PlatformEvent::park() (this=0x7fd0f4015000)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842 
>>
>> #2  0x00007fd293064485 in JvmtiRawMonitor::SimpleWait(Thread*, long)
>> (this=this at entry=0x7fd28f22f020, Self=Self at entry=0x7fd28f0c9800,
>> millis=millis at entry=-1) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiRawMonitor.cpp:194 
>>
>> #3  0x00007fd293064c75 in JvmtiRawMonitor::raw_wait(long, bool, Thread*)
>> (this=this at entry=0x7fd28f22f020, millis=millis at entry=-1,
>> interruptible=interruptible at entry=true,
>> __the_thread__=__the_thread__ at entry=0x7fd28f0c9800)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiRawMonitor.cpp:383 
>>
>> #4  0x00007fd293040c09 in JvmtiEnv::RawMonitorWait(JvmtiRawMonitor*, 
>> long)
>> (this=<optimized out>, rmonitor=0x7fd28f22f020, millis=-1) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiEnv.cpp:3144 
>>
>> #5  0x00007fd291620af8 in debugMonitorWait (monitor=<optimized out>) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/jdk/src/share/back/util.c:1075 
>>
>> #6  0x00007fd29160e86c in cbVMDeath (jvmti_env=<optimized out>,
>> env=0x7fd28f0c99e0) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/jdk/src/share/back/eventHandler.c:1273 
>>
>> #7  0x00007fd29304f4b0 in JvmtiExport::post_vm_death() () at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiExport.cpp:490 
>>
>> #8  0x00007fd292f896dd in before_exit(JavaThread*) (thread=<optimized 
>> out>)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/java.cpp:532 
>>
>> #9  0x00007fd292fd005b in JVM_Halt(jint) (code=1) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvm.cpp:441 
>>
>> #10 0x00007fd27d020868 in [native] java.lang.Shutdown.halt0(int) () at
>> java/lang/Shutdown.java:0
>> #11 0x00007fd27d00809d in [interpreted: bc = 7]
>> java.lang.Shutdown.halt(int) () at java/lang/Shutdown.java:140
>> #12 0x00007fd27d00809d in [interpreted: bc = 99]
>> java.lang.Shutdown.exit(int) () at java/lang/Shutdown.java:214
>> #13 0x00007fd27d00809d in [interpreted: bc = 14]
>> java.lang.Runtime.exit(int) () at java/lang/Runtime.java:110
>> #14 0x00007fd27d00809d in [interpreted: bc = 4] 
>> java.lang.System.exit(int)
>> () at java/lang/System.java:972
>> #15 0x00007fd27d00809d in [interpreted: bc = 1]
>> scala.sys.package$.exit(int) () at scala/sys/package.java:41
>>
>> 3. VM thread, already inside of a safepoint and started running G1's
>> incremental collection:
>>
>> Thread 19 (Thread 0x7fd1d70ab700 (LWP 47)):
>> #0  0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1  0x00007fd2931e9a1b in os::PlatformEvent::park()
>> (this=this at entry=0x7fd28c498200)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842 
>>
>> #2  0x00007fd2931a256f in Monitor::IWait(Thread*, long) (timo=0,
>> ev=0x7fd28c498200) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:424 
>>
>> #3  0x00007fd2931a256f in Monitor::IWait(Thread*, long)
>> (this=this at entry=0x7fd28c011700,
>> Self=Self at entry=0x7fd28c497800, timo=timo at entry=0) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:802 
>>
>> #4  0x00007fd2931a37fa in Monitor::wait(bool, long, bool)
>> (this=0x7fd28c011700, no_safepoint_check=no_safepoint_check at entry=true,
>> timeout=timeout at entry=0,
>> as_suspend_equivalent=as_suspend_equivalent at entry=false)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:1111 
>>
>> #5  0x00007fd292db1fe2 in CMRootRegions::wait_until_scan_finished()
>> (this=0x7fd28c0826b8) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp:512 
>>
>> #6  0x00007fd292ea59a4 in
>> G1CollectedHeap::do_collection_pause_at_safepoint(double)
>> (this=this at entry=0x7fd28c033c40,
>> target_pause_time_ms=200) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:4095 
>>
>> #7  0x00007fd29339f002 in VM_G1IncCollectionPause::doit()
>> (this=0x7fcfc4e0c430) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/vm_operations_g1.cpp:148 
>>
>> #8  0x00007fd29339dfd7 in VM_Operation::evaluate()
>> (this=this at entry=0x7fcfc4e0c430)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vm_operations.cpp:62 
>>
>> #9  0x00007fd29339b6d7 in VMThread::evaluate_operation(VM_Operation*)
>> (op=0x7fcfc4e0c430, this=<optimized out>) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:377 
>>
>> #10 0x00007fd29339cb5f in VMThread::loop() 
>> (this=this at entry=0x7fd28c497800)
>> at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:502 
>>
>> #11 0x00007fd29339d051 in VMThread::run() (this=0x7fd28c497800) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:276 
>>
>> #12 0x00007fd2931e1222 in java_start(Thread*) (thread=0x7fd28c497800) at
>> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:790 
>>
>>

