VM deadlock between VM shutdown and G1
Leela Mohan
leelamohan.venati at gmail.com
Thu Sep 13 20:54:54 UTC 2018
Hi Kris,

If I understand this deadlock correctly, there are two issues here.

A) Inside "cbVMDeath", the "VM exiting" thread is waiting for active JVMTI
callbacks to complete. Looking at "BEGIN_CALLBACK/END_CALLBACK" in
jdk/src/share/back/eventHandler.c: threads making JVMTI callbacks increment
"active_callbacks" on entry and decrement it on return; on entry they wait
indefinitely if a VM_Death callback is active, and on return they notify all
threads waiting on "callbackLock" once "active_callbacks" drops to zero. The
"VM exiting" thread is waiting on "callbackLock" indefinitely. The *first*
issue is understanding why "active_callbacks" is non-zero. Can you post
backtraces for all the threads? Also, I think waiting indefinitely on
"callbackLock" instead of using a timed wait is a bad idea: if a notification
is ever missed, the waiting thread never unblocks.
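To make that handshake concrete, here is a minimal C++ model of the pattern
(an illustration only, not the actual eventHandler.c code; the real code uses
debugMonitorEnter/Wait/NotifyAll on callbackLock plus a bypass path, and the
names below are simplified):

#include <condition_variable>
#include <mutex>

std::mutex              callback_lock;     // roughly callbackLock
std::condition_variable callback_cv;
int                     active_callbacks = 0;
bool                    vm_death_active  = false;

void begin_callback() {                    // roughly BEGIN_CALLBACK
  std::lock_guard<std::mutex> l(callback_lock);
  // The real macro blocks/bypasses here if a VM_DEATH callback is active.
  ++active_callbacks;
}

void end_callback() {                      // roughly END_CALLBACK
  std::lock_guard<std::mutex> l(callback_lock);
  if (--active_callbacks == 0 && vm_death_active)
    callback_cv.notify_all();              // wake the VM_DEATH handler
}

void on_vm_death() {                       // roughly cbVMDeath
  std::unique_lock<std::mutex> l(callback_lock);
  vm_death_active = true;
  while (active_callbacks > 0)
    callback_cv.wait(l);                   // indefinite wait: if no callback
                                           // ever reaches end_callback(), or a
                                           // notification is lost, this never
                                           // returns; a timed wait (wait_for)
                                           // would at least let it recover
}

int main() {
  begin_callback();
  end_callback();
  on_vm_death();                           // returns immediately in this toy
  return 0;
}

If some callback is stuck between begin and end (which is what the full set of
backtraces should tell us), on_vm_death() never returns, exactly like frame #6
in the stack you posted.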
B) The *second* issue is the one you mentioned. If you look at JVM_Halt, we
call vm_exit() after the before_exit() call. vm_exit() creates a VM_Exit VM
operation and pushes it onto the VM operation queue to be executed. If the
VMThread deadlocks for the reason you mentioned, it can't make progress to
execute the VM_Exit operation. Note that the VM_Exit operation is expected to
perform the eventual exit(). Stopping the GC threads while there is an active
GC VM operation seems like a bad idea. Instead, I think moving
"Universe::heap()->stop()" into VM_Exit::doit() could resolve these situations.
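Roughly what I have in mind, as a sketch only (the surrounding VM_Exit::doit()
body is elided and I haven't tested this):

void VM_Exit::doit() {
  // ... existing shutdown steps ...

  // Proposed: stop the concurrent GC threads here, on the VMThread, once any
  // in-flight GC VM operation has completed, instead of calling this from
  // before_exit() in java.cpp on the exiting Java thread.
  Universe::heap()->stop();

  // ... existing steps leading to the eventual exit() ...
}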
Let me know if I missed anything.
Thanks,
Leela
On Thu, Sep 13, 2018 at 7:58 AM David Holmes <david.holmes at oracle.com>
wrote:
> Actually this seems to have some similarities with 4953323
>
> And if cbVMDeath is anything like what is depicted here:
>
> http://www.oracle.com/technetwork/articles/javase/index-140680.html
>
> then deadlock seems quite likely.
>
> I'll try to look into this deeper tomorrow.
>
> Cheers,
> David
>
> On 13/09/2018 7:51 PM, David Holmes wrote:
> > Hi Kris,
> >
> > I didn't quite follow the analysis (see below)
> >
> > On 13/09/2018 6:57 PM, Krystal Mok wrote:
> >> Hi HotSpot devs,
> >>
> >> I'd like to report a VM deadlock situation we experienced in production
> >> a few weeks ago on JDK8. I checked the latest jdk/jdk code and the same
> >> race condition is still there.
> >>
> >> I can file a JBS ticket if this looks like a valid problem.
> >>
> >> tl;dr: there's a race condition between the VM shutdown and G1, in
> >> that the
> >> VM shutdown sequence may have just terminated all the concurrent GC
> >> threads, and at the same time G1 has just started an incremental
> >> collection
> >> cycle and is waiting for the concurrent marker to finish. Since
> >> there's no
> >> concurrent GC thread alive to update the status anymore, G1 waits
> >> indefinitely while holding a safepoint, causing the VM to deadlock.
> >>
> >> Details:
> >>
> >> 3 threads are involved in the actual deadlock. Detailed stack traces
> >> are at
> >> the end of this email.
> >>
> >> 1. A Java application thread at an allocation site triggering a G1
> >> incremental collection
> >> 2. A thread that called System.exit(), initiating the VM shutdown
> >> sequence.
> >> It's in the VM's native code so it doesn't block a safepoint.
> >
> > VM code is not "native" in the sense of being safepoint-safe. If it's
> > still in the System.c code trying to call the VM then it is native but
> > as soon as it tries to enter the VM it will block if a safepoint is in
> > progress. In addition the exit requires that the VM go to a safepoint
> > before terminating.
> >
> >> 3. VM thread, already inside of a safepoint and started running G1's
> >> incremental collection.
> >> (4. "the world" is at a safepoint so all other Java threads are just
> >> waiting)
> >>
> >> The problem is, Thread 2 has already run half way into before_exit(),
> and
> >
> > The problem seems to be an event callback, cbVMDeath, which seems to
> > have taken the thread from _thread_in_vm (which is not a safepoint-safe
> > state) to presumably _thread_in_native, which is safepoint-safe. The
> > callback then blocks on a RawMonitorWait for something and that would
> > seem to be where the problem arises. What is the callback trying to do?
> >
> > Cheers,
> > David
> > -----
> >
> >
> >> one of the steps in there is:
> >>
> >> hotspot/src/share/vm/runtime/java.cpp
> >>
> >> 503 // Stop concurrent GC threads
> >> 504 Universe::heap()->stop();
> >>
> >> But G1 is waiting for the concurrent marker to finish scanning the root
> >> regions:
> >>
> >> hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> >>
> >> 506 bool CMRootRegions::wait_until_scan_finished() {
> >> 507 if (!scan_in_progress()) return false;
> >> 508
> >> 509 {
> >> 510 MutexLockerEx x(RootRegionScan_lock,
> >> Mutex::_no_safepoint_check_flag);
> >> 511 while (scan_in_progress()) {
> >> 512 RootRegionScan_lock->wait(Mutex::_no_safepoint_check_flag);
> >> 513 }
> >> 514 }
> >> 515 return true;
> >> 516 }
> >>
> >> But _scan_in_progress is only updated in a few places:
> >>
> >> hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> >>
> >> 449 void CMRootRegions::prepare_for_scan() {
> >> 450 assert(!scan_in_progress(), "pre-condition");
> >> 451
> >> 452 // Currently, only survivors can be root regions.
> >> 453 assert(_next_survivor == NULL, "pre-condition");
> >> 454 _next_survivor = _young_list->first_survivor_region();
> >> 455 _scan_in_progress = (_next_survivor != NULL);
> >> 456 _should_abort = false;
> >> 457 }
> >>
> >> 490 void CMRootRegions::scan_finished() {
> >> 491 assert(scan_in_progress(), "pre-condition");
> >> 492
> >> 493 // Currently, only survivors can be root regions.
> >> 494 if (!_should_abort) {
> >> 495 assert(_next_survivor == NULL, "we should have claimed all
> >> survivors");
> >> 496 }
> >> 497 _next_survivor = NULL;
> >> 498
> >> 499 {
> >> 500 MutexLockerEx x(RootRegionScan_lock,
> >> Mutex::_no_safepoint_check_flag);
> >> 501 _scan_in_progress = false;
> >> 502 RootRegionScan_lock->notify_all();
> >> 503 }
> >> 504 }
> >>
> >> And with the concurrent GC threads gone, nobody ever gets to set
> >> _scan_in_progress back to false, which leads to an infinite wait.
> >>
> >> One way I'd imagine fixing this is by adding a check in G1's
> >> CMRootRegions::wait_until_scan_finished() to see if the concurrent
> marker
> >> thread is still there, and if it isn't, there's no point in waiting
> >> anymore.
> >> Obviously similar fixes would have to be done separately for other
> >> concurrent GCs in HotSpot.
> >>
> >> What do you think?
> >>
> >> Thanks,
> >> Kris
> >>
> >> Detailed thread stacks for the example:
> >>
> >> 1. A Java application thread at an allocation site triggering a G1
> >> incremental collection:
> >>
> >> Thread 2164 (Thread 0x7fcfc4e11700 (LWP 25218)):
> >> #0 0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
> >> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> >> #1 0x00007fd2931e9a1b in os::PlatformEvent::park()
> >> (this=this at entry=0x7fd05c029c00)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842
>
> >>
> >> #2 0x00007fd2931a256f in Monitor::IWait(Thread*, long) (timo=0,
> >> ev=0x7fd05c029c00) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:424
>
> >>
> >> #3 0x00007fd2931a256f in Monitor::IWait(Thread*, long)
> >> (this=this at entry=0x7fd28c012800,
> >> Self=Self at entry=0x7fd1ac0f2000, timo=timo at entry=0) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:802
>
> >>
> >> #4 0x00007fd2931a382e in Monitor::wait(bool, long, bool)
> >> (this=0x7fd28c012800, no_safepoint_check=<optimized out>,
> >> timeout=timeout at entry=0,
> >> as_suspend_equivalent=as_suspend_equivalent at entry=false)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:1126
>
> >>
> >> #5 0x00007fd29339d5fc in VMThread::execute(VM_Operation*)
> >> (op=op at entry=0x7fcfc4e0c430)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:642
>
> >>
> >> #6 0x00007fd292e99fec in
> >> G1CollectedHeap::attempt_allocation_slow(unsigned
> >> long, unsigned char, unsigned int*, unsigned int*)
> >> (gc_cause=GCCause::_g1_inc_collection_pause, succeeded=<synthetic
> >> pointer>,
> >> gc_count_before=<optimized out>, word_size=1026, this=0x7fd28c033c40) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:3628
>
> >>
> >> #7 0x00007fd292e99fec in
> >> G1CollectedHeap::attempt_allocation_slow(unsigned
> >> long, unsigned char, unsigned int*, unsigned int*)
> >> (this=this at entry=0x7fd28c033c40,
> >> word_size=word_size at entry=1026, context=context at entry=0 '\000',
> >> gc_count_before_ret=gc_count_before_ret at entry=0x7fcfc4e0c50c,
> >> gclocker_retry_count_ret=gclocker_retry_count_ret at entry=0x7fcfc4e0c508)
> at
> >>
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:956
>
> >>
> >> #8 0x00007fd292e9b03b in G1CollectedHeap::mem_allocate(unsigned long,
> >> bool*) (gclocker_retry_count_ret=0x7fcfc4e0c508,
> >> gc_count_before_ret=0x7fcfc4e0c50c, word_size=1026,
> >> this=0x7fd28c033c40) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.inline.hpp:147
>
> >>
> >> #9 0x00007fd292e9b03b in G1CollectedHeap::mem_allocate(unsigned long,
> >> bool*) (this=0x7fd28c033c40, word_size=1026,
> >> gc_overhead_limit_was_exceeded=<optimized out>) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:846
>
> >>
> >> #10 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
> >> Thread*) (__the_thread__=0x7fd1ac0f2000, size=1026, klass=...) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:135
>
> >>
> >> #11 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
> >> Thread*) (__the_thread__=0x7fd1ac0f2000, size=1026, klass=...) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:175
>
> >>
> >> #12 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
> >> Thread*) (__the_thread__=0x7fd1ac0f2000, length=8192, size=1026,
> >> klass=...)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:216
>
> >>
> >> #13 0x00007fd293356080 in TypeArrayKlass::allocate_common(int, bool,
> >> Thread*) (this=0x7c0000768, length=length at entry=8192,
> >> do_zero=do_zero at entry=true,
> >> __the_thread__=__the_thread__ at entry=0x7fd1ac0f2000) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/oops/typeArrayKlass.cpp:107
>
> >>
> >> #14 0x00007fd2931d3550 in oopFactory::new_typeArray(BasicType, int,
> >> Thread*) (__the_thread__=__the_thread__ at entry=0x7fd1ac0f2000,
> >> length=length at entry=8192, this=<optimized out>) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/oops/typeArrayKlass.hpp:67
>
> >>
> >> #15 0x00007fd2931d3550 in oopFactory::new_typeArray(BasicType, int,
> >> Thread*) (type=<optimized out>, length=length at entry=8192,
> >> __the_thread__=__the_thread__ at entry=0x7fd1ac0f2000) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/memory/oopFactory.cpp:56
>
> >>
> >> #16 0x00007fd293283457 in OptoRuntime::new_array_C(Klass*, int,
> >> JavaThread*) (array_type=<optimized out>, len=8192,
> >> thread=0x7fd1ac0f2000)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/opto/runtime.cpp:279
>
> >>
> >> #17 0x00007fd27d0871c7 in ()
> >>
> >> 2. A thread that called System.exit(), initiating the VM shutdown
> >> sequence.
> >> It's in native code so it doesn't block a safepoint:
> >>
> >> Thread 563 (Thread 0x7fd017980700 (LWP 7959)):
> >> #0 0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
> >> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> >> #1 0x00007fd2931e9a1b in os::PlatformEvent::park()
> (this=0x7fd0f4015000)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842
>
> >>
> >> #2 0x00007fd293064485 in JvmtiRawMonitor::SimpleWait(Thread*, long)
> >> (this=this at entry=0x7fd28f22f020, Self=Self at entry=0x7fd28f0c9800,
> >> millis=millis at entry=-1) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiRawMonitor.cpp:194
>
> >>
> >> #3 0x00007fd293064c75 in JvmtiRawMonitor::raw_wait(long, bool, Thread*)
> >> (this=this at entry=0x7fd28f22f020, millis=millis at entry=-1,
> >> interruptible=interruptible at entry=true,
> >> __the_thread__=__the_thread__ at entry=0x7fd28f0c9800)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiRawMonitor.cpp:383
>
> >>
> >> #4 0x00007fd293040c09 in JvmtiEnv::RawMonitorWait(JvmtiRawMonitor*,
> >> long)
> >> (this=<optimized out>, rmonitor=0x7fd28f22f020, millis=-1) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiEnv.cpp:3144
>
> >>
> >> #5 0x00007fd291620af8 in debugMonitorWait (monitor=<optimized out>) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/jdk/src/share/back/util.c:1075
>
> >>
> >> #6 0x00007fd29160e86c in cbVMDeath (jvmti_env=<optimized out>,
> >> env=0x7fd28f0c99e0) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/jdk/src/share/back/eventHandler.c:1273
>
> >>
> >> #7 0x00007fd29304f4b0 in JvmtiExport::post_vm_death() () at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvmtiExport.cpp:490
>
> >>
> >> #8 0x00007fd292f896dd in before_exit(JavaThread*) (thread=<optimized
> >> out>)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/java.cpp:532
>
> >>
> >> #9 0x00007fd292fd005b in JVM_Halt(jint) (code=1) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/prims/jvm.cpp:441
>
> >>
> >> #10 0x00007fd27d020868 in [native] java.lang.Shutdown.halt0(int) () at
> >> java/lang/Shutdown.java:0
> >> #11 0x00007fd27d00809d in [interpreted: bc = 7]
> >> java.lang.Shutdown.halt(int) () at java/lang/Shutdown.java:140
> >> #12 0x00007fd27d00809d in [interpreted: bc = 99]
> >> java.lang.Shutdown.exit(int) () at java/lang/Shutdown.java:214
> >> #13 0x00007fd27d00809d in [interpreted: bc = 14]
> >> java.lang.Runtime.exit(int) () at java/lang/Runtime.java:110
> >> #14 0x00007fd27d00809d in [interpreted: bc = 4]
> >> java.lang.System.exit(int)
> >> () at java/lang/System.java:972
> >> #15 0x00007fd27d00809d in [interpreted: bc = 1]
> >> scala.sys.package$.exit(int) () at scala/sys/package.java:41
> >>
> >> 3. VM thread, already inside of a safepoint and started running G1's
> >> incremental collection:
> >>
> >> Thread 19 (Thread 0x7fd1d70ab700 (LWP 47)):
> >> #0 0x00007fd29387b360 in pthread_cond_wait@@GLIBC_2.3.2 () at
> >> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> >> #1 0x00007fd2931e9a1b in os::PlatformEvent::park()
> >> (this=this at entry=0x7fd28c498200)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:5842
>
> >>
> >> #2 0x00007fd2931a256f in Monitor::IWait(Thread*, long) (timo=0,
> >> ev=0x7fd28c498200) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:424
>
> >>
> >> #3 0x00007fd2931a256f in Monitor::IWait(Thread*, long)
> >> (this=this at entry=0x7fd28c011700,
> >> Self=Self at entry=0x7fd28c497800, timo=timo at entry=0) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:802
>
> >>
> >> #4 0x00007fd2931a37fa in Monitor::wait(bool, long, bool)
> >> (this=0x7fd28c011700, no_safepoint_check=no_safepoint_check at entry=true,
> >> timeout=timeout at entry=0,
> >> as_suspend_equivalent=as_suspend_equivalent at entry=false)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/mutex.cpp:1111
>
> >>
> >> #5 0x00007fd292db1fe2 in CMRootRegions::wait_until_scan_finished()
> >> (this=0x7fd28c0826b8) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp:512
>
> >>
> >> #6 0x00007fd292ea59a4 in
> >> G1CollectedHeap::do_collection_pause_at_safepoint(double)
> >> (this=this at entry=0x7fd28c033c40,
> >> target_pause_time_ms=200) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp:4095
>
> >>
> >> #7 0x00007fd29339f002 in VM_G1IncCollectionPause::doit()
> >> (this=0x7fcfc4e0c430) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/gc_implementation/g1/vm_operations_g1.cpp:148
>
> >>
> >> #8 0x00007fd29339dfd7 in VM_Operation::evaluate()
> >> (this=this at entry=0x7fcfc4e0c430)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vm_operations.cpp:62
>
> >>
> >> #9 0x00007fd29339b6d7 in VMThread::evaluate_operation(VM_Operation*)
> >> (op=0x7fcfc4e0c430, this=<optimized out>) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:377
>
> >>
> >> #10 0x00007fd29339cb5f in VMThread::loop()
> >> (this=this at entry=0x7fd28c497800)
> >> at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:502
>
> >>
> >> #11 0x00007fd29339d051 in VMThread::run() (this=0x7fd28c497800) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/share/vm/runtime/vmThread.cpp:276
>
> >>
> >> #12 0x00007fd2931e1222 in java_start(Thread*) (thread=0x7fd28c497800) at
> >>
> /build/openjdk-8-OdO8jS/openjdk-8-8u162-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:790
>
> >>
> >>
>