VM deadlock between VM shutdown and G1
David Holmes
david.holmes at oracle.com
Thu Sep 13 12:00:16 UTC 2018
Hi Kris,
Okay got it now.
So basically:
Universe::heap()->stop();
has to be written in such a way that it can safely be called even if
there's a GC related VMop in execution.
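
To make that requirement concrete, here is a minimal sketch of the shape I have in mind. It is not HotSpot code: ConcurrentWorker, check_in(), stop(), wait_for_check_in() and vm_op_survives_stop() are all hypothetical names, and std::mutex and std::condition_variable stand in for the VM's own monitors. The only point is that stop() wakes any waiter, and the waiter treats "stopped" as a valid reason to stop waiting.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a concurrent GC worker; not HotSpot code.
class ConcurrentWorker {
 public:
  // Called by the worker once it has reached the rendezvous point.
  void check_in() {
    std::lock_guard<std::mutex> guard(lock_);
    checked_in_ = true;
    cv_.notify_all();
  }

  // Called on the shutdown path. Records the stop request and wakes any
  // waiter, so nobody keeps waiting for a worker that will never check in.
  void stop() {
    std::lock_guard<std::mutex> guard(lock_);
    stopped_ = true;
    cv_.notify_all();
  }

  // Called by the "VM operation". Returns true if the worker checked in,
  // false if the worker was stopped first.
  bool wait_for_check_in() {
    std::unique_lock<std::mutex> guard(lock_);
    cv_.wait(guard, [this] { return checked_in_ || stopped_; });
    return checked_in_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool checked_in_ = false;
  bool stopped_ = false;
};

// Simulates a VM operation that is (most likely) already waiting when the
// shutdown path calls stop(). Returns true when the waiter observed the
// stop instead of hanging.
bool vm_op_survives_stop() {
  ConcurrentWorker worker;
  bool checked_in = true;
  std::thread vm_op([&] { checked_in = worker.wait_for_check_in(); });
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  worker.stop();  // the worker never checks in in this scenario
  vm_op.join();
  return !checked_in;
}
```

If the waiter only tested checked_in_, the join above would never return, which is the same shape as the hang being discussed. The sketch also behaves the same if stop() happens to run before the wait begins, because the predicate is checked before blocking.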
Cheers,
David
On 13/09/2018 8:14 PM, Krystal Mok wrote:
> Hi David,
>
> Comments inline:
>
> On Thu, Sep 13, 2018 at 2:51 AM, David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Kris,
>
> I didn't quite follow the analysis (see below)
>
> On 13/09/2018 6:57 PM, Krystal Mok wrote:
>
> 1. A Java application thread at an allocation site triggering a G1
> incremental collection
> 2. A thread that called System.exit(), initiating the VM
> shutdown sequence.
> It's in VM's native code so it doesn't block a safepoint.
>
>
> VM code is not "native" in the sense of being safepoint-safe. If
> it's still in the System.c code trying to call the VM then it is
> native but as soon as it tries to enter the VM it will block if a
> safepoint is in progress. In addition the exit requires that the VM
> go to a safepoint before terminating.
>
> The time window was extremely narrow, but it did happen in practice.
> The Java application thread called System.exit() -> JVM_Halt(). The
> safepoint was probably not yet active, so the thread went past the
> safepoint check upon entry and got into the _thread_in_vm state. When it
> then tried to post the VM death event, it transitioned to the
> _thread_in_native state (through JvmtiJavaThreadEventTransition), which
> is safepoint-safe. Probably at around this time the safepoint
> synchronization started and then stopped the world.
>
> There's nothing really special about what cbVMDeath does otherwise. This
> process has enabled the JDWP agent, and that's part of the story of how
> this thread got into the _thread_in_native state.
>
> 3. VM thread, already inside of a safepoint and started running G1's
> incremental collection.
> (4. "the world" is at a safepoint so all other Java threads are just
> waiting)
>
> The problem is, Thread 2 has already run half way into
> before_exit(), and
>
>
> The problem seems to be an event callback, cbVMDeath, which seems to
> have taken the thread from _thread_in_vm (which is not a
> safepoint-safe state) to presumably _thread_in_native, which is
> safepoint-safe. The callback then blocks on a RawMonitorWait for
> something and that would seem to be where the problem arises. What
> is the callback trying to do?
>
>
> The cbVMDeath callback is just waiting for other active callbacks to
> check in (line 1273):
>
> jdk/src/share/back/eventHandler.c
>
> 1267 debugMonitorEnter(callbackBlock); {
> 1268 debugMonitorEnter(callbackLock); {
> 1269 vm_death_callback_active = JNI_TRUE;
> 1270 (void)threadControl_resumeAll();
> 1271 while (active_callbacks > 0) {
> 1272 /* wait for active CALLBACKs to check in (and block) */
> 1273 debugMonitorWait(callbackLock);
> 1274 }
> 1275 } debugMonitorExit(callbackLock);
>
> The real deadlock in this case should still be the fact that the VM has
> reached a safepoint at this point in time, but G1 is waiting for the
> concurrent marker to check in, yet it's already gone.
>
> Cheers,
> David
>
>
> Thanks,
> Kris