RFR(S): 8040803: G1: Concurrent mark hangs when mark stack overflows

Wed Apr 30 17:52:08 UTC 2014

Per,

Adding a new flag is sometimes like adding a new degree
of freedom, and it can make a complicated situation
more complicated.

Before I review this, can you help me understand the
problem? Is the window for the race condition this
code in do_marking_step()?

   if (_cm->has_overflown()) {
     // This can happen if the mark stack overflows during a GC pause
     // and this task, after a yield point, restarts. We have to abort
     // as we need to get into the overflow protocol which happens
     // right at the end of this task.
     set_has_aborted();
   }

Is the window the interval between the time _has_overflown is set and
the time _has_aborted is set?

Jon

On 4/30/2014 6:04 AM, Per Liden wrote:
> Hi,
>
> Could I please have a couple of reviews of this bug fix:
>
> Summary: G1's concurrent marking can potentially hang forever if the 
> global mark stack overflows and immediately after that a Full GC 
> happens, which tries to abort the marking. The reason is that there's 
> a race between detecting the overflow situation and detecting the 
> abort signal. Threads detecting the overflow situation first will go 
> into the overflow protocol and wait on a barrier for all threads to 
> reach this state. However, threads detecting the abort signal first 
> will terminate and never participate in the barrier.
>
> This patch introduces an abort state and function on the 
> WorkGangBarrierSync class, to unblock any threads waiting for the 
> barrier to complete when the concurrent mark is aborted.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8040803
> Webrev: http://cr.openjdk.java.net/~pliden/8040803/webrev.0/
>
> /Per
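
For readers following the thread, the hang and the proposed unblocking
can be sketched outside of HotSpot. The code below is only an
illustration under stated assumptions: it uses standard C++ threads,
and the names AbortableBarrier, run_abort_scenario and
run_normal_scenario are hypothetical. It is not the WorkGangBarrierSync
code from the webrev. One worker stands in for a thread that detected
the overflow and entered the barrier; the other worker is assumed to
have seen the abort signal first and never enters.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a barrier with an abort state (not HotSpot code).
class AbortableBarrier {
 public:
  explicit AbortableBarrier(int n_workers) : _n_workers(n_workers) {
    assert(n_workers > 0);
  }

  // Blocks until all workers have entered or the barrier is aborted.
  // Returns true if the barrier completed, false if it was aborted.
  bool enter() {
    std::unique_lock<std::mutex> lock(_mutex);
    if (_aborted) return false;
    if (++_n_entered == _n_workers) {
      // Last worker to arrive: release everyone waiting in this round.
      _n_entered = 0;
      ++_generation;
      _cv.notify_all();
      return true;
    }
    const unsigned gen = _generation;
    _cv.wait(lock, [&] { return _aborted || gen != _generation; });
    return gen != _generation;
  }

  // Wakes all waiting workers; their enter() calls return false.
  void abort() {
    std::lock_guard<std::mutex> lock(_mutex);
    _aborted = true;
    _cv.notify_all();
  }

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
  const int _n_workers;
  int _n_entered = 0;
  unsigned _generation = 0;
  bool _aborted = false;
};

// One worker waits at the barrier; the other is assumed to have seen the
// abort signal first and never enters. Without abort() the join would hang.
bool run_abort_scenario() {
  AbortableBarrier barrier(2);
  bool completed = true;
  std::thread overflow_worker([&] { completed = barrier.enter(); });
  barrier.abort();
  overflow_worker.join();
  return completed;
}

// Both workers reach the barrier, so it completes normally.
bool run_normal_scenario() {
  AbortableBarrier barrier(2);
  bool a = false, b = false;
  std::thread t1([&] { a = barrier.enter(); });
  std::thread t2([&] { b = barrier.enter(); });
  t1.join();
  t2.join();
  return a && b;
}
```

In this sketch the abort state is terminal: once abort() has been
called, every later enter() returns false. Whether and how the real
barrier is reset for reuse is not shown here.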