RFR(S): 8040803: G1: Concurrent mark hangs when mark stack overflows
Per Liden
per.liden at oracle.com
Mon May 5 11:57:52 UTC 2014
Hi Jon,
On 04/30/2014 07:52 PM, Jon Masamitsu wrote:
> Per,
>
> Adding a new flag sometimes is like adding a new degree
> of freedom and sometimes can make a complicated situation
> more complicated.
>
> Before I review this can you help me understand the
> problem. Is the window for the race condition this
> code in do_marking_step()?
>
>   if (_cm->has_overflown()) {
>     // This can happen if the mark stack overflows during a GC pause
>     // and this task, after a yield point, restarts. We have to abort
>     // as we need to get into the overflow protocol which happens
>     // right at the end of this task.
>     set_has_aborted();
>   }
>
> The window being between the time _has_overflown is set and when
> _has_aborted is set?
The race is between checking _cm->has_overflown() and checking
_cm->has_aborted(). Both of these are checked in a few places during
marking (typically in regular_clock_call() and a few other places). Since
this code is executed by several threads in parallel, without
synchronization, different threads can see one state or the other first,
depending on where a particular thread happens to be executing when the
abort and the overflow happen.

Note that the set_has_aborted() in the code above sets the CMTask-local
abort state, which is not part of the race here. _cm->has_aborted() is
the global abort state, which is set when a Full GC happens.
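
To make the hang and the proposed fix concrete, here is a minimal
standalone sketch of an abortable barrier. It is not the code in the
webrev and does not use WorkGangBarrierSync; the class AbortableBarrier,
its enter()/abort() functions and the two helper functions are
hypothetical names chosen for illustration, built on std::mutex and
std::condition_variable instead of HotSpot's Monitor.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a barrier with an abort state.
// Illustration only; this is not HotSpot's WorkGangBarrierSync.
class AbortableBarrier {
 public:
  explicit AbortableBarrier(int n_workers) : _n_workers(n_workers) {
    assert(n_workers > 0);
  }

  // Blocks until all workers have entered or the barrier is aborted.
  // Returns true if the barrier completed, false if it was aborted.
  bool enter() {
    std::unique_lock<std::mutex> lock(_mutex);
    if (_aborted) {
      return false;  // Abort was signalled before we got here.
    }
    if (++_n_entered == _n_workers) {
      // Last worker in: release everybody waiting in this round.
      _n_entered = 0;
      ++_generation;
      _cv.notify_all();
      return true;
    }
    const unsigned gen = _generation;
    // Without the _aborted check this wait would never return if
    // some worker terminates instead of entering the barrier.
    _cv.wait(lock, [&] { return _aborted || gen != _generation; });
    return gen != _generation;
  }

  // Wakes up all waiters and makes later enter() calls return at once.
  void abort() {
    std::lock_guard<std::mutex> lock(_mutex);
    _aborted = true;
    _cv.notify_all();
  }

 private:
  const int _n_workers;
  std::mutex _mutex;
  std::condition_variable _cv;
  int _n_entered = 0;
  unsigned _generation = 0;
  bool _aborted = false;
};

// One worker saw the overflow first and enters the barrier. The other
// party saw the abort first, so it never enters and calls abort().
bool waiter_completes_despite_abort() {
  AbortableBarrier barrier(2);
  bool completed = true;
  std::thread waiter([&] { completed = barrier.enter(); });
  barrier.abort();
  waiter.join();
  return completed;
}

// Normal case: both workers saw the overflow and enter the barrier.
bool both_workers_complete() {
  AbortableBarrier barrier(2);
  bool a = false;
  bool b = false;
  std::thread t1([&] { a = barrier.enter(); });
  std::thread t2([&] { b = barrier.enter(); });
  t1.join();
  t2.join();
  return a && b;
}
```

In the first helper the waiter returns false whether abort() runs before
or after it starts waiting, so it cannot block forever. A caller would
treat a false return as "marking was aborted" and skip the rest of the
overflow protocol.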
/Per
>
> Jon
>
> On 4/30/2014 6:04 AM, Per Liden wrote:
>> Hi,
>>
>> Could I please have a couple of reviews of this bug fix:
>>
>> Summary: G1's concurrent marking can potentially hang forever if the
>> global mark stack overflows and immediately after that a Full GC
>> happens, which tries to abort the marking. The reason is that there's
>> a race between detecting the overflow situation and detecting the
>> abort signal. Threads detecting the overflow situation first will go
>> into the overflow protocol and wait on a barrier for all threads to
>> reach this state. However, threads detecting the abort signal first
>> will terminate and never participate in the barrier.
>>
>> This patch introduces an abort state and function on the
>> WorkGangBarrierSync class, to unblock any threads waiting for the
>> barrier to complete when the concurrent mark is aborted.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8040803
>> Webrev: http://cr.openjdk.java.net/~pliden/8040803/webrev.0/
>>
>> /Per
>