RFR(S): 8004816: G1: Kitchensink failures after marking stack changes

Thu Dec 13 23:27:44 UTC 2012

Hi Jon,

Thanks. I agree a name change would OK. Any particular preference?

Currently clear_marking_state() is called on the following paths....

The initial mark path...

CheckpointRootsInitialPre -> reset() -> clear_marking_state().

ConcurrentMark initialization path...set_non_marking_state clears that 
marking is in progress.

ConcurrentMark() -> set_non_marking_state() -> clear_marking_state().

Marking is finished path (from remark)....

CheckpointRootsFinal -> set_non_marking_state() -> clear_marking_state()

Overflow in remark path (1)....

CheckpointRootsFinal -> clear_marking_state()

Overflow from CMTask::do_marking_step() (overflow during actual marking 
or remark) by worker 0 only....

do_marking_step() -> enter_first_barrier_sync() -> clear_marking_state().

I think that's all of them. Given these, my preference would be 
reset_marking_state().

Thanks again for the review.

JohnC

On 12/13/12 15:12, Jon Masamitsu wrote:
> John,
>
> Your fix looks good.
>
> A small suggestion.  Instead of clear_marking_state() how
> about reinitialize_marking_state() or reset_marking_state()?
> When I went to look at what "clear_marking_state"
> did, I expected it to simply set a variable but it does
> more.  I thought that "reinitialize" or "reset" would
> warn the reader that more was happening.
> If this is in keeping with G1 naming
> convention, it's fine to leave it.
>
> Jon
>
>
>
> On 12/13/12 09:45, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> Can I have a couple of volunteers review the fix for this CR? The 
>> webrev can be found at: 
>> http://cr.openjdk.java.net/~johnc/8004816/webrev.0/
>>
>> Summary:
>> The reduced mark stack size that came in as part of the changes for 
>> 8000244 exposed this issue. If the marking stack overflowed as a 
>> result of pushes from the serial reference processing closures, the 
>> marking stack overflow flag was not cleared. When marking eventually 
>> completed, after the subsequent restarting, the still set overflow 
>> flag was detected and resulted in a guarantee failure. I also 
>> discovered that the actual marking state was not correctly cleared 
>> for restarting for such an overflow and, as a result, some referent 
>> objects might have been skipped by marking (those that are reachable 
>> from the objects we failed to push as a result of the overflow).
>>
>> Normally this is not a problem as the serial reference processing 
>> code is executed infrequently - except on systems with one or two 
>> cpus. The parallel reference processing closures reset the marking 
>> state correctly in the event of an overflow in the global marking stack.
>>
>> Testing:
>> Kitchensink on 1 and 2 cpu systems with a very low mark stack size 
>> and marking verification.
>>
>> Thanks,
>>
>> JohnC