RFR(S): 8004816: G1: Kitchensink failures after marking stack changes

Fri Dec 14 14:51:23 UTC 2012

John,

Your choice of reset_marking_state() below is fine.

Jon

On 12/13/2012 3:27 PM, John Cuthbertson wrote:
> Hi Jon,
>
> Thanks. I agree a name change would OK. Any particular preference?
>
> Currently clear_marking_state() is called on the following paths....
>
> The initial mark path...
>
> CheckpointRootsInitialPre -> reset() -> clear_marking_state().
>
> ConcurrentMark initialization path...set_non_marking_state clears that 
> marking is in progress.
>
> ConcurrentMark() -> set_non_marking_state() -> clear_marking_state().
>
> Marking is finished path (from remark)....
>
> CheckpointRootsFinal -> set_non_marking_state() -> clear_marking_state()
>
> Overflow in remark path (1)....
>
> CheckpointRootsFinal -> clear_marking_state()
>
> Overflow from CMTask::do_marking_step() (overflow during actual 
> marking or remark) by worker 0 only....
>
> do_marking_step() -> enter_first_barrier_sync() -> clear_marking_state().
>
> I think that's all of them. Given these, my preference would be 
> reset_marking_state().
>
> Thanks again for the review.
>
> JohnC
>
> On 12/13/12 15:12, Jon Masamitsu wrote:
>> John,
>>
>> Your fix looks good.
>>
>> A small suggestion.  Instead of clear_marking_state() how
>> about reinitialize_marking_state() or reset_marking_state()?
>> When I went to look at what "clear_marking_state"
>> did, I expected it to simply set a variable but it does
>> more.  I thought that "reinitialize" or "reset" would
>> warn the reader that more was happening.
>> If this is in keeping with G1 naming
>> convention, it's fine to leave it.
>>
>> Jon
>>
>>
>>
>> On 12/13/12 09:45, John Cuthbertson wrote:
>>> Hi Everyone,
>>>
>>> Can I have a couple of volunteers review the fix for this CR? The 
>>> webrev can be found at: 
>>> http://cr.openjdk.java.net/~johnc/8004816/webrev.0/
>>>
>>> Summary:
>>> The reduced mark stack size that came in as part of the changes for 
>>> 8000244 exposed this issue. If the marking stack overflowed as a 
>>> result of pushes from the serial reference processing closures, the 
>>> marking stack overflow flag was not cleared. When marking eventually 
>>> completed, after the subsequent restarting, the still set overflow 
>>> flag was detected and resulted in a guarantee failure. I also 
>>> discovered that the actual marking state was not correctly cleared 
>>> for restarting for such an overflow and, as a result, some referent 
>>> objects might have been skipped by marking (those that are reachable 
>>> from the objects we failed to push as a result of the overflow).
>>>
>>> Normally this is not a problem as the serial reference processing 
>>> code is executed infrequently - except on systems with one or two 
>>> cpus. The parallel reference processing closures reset the marking 
>>> state correctly in the event of an overflow in the global marking 
>>> stack.
>>>
>>> Testing:
>>> Kitchensink on 1 and 2 cpu systems with a very low mark stack size 
>>> and marking verification.
>>>
>>> Thanks,
>>>
>>> JohnC
>