RFR(S): G1: assert(_finger == _heap_end) failed, concurrentMark.cpp:809

Bengt Rutisson bengt.rutisson at oracle.com
Wed Mar 13 13:05:44 UTC 2013


Hi John,

Thanks for splitting this up into two webrevs! :)

I understand that we only have to reset the marking phase in 
enter_first_sync_barrier() if we are during concurrent mark since we are 
anyway doing it in checkpointRootsFinal(). And I think you are correct 
about the fact that we don't have to drain the discovered references if 
we have hit an overflow.

So, basically I think I understand the changes for 1. and 3. that you 
explained below.

I'm struggling a bit to grasp the change motivated by 2. Are these the 
calls to set_phase() that triggered the assert before? Do we still need 
to remove them considering that we no longer reset the marking phase 
during remark?

Also, you replaced the calls to set_phase() with calls to 
_cm->terminator()->reset_for_reuse(_active_workers). The comment says 
that this is necessary for the termination protocol in do_marking_step() 
to work properly. But that method also uses _first_overflow_barrier_sync 
and _second_overflow_barrier_sync. Don't we have to reset those too? 
Similarly to what set_phase() does?

Finally, why does it say "// Not strictly necessary but..." in 
G1CMRefProcTaskExecutor::execute() ?

Sorry for all the questions, but I'm really not that familiar with this 
code.

Bengt


On 3/13/13 1:10 AM, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have a couple of volunteers look over these changes? The webrev 
> can be found at: http://cr.openjdk.java.net/~johnc/8009940/webrev.0/
>
> These changes were, at Bengt's request, split out from the original 
> webrev for 8009536. In the webrev they have been applied on top of 
> 8009536 but they are independent. The issue has been in the G1 code 
> for a while (since I added the reference processing code a couple of 
> years ago) - we just never hit it.
>
> Anyway the assertion is caused by calling set_phase() - which throws 
> the assertion failure - after resetting the marking state - which 
> resets the value of the global finger.
>
> The solution was a three pronged approach:
>
> 1. Skip reference processing if the marking stack overflowed during 
> the actual remark task.
>
> This is safe to do as reference objects are only discovered once. So 
> once a reference object is discovered it will remain on the discovered 
> list until it is processed. Since, as a result of the overflow, 
> marking is going to restart - the references will be processed after a 
> successful remark. We could use the routine 
> abandon_partial_discovery() and depopulate the discovered lists but 
> the only benefit to doing this is that we may discover less references 
> during the subsequent restart.
>
> 2. Remove the set_phase() calls when we execute a new ref processing 
> proxy task
>
> To do this we need to call set_phase() from within weakRefsWork() and 
> call the terminator's reset_for_reuse() routine when we execute a new 
> ref processing task. This will remove the path that checks the assert.
>
> 3. Worker 0 only resets the marking state during the current phase of 
> marking.
>
> During the STW remark pause there is another call to reset the marking 
> stack (in the event of overflow) immediately after the discovered 
> references are processed:
>
>   double start = os::elapsedTime();
>
>   checkpointRootsFinalWork();
>
>   double mark_work_end = os::elapsedTime();
>
>   weakRefsWork(clear_all_soft_refs);
>
>   if (has_overflown()) {
>     // Oops. We overflowed. Restart concurrent marking.
>     _restart_for_overflow = true;
>     // Clear the marking state because we will be restarting
>     // marking due to overflowing the global mark stack.
>     reset_marking_state();
>     if (G1TraceMarkStackOverflow) {
>       gclog_or_tty->print_cr("\nRemark led to restart for overflow.");
>     }
>
> I also fixed the British spelling of initialized in a few places.
>
> Testing:
> The lucene tests with parallel reference processing (with and without 
> verification).
> specjvm98 with parallel reference processing (with and without 
> verification).
>
> Thanks,
>
> JohnC




More information about the hotspot-gc-dev mailing list