CRR (M): 7127706: G1: re-enable survivors during the initial-mark pause

Wed Jan 18 20:51:34 UTC 2012

Hi Bengt,

Thanks for looking at this so quickly! Inline.

Bengt Rutisson wrote:
>
> Tony,
>
> Overall this looks really good. Thanks for fixing it.
>
> Some comments:
>
> First, a general question regarding naming and logging. We now talk 
> about "snapshot" a lot. It is a pretty good name, but maybe it needs 
> some more context to be understandable in the code and the GC log. I 
> don't have any really good names, but maybe "survivor_snapshot"

I'd rather not mention "survivors" given that we might add non-survivor 
regions in the future.

> or "initial_mark_snapshot"?

I like "initial-mark snapshot" better. Having said that 
CMInitialMarkSnapshotRegions and _initial_mark_snapshot_regions are 
kinda long. :-) I'll abbreviate to CMIMSnapshotRegions and 
_im_snapshot_regions if that's OK.

> concurrentMark.inline.hpp
>
>   if (hr == NULL) {
>     hr = _g1h->heap_region_containing_raw(addr);
>     // Given that we're looking for a region that contains an object
>     // header it's impossible to get back a HC region.
>     assert(!hr->continuesHumongous(), "sanity");
>   } else {
>     assert(hr->is_in(addr), "pre-condition");
>   }
>
> The first assert should probably hold even for regions that are passed 
> in to grayRoot() right? So, maybe something like:
>
>   if (hr == NULL) {
>     hr = _g1h->heap_region_containing_raw(addr);
>   } else {
>     assert(hr->is_in(addr), "pre-condition");
>   }
>   // Given that we need a region that contains an object
>   // header it's impossible for it to be a HC region.
>   assert(!hr->continuesHumongous(), "sanity");

Good observation! I changed to the above.

> concurrentMarkThread.cpp
>
> ConcurrentMarkThread::run()
>
> Why do we do the explicit time/date stamping?
>
>           gclog_or_tty->date_stamp(PrintGCDateStamps);
>           gclog_or_tty->stamp(PrintGCTimeStamps);
>           gclog_or_tty->print_cr("[GC concurrent-snapshot-scan-start]");
>
> why is it not enough with the normal -XX:+PrintGCTimeStamps information? 

Not quite sure what you mean with "is it not enough with the normal ... 
information". Each log record needs either a GC time stamp or a GC date 
stamp and we have to print either or both depending on the two -XX 
parameters. Unfortunately, the logging code has not been well abstracted 
and/or refactored so we have this unfortunate replication throughout the 
GCs.

> This is probably correct since I see this pattern in other places. But 
> I would like to understand why we do it.
>
>
> g1CollectedHeap.cpp:
>
> G1CollectedHeap::do_collection()
>
> Is it worth logging how long we had to wait in 
> _cm->snapshot_regions()->wait_until_scan_finished(), the same way that 
> we do in G1CollectedHeap::do_collection_pause_at_safepoint()?

Currently, the GC log records for the evacuation pauses have a lot of 
extra information when +PrintGCDetails is set and it was reasonable to 
add an extra record with the wait time. And it's more important to know 
how the wait for snapshot region scanning affects evacuation pauses, 
which are more critical. The Full GC log records are currently one line 
and I don't think we want to extend them further (at least, not before 
we put a decent GC logging framework in place). On the other hand, the 
snapshot region scanning aborts when a marking cycle is aborted due to a 
Full GC. So, this wait time should not be long. How about I add a 
comment in the code saying that, when we introduce a more extensible 
logging framework, we could add the wait time to the Full GC log 
records? Something like:

    // Note: When we have a more flexible GC logging framework that
    // allows us to add optional attributes to a GC log record we
    // could consider timing and reporting how long we wait in the
    // following two methods.
     wait_while_free_regions_coming();
    // ...
    _cm->snapshot_regions()->wait_until_scan_finished();

> Finally, just some food for thought. Could this be generalized to more 
> roots? I mean take a snapshot and scan it concurrently.

By scanning the IM snapshot regions we say "these guys are all roots, 
instead of scanning them during the GC we will scan them concurrently". 
And we can do that for any object / region a) as long as we know they 
will not move while they are being scanned and b) because we have the 
pre-barrier. If any references on the snapshot objects are updated, the 
pre-barrier ensures that their values at the start of marking will be 
enqueued and processed.

For external arbitrary roots we don't have a write barrier (and we 
shouldn't as it'd be too expensive). So, we cannot do that for 
non-object roots without a "pre-barrier"-type mechanism.

Tony

> Bengt
>
>
>
> On 2012-01-18 00:48, Tony Printezis wrote:
>> Hi all,
>>
>> Can I have a couple of code reviews for this change that re-enables 
>> the use of survivor regions during the initial-mark pause?
>>
>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.0/
>>
>> From the CR:
>>
>> We could scan the survivors as we're copying them, however this will 
>> require more work during the initial-mark GCs (and in particular: 
>> special-case code in the fast path).
>>
>> A better approach is to let the concurrent marking threads scan the 
>> survivors and mark everything reachable from them a) before any more 
>> concurrent marking work is done (so that we can just mark the 
>> objects, without needing to push them on a stack, and let the 
>> "finger" algorithm discover them) and b) before the next GC starts 
>> (since, if we copy them, we won't know which of the new survivors are 
>> the ones we need to scan).
>>
>> This approach has the advantage that it does not require any extra 
>> work during the initial-mark GCs and all the work is done by the 
>> concurrent marking threads. However, it has the disadvantage that the 
>> survivor scanning might hold up the next GC. In most cases this 
>> should not be an issue as GCs take place at a reasonably low rate. If 
>> it does become a problem we could consider the following:
>>
>> - like when the GC locker is active, try to extend the eden to give a 
>> bit more time to the marking threads to finish scanning the survivors
>> - instead of waiting for the marking threads, a GC can take over and 
>> finish up scanning the remaining survivors (typically, we have more 
>> GC threads than marking threads, so the overhead will be reduced)
>> - if we supported region pinning, we could pin all the regions that 
>> were not scanned by the time the GC started so that the marking 
>> threads can resume scanning them after the GC completes
>>
>> Implementation notes:
>>
>> I introduced the concept of a "snapshot regions" in the 
>> ConcurrentMark which is a set of regions that need to be scanned at 
>> the start of a concurrent cycle. Currently, these can only be 
>> survivors but maybe we can use the same concept for something else in 
>> the future.
>>
>> Tony
>>
>>
>