CRR (M): 7127706: G1: re-enable survivors during the initial-mark pause

Thu Jan 19 08:39:12 UTC 2012

Tony,

Inline.

On 2012-01-18 21:51, Tony Printezis wrote:
> Hi Bengt,
>
> Thanks for looking at this so quickly! Inline.
>
> Bengt Rutisson wrote:
>>
>> Tony,
>>
>> Overall this looks really good. Thanks for fixing it.
>>
>> Some comments:
>>
>> First, a general question regarding naming and logging. We now talk 
>> about "snapshot" a lot. It is a pretty good name, but maybe it needs 
>> some more context to be understandable in the code and the GC log. I 
>> don't have any really good names, but maybe "survivor_snapshot"
>
> I'd rather not mention "survivors" given that we might add 
> non-survivor regions in the future.

Agreed.

>
>> or "initial_mark_snapshot"?
>
> I like "initial-mark snapshot" better. Having said that 
> CMInitialMarkSnapshotRegions and _initial_mark_snapshot_regions are 
> kinda long. :-) I'll abbreviate to CMIMSnapshotRegions and 
> _im_snapshot_regions if that's OK.

I'll look at your new webrev and comment on that. Thanks for trying it out!

>> concurrentMark.inline.hpp
>>
>>   if (hr == NULL) {
>>     hr = _g1h->heap_region_containing_raw(addr);
>>     // Given that we're looking for a region that contains an object
>>     // header it's impossible to get back a HC region.
>>     assert(!hr->continuesHumongous(), "sanity");
>>   } else {
>>     assert(hr->is_in(addr), "pre-condition");
>>   }
>>
>> The first assert should probably hold even for regions that are 
>> passed in to grayRoot() right? So, maybe something like:
>>
>>   if (hr == NULL) {
>>     hr = _g1h->heap_region_containing_raw(addr);
>>   } else {
>>     assert(hr->is_in(addr), "pre-condition");
>>   }
>>   // Given that we need a region that contains an object
>>   // header it's impossible for it to be a HC region.
>>   assert(!hr->continuesHumongous(), "sanity");
>
> Good observation! I changed to the above.

Great.

>
>> concurrentMarkThread.cpp
>>
>> ConcurrentMarkThread::run()
>>
>> Why do we do the explicit time/date stamping?
>>
>>           gclog_or_tty->date_stamp(PrintGCDateStamps);
>>           gclog_or_tty->stamp(PrintGCTimeStamps);
>>           gclog_or_tty->print_cr("[GC concurrent-snapshot-scan-start]");
>>
>> why is it not enough with the normal -XX:+PrintGCTimeStamps information? 
>
> Not quite sure what you mean with "is it not enough with the normal 
> ... information". Each log record needs either a GC time stamp or a GC 
> date stamp and we have to print either or both depending on the two 
> -XX parameters. Unfortunately, the logging code has not been well 
> abstracted and/or refactored so we have this unfortunate replication 
> throughout the GCs.

Sorry, this was a misunderstanding from my side. I have never really 
thought about how the time stamping was implemented. I could not imagine 
that it was implemented as three different print statements, so I 
thought you were doing something extra here. I realize now that this is 
the "normal" time stamping.

Stating the obvious: We desperately need a new logging framework!

>> This is probably correct since I see this pattern in other places. 
>> But I would like to understand why we do it.
>>
>>
>> g1CollectedHeap.cpp:
>>
>> G1CollectedHeap::do_collection()
>>
>> Is it worth logging how long we had to wait in 
>> _cm->snapshot_regions()->wait_until_scan_finished(), the same way 
>> that we do in G1CollectedHeap::do_collection_pause_at_safepoint()?
>
> Currently, the GC log records for the evacuation pauses have a lot of 
> extra information when +PrintGCDetails is set and it was reasonable to 
> add an extra record with the wait time. And it's more important to 
> know how the wait for snapshot region scanning affects evacuation 
> pauses, which are more critical. The Full GC log records are currently 
> one line and I don't think we want to extend them further (at least, 
> not before we put a decent GC logging framework in place). On the 
> other hand, the snapshot region scanning aborts when a marking cycle 
> is aborted due to a Full GC. So, this wait time should not be long. 
> How about I add a comment in the code saying that, when we introduce a 
> more extensible logging framework, we could add the wait time to the 
> Full GC log records? Something like:
>
>    // Note: When we have a more flexible GC logging framework that
>    // allows us to add optional attributes to a GC log record we
>    // could consider timing and reporting how long we wait in the
>    // following two methods.
>     wait_while_free_regions_coming();
>    // ...
>    _cm->snapshot_regions()->wait_until_scan_finished();

Ok. That's fine.

>> Finally, just some food for thought. Could this be generalized to 
>> more roots? I mean take a snapshot and scan it concurrently.
>
> By scanning the IM snapshot regions we say "these guys are all roots, 
> instead of scanning them during the GC we will scan them 
> concurrently". And we can do that for any object / region a) as long 
> as we know they will not move while they are being scanned and b) 
> because we have the pre-barrier. If any references on the snapshot 
> objects are updated, the pre-barrier ensures that their values at the 
> start of marking will be enqueued and processed.
>
> For external arbitrary roots we don't have a write barrier (and we 
> shouldn't as it'd be too expensive). So, we cannot do that for 
> non-object roots without a "pre-barrier"-type mechanism.

But if we have roots that do not change between initial mark and the 
first young GC we could do it without the pre-barrier, right? I'm 
thinking of classes for example. But then again there is class 
redefinition, so maybe they can change...

Anyway, not something we should do now, just wanted to mention it.

Bengt

>
> Tony
>
>
>> Bengt
>>
>>
>>
>> On 2012-01-18 00:48, Tony Printezis wrote:
>>> Hi all,
>>>
>>> Can I have a couple of code reviews for this change that re-enables 
>>> the use of survivor regions during the initial-mark pause?
>>>
>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.0/
>>>
>>> From the CR:
>>>
>>> We could scan the survivors as we're copying them, however this will 
>>> require more work during the initial-mark GCs (and in particular: 
>>> special-case code in the fast path).
>>>
>>> A better approach is to let the concurrent marking threads scan the 
>>> survivors and mark everything reachable from them a) before any more 
>>> concurrent marking work is done (so that we can just mark the 
>>> objects, without needing to push them on a stack, and let the 
>>> "finger" algorithm discover them) and b) before the next GC starts 
>>> (since, if we copy them, we won't know which of the new survivors 
>>> are the ones we need to scan).
>>>
>>> This approach has the advantage that it does not require any extra 
>>> work during the initial-mark GCs and all the work is done by the 
>>> concurrent marking threads. However, it has the disadvantage that 
>>> the survivor scanning might hold up the next GC. In most cases this 
>>> should not be an issue as GCs take place at a reasonably low rate. 
>>> If it does become a problem we could consider the following:
>>>
>>> - like when the GC locker is active, try to extend the eden to give 
>>> a bit more time to the marking threads to finish scanning the survivors
>>> - instead of waiting for the marking threads, a GC can take over and 
>>> finish up scanning the remaining survivors (typically, we have more 
>>> GC threads than marking threads, so the overhead will be reduced)
>>> - if we supported region pinning, we could pin all the regions that 
>>> were not scanned by the time the GC started so that the marking 
>>> threads can resume scanning them after the GC completes
>>>
>>> Implementation notes:
>>>
>>> I introduced the concept of a "snapshot regions" in the 
>>> ConcurrentMark which is a set of regions that need to be scanned at 
>>> the start of a concurrent cycle. Currently, these can only be 
>>> survivors but maybe we can use the same concept for something else 
>>> in the future.
>>>
>>> Tony
>>>
>>>
>>