CRR (M): 7127706: G1: re-enable survivors during the initial-mark pause

Thu Jan 19 14:40:37 UTC 2012

Bengt (and all),

Updated webrev using "root regions" now:

http://cr.openjdk.java.net/~tonyp/7127706/webrev.2/

Tony

On 01/19/2012 09:04 AM, Tony Printezis wrote:
> Bengt,
>
> Inline (again!)
>
> On 1/19/2012 4:03 AM, Bengt Rutisson wrote:
>>
>> Tony,
>>
>> On 2012-01-18 22:31, Tony Printezis wrote:
>>> Bengt,
>>>
>>> Here's a webrev with the renaming:
>>>
>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.1/
>>>
>>> I have to say I'm not sure I really like the term "initial-mark / IM 
>>> snapshot regions". I'll try to come up with an alternative name for 
>>> them....
>>
>> Looked quickly at the new webrev.
>>
>> I agree that IM-snapshot might not be optimal. Still I like the fact 
>> that it is not just "snapshot" since I think that can easily be 
>> confused with the SATB terms. 
>
> Well, it's supposed to be. In SATB, anything that's reachable from the 
> "snapshot" at initial-mark time will be retained. Since we want to 
> avoid explicitly marking the survivors, we make them all implicitly 
> live and part of the "snapshot" which is why we have to scan them (the 
> same way we scan roots during the initial-mark pause). So, "snapshot 
> regions" is not unreasonable given that if we were not using SATB we 
> would not be able to do this.
>
>> It is of course part of the SATB snapshot, but not the whole thing.
>>
>> Just thinking aloud here, what about not using the word "snapshot" at 
>> all? How about "to_be_scanned_regions", "root_regions" or 
>> "concurrent_roots"?
>
> I like "root regions". Let's go with that (it's shorter too!).
>
> Tony
>
>> Really not sure what a good name is here...I kind of like the log 
>> message "Concurrent root scanning took 0.000x ms".
>>
>> Bengt
>>
>>>
>>> Tony
>>>
>>> Tony Printezis wrote:
>>>> Hi Bengt,
>>>>
>>>> Thanks for looking at this so quickly! Inline.
>>>>
>>>> Bengt Rutisson wrote:
>>>>>
>>>>> Tony,
>>>>>
>>>>> Overall this looks really good. Thanks for fixing it.
>>>>>
>>>>> Some comments:
>>>>>
>>>>> First, a general question regarding naming and logging. We now 
>>>>> talk about "snapshot" a lot. It is a pretty good name, but maybe 
>>>>> it needs some more context to be understandable in the code and 
>>>>> the GC log. I don't have any really good names, but maybe 
>>>>> "survivor_snapshot"
>>>>
>>>> I'd rather not mention "survivors" given that we might add 
>>>> non-survivor regions in the future.
>>>>
>>>>> or "initial_mark_snapshot"?
>>>>
>>>> I like "initial-mark snapshot" better. Having said that 
>>>> CMInitialMarkSnapshotRegions and _initial_mark_snapshot_regions are 
>>>> kinda long. :-) I'll abbreviate to CMIMSnapshotRegions and 
>>>> _im_snapshot_regions if that's OK.
>>>>
>>>>> concurrentMark.inline.hpp
>>>>>
>>>>>   if (hr == NULL) {
>>>>>     hr = _g1h->heap_region_containing_raw(addr);
>>>>>     // Given that we're looking for a region that contains an object
>>>>>     // header it's impossible to get back a HC region.
>>>>>     assert(!hr->continuesHumongous(), "sanity");
>>>>>   } else {
>>>>>     assert(hr->is_in(addr), "pre-condition");
>>>>>   }
>>>>>
>>>>> The first assert should probably hold even for regions that are 
>>>>> passed in to grayRoot() right? So, maybe something like:
>>>>>
>>>>>   if (hr == NULL) {
>>>>>     hr = _g1h->heap_region_containing_raw(addr);
>>>>>   } else {
>>>>>     assert(hr->is_in(addr), "pre-condition");
>>>>>   }
>>>>>   // Given that we need a region that contains an object
>>>>>   // header it's impossible for it to be a HC region.
>>>>>   assert(!hr->continuesHumongous(), "sanity");
>>>>
>>>> Good observation! I changed to the above.
>>>>
>>>>> concurrentMarkThread.cpp
>>>>>
>>>>> ConcurrentMarkThread::run()
>>>>>
>>>>> Why do we do the explicit time/date stamping?
>>>>>
>>>>>           gclog_or_tty->date_stamp(PrintGCDateStamps);
>>>>>           gclog_or_tty->stamp(PrintGCTimeStamps);
>>>>>           gclog_or_tty->print_cr("[GC 
>>>>> concurrent-snapshot-scan-start]");
>>>>>
>>>>> why is it not enough with the normal -XX:+PrintGCTimeStamps 
>>>>> information? 
>>>>
>>>> Not quite sure what you mean with "is it not enough with the normal 
>>>> ... information". Each log record needs either a GC time stamp or a 
>>>> GC date stamp and we have to print either or both depending on the 
>>>> two -XX parameters. Unfortunately, the logging code has not been 
>>>> well abstracted and/or refactored so we have this unfortunate 
>>>> replication throughout the GCs.
>>>>
>>>>> This is probably correct since I see this pattern in other places. 
>>>>> But I would like to understand why we do it.
>>>>>
>>>>>
>>>>> g1CollectedHeap.cpp:
>>>>>
>>>>> G1CollectedHeap::do_collection()
>>>>>
>>>>> Is it worth logging how long we had to wait in 
>>>>> _cm->snapshot_regions()->wait_until_scan_finished(), the same way 
>>>>> that we do in G1CollectedHeap::do_collection_pause_at_safepoint()?
>>>>
>>>> Currently, the GC log records for the evacuation pauses have a lot 
>>>> of extra information when +PrintGCDetails is set and it was 
>>>> reasonable to add an extra record with the wait time. And it's more 
>>>> important to know how the wait for snapshot region scanning affects 
>>>> evacuation pauses, which are more critical. The Full GC log records 
>>>> are currently one line and I don't think we want to extend them 
>>>> further (at least, not before we put a decent GC logging framework 
>>>> in place). On the other hand, the snapshot region scanning aborts 
>>>> when a marking cycle is aborted due to a Full GC. So, this wait 
>>>> time should not be long. How about I add a comment in the code 
>>>> saying that, when we introduce a more extensible logging framework, 
>>>> we could add the wait time to the Full GC log records? Something like:
>>>>
>>>>    // Note: When we have a more flexible GC logging framework that
>>>>    // allows us to add optional attributes to a GC log record we
>>>>    // could consider timing and reporting how long we wait in the
>>>>    // following two methods.
>>>>     wait_while_free_regions_coming();
>>>>    // ...
>>>>    _cm->snapshot_regions()->wait_until_scan_finished();
>>>>
>>>>
>>>>> Finally, just some food for thought. Could this be generalized to 
>>>>> more roots? I mean take a snapshot and scan it concurrently.
>>>>
>>>> By scanning the IM snapshot regions we say "these guys are all 
>>>> roots, instead of scanning them during the GC we will scan them 
>>>> concurrently". And we can do that for any object / region a) as 
>>>> long as we know they will not move while they are being scanned and 
>>>> b) because we have the pre-barrier. If any references on the 
>>>> snapshot objects are updated, the pre-barrier ensures that their 
>>>> values at the start of marking will be enqueued and processed.
>>>>
>>>> For external arbitrary roots we don't have a write barrier (and we 
>>>> shouldn't as it'd be too expensive). So, we cannot do that for 
>>>> non-object roots without a "pre-barrier"-type mechanism.
>>>>
>>>> Tony
>>>>
>>>>
>>>>> Bengt
>>>>>
>>>>>
>>>>>
>>>>> On 2012-01-18 00:48, Tony Printezis wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Can I have a couple of code reviews for this change that 
>>>>>> re-enables the use of survivor regions during the initial-mark 
>>>>>> pause?
>>>>>>
>>>>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.0/
>>>>>>
>>>>>> From the CR:
>>>>>>
>>>>>> We could scan the survivors as we're copying them, however this 
>>>>>> will require more work during the initial-mark GCs (and in 
>>>>>> particular: special-case code in the fast path).
>>>>>>
>>>>>> A better approach is to let the concurrent marking threads scan 
>>>>>> the survivors and mark everything reachable from them a) before 
>>>>>> any more concurrent marking work is done (so that we can just 
>>>>>> mark the objects, without needing to push them on a stack, and 
>>>>>> let the "finger" algorithm discover them) and b) before the next 
>>>>>> GC starts (since, if we copy them, we won't know which of the new 
>>>>>> survivors are the ones we need to scan).
>>>>>>
>>>>>> This approach has the advantage that it does not require any 
>>>>>> extra work during the initial-mark GCs and all the work is done 
>>>>>> by the concurrent marking threads. However, it has the 
>>>>>> disadvantage that the survivor scanning might hold up the next 
>>>>>> GC. In most cases this should not be an issue as GCs take place 
>>>>>> at a reasonably low rate. If it does become a problem we could 
>>>>>> consider the following:
>>>>>>
>>>>>> - like when the GC locker is active, try to extend the eden to 
>>>>>> give a bit more time to the marking threads to finish scanning 
>>>>>> the survivors
>>>>>> - instead of waiting for the marking threads, a GC can take over 
>>>>>> and finish up scanning the remaining survivors (typically, we 
>>>>>> have more GC threads than marking threads, so the overhead will 
>>>>>> be reduced)
>>>>>> - if we supported region pinning, we could pin all the regions 
>>>>>> that were not scanned by the time the GC started so that the 
>>>>>> marking threads can resume scanning them after the GC completes
>>>>>>
>>>>>> Implementation notes:
>>>>>>
>>>>>> I introduced the concept of a "snapshot regions" in the 
>>>>>> ConcurrentMark which is a set of regions that need to be scanned 
>>>>>> at the start of a concurrent cycle. Currently, these can only be 
>>>>>> survivors but maybe we can use the same concept for something 
>>>>>> else in the future.
>>>>>>
>>>>>> Tony
>>>>>>
>>>>>>
>>>>>
>>>>
>>