CRR (M): 7127706: G1: re-enable survivors during the initial-mark pause

Tony Printezis tony.printezis at oracle.com
Wed Jan 25 17:11:15 UTC 2012


Hi Bengt,

Thanks for looking at this once more. Inline.

On 01/25/2012 09:02 AM, Bengt Rutisson wrote:
>
> Tony,
>
> This looks good.
>
> One minor thing. Is it possible to rename CMRootRegions::reset() ? 
> From the name I would have expected it to kind of set its fields to 
> NULL or something similar. But what it does is set up the CMRootRegion 
> for the upcoming concurrent scanning. How about CMRootRegions::setup() 
> or CMRootRegions::prepare() instead?

I changed it to prepare_for_scan()

> Just a reminder (I guess you will remember to do this anyway):
>
> concurrentMark.inline.hpp
>  // TODO: make sure we pass hr to par_mark_and_count() after merging
>   // with John's changes.

Yep, merged with John's changes and that comment is now gone.

> Also, about the prefetching in void ConcurrentMark::scanRootRegion(). 
> Did you measure any performance impact of this? Why did you decide to 
> include it?

I have to say I didn't do any performance measurements on this. But, 
typically, when iterating over a large part of the heap, prefetching 
pays off.

Tony

> Bengt
>
> On 2012-01-20 18:13, Tony Printezis wrote:
>> Hi all,
>>
>> New webrev for this based on comments from John:
>>
>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.3/
>>
>> Now, the CMRootRegions::claim_next() method does not check the 
>> ConcurrentMark::has_aborted() flag to know when to abort (and return 
>> NULL) but, instead, a _has_aborted flag I added to the CMRootRegions 
>> class. This is set to true at the start of a Full GC and before the 
>> Full GC waits for the root region scan to finish.
>>
>> Additionally, I now call _root_regions.reset() from 
>> CM::checkpointRootsInitialPost() instead of calling it explicitly from 
>> do_collection_pause_at_safepoint().
>>
>> Tony
>>
>> On 01/19/2012 09:40 AM, Tony Printezis wrote:
>>> Bengt (and all),
>>>
>>> Updated webrev using "root regions" now:
>>>
>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.2/
>>>
>>> Tony
>>>
>>> On 01/19/2012 09:04 AM, Tony Printezis wrote:
>>>> Bengt,
>>>>
>>>> Inline (again!)
>>>>
>>>> On 1/19/2012 4:03 AM, Bengt Rutisson wrote:
>>>>>
>>>>> Tony,
>>>>>
>>>>> On 2012-01-18 22:31, Tony Printezis wrote:
>>>>>> Bengt,
>>>>>>
>>>>>> Here's a webrev with the renaming:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.1/
>>>>>>
>>>>>> I have to say I'm not sure I really like the term "initial-mark / 
>>>>>> IM snapshot regions". I'll try to come up with an alternative 
>>>>>> name for them....
>>>>>
>>>>> Looked quickly at the new webrev.
>>>>>
>>>>> I agree that IM-snapshot might not be optimal. Still I like the 
>>>>> fact that it is not just "snapshot" since I think that can easily 
>>>>> be confused with the SATB terms. 
>>>>
>>>> Well, it's supposed to be. In SATB, anything that's reachable from 
>>>> the "snapshot" at initial-mark time will be retained. Since we want 
>>>> to avoid explicitly marking the survivors, we make them all 
>>>> implicitly live and part of the "snapshot" which is why we have to 
>>>> scan them (the same way we scan roots during the initial-mark 
>>>> pause). So, "snapshot regions" is not unreasonable given that if we 
>>>> were not using SATB we would not be able to do this.
>>>>
>>>>> It is of course part of the SATB snapshot, but not the whole thing.
>>>>>
>>>>> Just thinking aloud here, what about not using the word "snapshot" 
>>>>> at all? How about "to_be_scanned_regions", "root_regions" or 
>>>>> "concurrent_roots"?
>>>>
>>>> I like "root regions". Let's go with that (it's shorter too!).
>>>>
>>>> Tony
>>>>
>>>>> Really not sure what a good name is here...I kind of like the log 
>>>>> message "Concurrent root scanning took 0.000x ms".
>>>>>
>>>>> Bengt
>>>>>
>>>>>>
>>>>>> Tony
>>>>>>
>>>>>> Tony Printezis wrote:
>>>>>>> Hi Bengt,
>>>>>>>
>>>>>>> Thanks for looking at this so quickly! Inline.
>>>>>>>
>>>>>>> Bengt Rutisson wrote:
>>>>>>>>
>>>>>>>> Tony,
>>>>>>>>
>>>>>>>> Overall this looks really good. Thanks for fixing it.
>>>>>>>>
>>>>>>>> Some comments:
>>>>>>>>
>>>>>>>> First, a general question regarding naming and logging. We now 
>>>>>>>> talk about "snapshot" a lot. It is a pretty good name, but 
>>>>>>>> maybe it needs some more context to be understandable in the 
>>>>>>>> code and the GC log. I don't have any really good names, but 
>>>>>>>> maybe "survivor_snapshot"
>>>>>>>
>>>>>>> I'd rather not mention "survivors" given that we might add 
>>>>>>> non-survivor regions in the future.
>>>>>>>
>>>>>>>> or "initial_mark_snapshot"?
>>>>>>>
>>>>>>> I like "initial-mark snapshot" better. Having said that 
>>>>>>> CMInitialMarkSnapshotRegions and _initial_mark_snapshot_regions 
>>>>>>> are kinda long. :-) I'll abbreviate to CMIMSnapshotRegions and 
>>>>>>> _im_snapshot_regions if that's OK.
>>>>>>>
>>>>>>>> concurrentMark.inline.hpp
>>>>>>>>
>>>>>>>>   if (hr == NULL) {
>>>>>>>>     hr = _g1h->heap_region_containing_raw(addr);
>>>>>>>>     // Given that we're looking for a region that contains an 
>>>>>>>> object
>>>>>>>>     // header it's impossible to get back a HC region.
>>>>>>>>     assert(!hr->continuesHumongous(), "sanity");
>>>>>>>>   } else {
>>>>>>>>     assert(hr->is_in(addr), "pre-condition");
>>>>>>>>   }
>>>>>>>>
>>>>>>>> The first assert should probably hold even for regions that are 
>>>>>>>> passed in to grayRoot() right? So, maybe something like:
>>>>>>>>
>>>>>>>>   if (hr == NULL) {
>>>>>>>>     hr = _g1h->heap_region_containing_raw(addr);
>>>>>>>>   } else {
>>>>>>>>     assert(hr->is_in(addr), "pre-condition");
>>>>>>>>   }
>>>>>>>>   // Given that we need a region that contains an object
>>>>>>>>   // header it's impossible for it to be a HC region.
>>>>>>>>   assert(!hr->continuesHumongous(), "sanity");
>>>>>>>
>>>>>>> Good observation! I changed to the above.
>>>>>>>
>>>>>>>> concurrentMarkThread.cpp
>>>>>>>>
>>>>>>>> ConcurrentMarkThread::run()
>>>>>>>>
>>>>>>>> Why do we do the explicit time/date stamping?
>>>>>>>>
>>>>>>>>           gclog_or_tty->date_stamp(PrintGCDateStamps);
>>>>>>>>           gclog_or_tty->stamp(PrintGCTimeStamps);
>>>>>>>>           gclog_or_tty->print_cr("[GC 
>>>>>>>> concurrent-snapshot-scan-start]");
>>>>>>>>
>>>>>>>> why is it not enough with the normal -XX:+PrintGCTimeStamps 
>>>>>>>> information? 
>>>>>>>
>>>>>>> Not quite sure what you mean with "is it not enough with the 
>>>>>>> normal ... information". Each log record needs either a GC time 
>>>>>>> stamp or a GC date stamp and we have to print either or both 
>>>>>>> depending on the two -XX parameters. Unfortunately, the logging 
>>>>>>> code has not been well abstracted and/or refactored so we have 
>>>>>>> this unfortunate replication throughout the GCs.
>>>>>>>
>>>>>>>> This is probably correct since I see this pattern in other 
>>>>>>>> places. But I would like to understand why we do it.
>>>>>>>>
>>>>>>>>
>>>>>>>> g1CollectedHeap.cpp:
>>>>>>>>
>>>>>>>> G1CollectedHeap::do_collection()
>>>>>>>>
>>>>>>>> Is it worth logging how long we had to wait in 
>>>>>>>> _cm->snapshot_regions()->wait_until_scan_finished(), the same 
>>>>>>>> way that we do in 
>>>>>>>> G1CollectedHeap::do_collection_pause_at_safepoint()?
>>>>>>>
>>>>>>> Currently, the GC log records for the evacuation pauses have a 
>>>>>>> lot of extra information when +PrintGCDetails is set and it was 
>>>>>>> reasonable to add an extra record with the wait time. And it's 
>>>>>>> more important to know how the wait for snapshot region scanning 
>>>>>>> affects evacuation pauses, which are more critical. The Full GC 
>>>>>>> log records are currently one line and I don't think we want to 
>>>>>>> extend them further (at least, not before we put a decent GC 
>>>>>>> logging framework in place). On the other hand, the snapshot 
>>>>>>> region scanning aborts when a marking cycle is aborted due to a 
>>>>>>> Full GC. So, this wait time should not be long. How about I add 
>>>>>>> a comment in the code saying that, when we introduce a more 
>>>>>>> extensible logging framework, we could add the wait time to the 
>>>>>>> Full GC log records? Something like:
>>>>>>>
>>>>>>>    // Note: When we have a more flexible GC logging framework that
>>>>>>>    // allows us to add optional attributes to a GC log record we
>>>>>>>    // could consider timing and reporting how long we wait in the
>>>>>>>    // following two methods.
>>>>>>>     wait_while_free_regions_coming();
>>>>>>>    // ...
>>>>>>>    _cm->snapshot_regions()->wait_until_scan_finished();
>>>>>>>
>>>>>>>
>>>>>>>> Finally, just some food for thought. Could this be generalized 
>>>>>>>> to more roots? I mean take a snapshot and scan it concurrently.
>>>>>>>
>>>>>>> By scanning the IM snapshot regions we say "these guys are all 
>>>>>>> roots, instead of scanning them during the GC we will scan them 
>>>>>>> concurrently". And we can do that for any object / region a) as 
>>>>>>> long as we know they will not move while they are being scanned 
>>>>>>> and b) because we have the pre-barrier. If any references on the 
>>>>>>> snapshot objects are updated, the pre-barrier ensures that their 
>>>>>>> values at the start of marking will be enqueued and processed.
>>>>>>>
>>>>>>> For external arbitrary roots we don't have a write barrier (and 
>>>>>>> we shouldn't as it'd be too expensive). So, we cannot do that 
>>>>>>> for non-object roots without a "pre-barrier"-type mechanism.
>>>>>>>
>>>>>>> Tony
>>>>>>>
>>>>>>>
>>>>>>>> Bengt
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2012-01-18 00:48, Tony Printezis wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Can I have a couple of code reviews for this change that 
>>>>>>>>> re-enables the use of survivor regions during the initial-mark 
>>>>>>>>> pause?
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.0/
>>>>>>>>>
>>>>>>>>> From the CR:
>>>>>>>>>
>>>>>>>>> We could scan the survivors as we're copying them; however, 
>>>>>>>>> this will require more work during the initial-mark GCs (and 
>>>>>>>>> in particular: special-case code in the fast path).
>>>>>>>>>
>>>>>>>>> A better approach is to let the concurrent marking threads 
>>>>>>>>> scan the survivors and mark everything reachable from them a) 
>>>>>>>>> before any more concurrent marking work is done (so that we 
>>>>>>>>> can just mark the objects, without needing to push them on a 
>>>>>>>>> stack, and let the "finger" algorithm discover them) and b) 
>>>>>>>>> before the next GC starts (since, if we copy them, we won't 
>>>>>>>>> know which of the new survivors are the ones we need to scan).
>>>>>>>>>
>>>>>>>>> This approach has the advantage that it does not require any 
>>>>>>>>> extra work during the initial-mark GCs and all the work is 
>>>>>>>>> done by the concurrent marking threads. However, it has the 
>>>>>>>>> disadvantage that the survivor scanning might hold up the next 
>>>>>>>>> GC. In most cases this should not be an issue as GCs take 
>>>>>>>>> place at a reasonably low rate. If it does become a problem we 
>>>>>>>>> could consider the following:
>>>>>>>>>
>>>>>>>>> - like when the GC locker is active, try to extend the eden to 
>>>>>>>>> give a bit more time to the marking threads to finish scanning 
>>>>>>>>> the survivors
>>>>>>>>> - instead of waiting for the marking threads, a GC can take 
>>>>>>>>> over and finish up scanning the remaining survivors 
>>>>>>>>> (typically, we have more GC threads than marking threads, so 
>>>>>>>>> the overhead will be reduced)
>>>>>>>>> - if we supported region pinning, we could pin all the regions 
>>>>>>>>> that were not scanned by the time the GC started so that the 
>>>>>>>>> marking threads can resume scanning them after the GC completes
>>>>>>>>>
>>>>>>>>> Implementation notes:
>>>>>>>>>
>>>>>>>>> I introduced the concept of "snapshot regions" in the 
>>>>>>>>> ConcurrentMark which is a set of regions that need to be 
>>>>>>>>> scanned at the start of a concurrent cycle. Currently, these 
>>>>>>>>> can only be survivors but maybe we can use the same concept 
>>>>>>>>> for something else in the future.
>>>>>>>>>
>>>>>>>>> Tony
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>


