CRR (M): 7127706: G1: re-enable survivors during the initial-mark pause
Tony Printezis
tony.printezis at oracle.com
Thu Jan 19 14:40:37 UTC 2012
Bengt (and all),
Updated webrev using "root regions" now:
http://cr.openjdk.java.net/~tonyp/7127706/webrev.2/
Tony
On 01/19/2012 09:04 AM, Tony Printezis wrote:
> Bengt,
>
> Inline (again!)
>
> On 1/19/2012 4:03 AM, Bengt Rutisson wrote:
>>
>> Tony,
>>
>> On 2012-01-18 22:31, Tony Printezis wrote:
>>> Bengt,
>>>
>>> Here's a webrev with the renaming:
>>>
>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.1/
>>>
>>> I have to say I'm not sure I really like the term "initial-mark / IM
>>> snapshot regions". I'll try to come up with an alternative name for
>>> them....
>>
>> Looked quickly at the new webrev.
>>
>> I agree that IM-snapshot might not be optimal. Still I like the fact
>> that it is not just "snapshot" since I think that can easily be
>> confused with the SATB terms.
>
> Well, it's supposed to be. In SATB, anything that's reachable from the
> "snapshot" at initial-mark time will be retained. Since we want to
> avoid explicitly marking the survivors, we make them all implicitly
> live and part of the "snapshot" which is why we have to scan them (the
> same way we scan roots during the initial-mark pause). So, "snapshot
> regions" is not unreasonable given that if we were not using SATB we
> would not be able to do this.
>
>> It is of course part of the SATB snapshot, but not the whole thing.
>>
>> Just thinking aloud here, what about not using the word "snapshot" at
>> all? How about "to_be_scanned_regions", "root_regions" or
>> "concurrent_roots"?
>
> I like "root regions". Let's go with that (it's shorter too!).
>
> Tony
>
>> Really not sure what a good name is here...I kind of like the log
>> message "Concurrent root scanning took 0.000x ms".
>>
>> Bengt
>>
>>>
>>> Tony
>>>
>>> Tony Printezis wrote:
>>>> Hi Bengt,
>>>>
>>>> Thanks for looking at this so quickly! Inline.
>>>>
>>>> Bengt Rutisson wrote:
>>>>>
>>>>> Tony,
>>>>>
>>>>> Overall this looks really good. Thanks for fixing it.
>>>>>
>>>>> Some comments:
>>>>>
>>>>> First, a general question regarding naming and logging. We now
>>>>> talk about "snapshot" a lot. It is a pretty good name, but maybe
>>>>> it needs some more context to be understandable in the code and
>>>>> the GC log. I don't have any really good names, but maybe
>>>>> "survivor_snapshot"
>>>>
>>>> I'd rather not mention "survivors" given that we might add
>>>> non-survivor regions in the future.
>>>>
>>>>> or "initial_mark_snapshot"?
>>>>
>>>> I like "initial-mark snapshot" better. Having said that
>>>> CMInitialMarkSnapshotRegions and _initial_mark_snapshot_regions are
>>>> kinda long. :-) I'll abbreviate to CMIMSnapshotRegions and
>>>> _im_snapshot_regions if that's OK.
>>>>
>>>>> concurrentMark.inline.hpp
>>>>>
>>>>> if (hr == NULL) {
>>>>> hr = _g1h->heap_region_containing_raw(addr);
>>>>> // Given that we're looking for a region that contains an object
>>>>> // header it's impossible to get back a HC region.
>>>>> assert(!hr->continuesHumongous(), "sanity");
>>>>> } else {
>>>>> assert(hr->is_in(addr), "pre-condition");
>>>>> }
>>>>>
>>>>> The first assert should probably hold even for regions that are
>>>>> passed in to grayRoot() right? So, maybe something like:
>>>>>
>>>>> if (hr == NULL) {
>>>>> hr = _g1h->heap_region_containing_raw(addr);
>>>>> } else {
>>>>> assert(hr->is_in(addr), "pre-condition");
>>>>> }
>>>>> // Given that we need a region that contains an object
>>>>> // header it's impossible for it to be a HC region.
>>>>> assert(!hr->continuesHumongous(), "sanity");
>>>>
>>>> Good observation! I changed to the above.
>>>>
>>>>> concurrentMarkThread.cpp
>>>>>
>>>>> ConcurrentMarkThread::run()
>>>>>
>>>>> Why do we do the explicit time/date stamping?
>>>>>
>>>>> gclog_or_tty->date_stamp(PrintGCDateStamps);
>>>>> gclog_or_tty->stamp(PrintGCTimeStamps);
>>>>> gclog_or_tty->print_cr("[GC
>>>>> concurrent-snapshot-scan-start]");
>>>>>
>>>>> why is it not enough with the normal -XX:+PrintGCTimeStamps
>>>>> information?
>>>>
>>>> Not quite sure what you mean with "is it not enough with the normal
>>>> ... information". Each log record needs either a GC time stamp or a
>>>> GC date stamp and we have to print either or both depending on the
>>>> two -XX parameters. Unfortunately, the logging code has not been
>>>> well abstracted and/or refactored so we have this unfortunate
>>>> replication throughout the GCs.
>>>>
>>>>> This is probably correct since I see this pattern in other places.
>>>>> But I would like to understand why we do it.
>>>>>
>>>>>
>>>>> g1CollectedHeap.cpp:
>>>>>
>>>>> G1CollectedHeap::do_collection()
>>>>>
>>>>> Is it worth logging how long we had to wait in
>>>>> _cm->snapshot_regions()->wait_until_scan_finished(), the same way
>>>>> that we do in G1CollectedHeap::do_collection_pause_at_safepoint()?
>>>>
>>>> Currently, the GC log records for the evacuation pauses have a lot
>>>> of extra information when +PrintGCDetails is set and it was
>>>> reasonable to add an extra record with the wait time. And it's more
>>>> important to know how the wait for snapshot region scanning affects
>>>> evacuation pauses, which are more critical. The Full GC log records
>>>> are currently one line and I don't think we want to extend them
>>>> further (at least, not before we put a decent GC logging framework
>>>> in place). On the other hand, the snapshot region scanning aborts
>>>> when a marking cycle is aborted due to a Full GC. So, this wait
>>>> time should not be long. How about I add a comment in the code
>>>> saying that, when we introduce a more extensible logging framework,
>>>> we could add the wait time to the Full GC log records? Something like:
>>>>
>>>> // Note: When we have a more flexible GC logging framework that
>>>> // allows us to add optional attributes to a GC log record we
>>>> // could consider timing and reporting how long we wait in the
>>>> // following two methods.
>>>> wait_while_free_regions_coming();
>>>> // ...
>>>> _cm->snapshot_regions()->wait_until_scan_finished();
>>>>
>>>>
>>>>> Finally, just some food for thought. Could this be generalized to
>>>>> more roots? I mean take a snapshot and scan it concurrently.
>>>>
>>>> By scanning the IM snapshot regions we say "these guys are all
>>>> roots, instead of scanning them during the GC we will scan them
>>>> concurrently". And we can do that for any object / region a) as
>>>> long as we know they will not move while they are being scanned and
>>>> b) because we have the pre-barrier. If any references on the
>>>> snapshot objects are updated, the pre-barrier ensures that their
>>>> values at the start of marking will be enqueued and processed.
>>>>
>>>> For external arbitrary roots we don't have a write barrier (and we
>>>> shouldn't as it'd be too expensive). So, we cannot do that for
>>>> non-object roots without a "pre-barrier"-type mechanism.
>>>>
>>>> Tony
>>>>
>>>>
>>>>> Bengt
>>>>>
>>>>>
>>>>>
>>>>> On 2012-01-18 00:48, Tony Printezis wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Can I have a couple of code reviews for this change that
>>>>>> re-enables the use of survivor regions during the initial-mark
>>>>>> pause?
>>>>>>
>>>>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.0/
>>>>>>
>>>>>> From the CR:
>>>>>>
>>>>>> We could scan the survivors as we're copying them, however this
>>>>>> will require more work during the initial-mark GCs (and in
>>>>>> particular: special-case code in the fast path).
>>>>>>
>>>>>> A better approach is to let the concurrent marking threads scan
>>>>>> the survivors and mark everything reachable from them a) before
>>>>>> any more concurrent marking work is done (so that we can just
>>>>>> mark the objects, without needing to push them on a stack, and
>>>>>> let the "finger" algorithm discover them) and b) before the next
>>>>>> GC starts (since, if we copy them, we won't know which of the new
>>>>>> survivors are the ones we need to scan).
>>>>>>
>>>>>> This approach has the advantage that it does not require any
>>>>>> extra work during the initial-mark GCs and all the work is done
>>>>>> by the concurrent marking threads. However, it has the
>>>>>> disadvantage that the survivor scanning might hold up the next
>>>>>> GC. In most cases this should not be an issue as GCs take place
>>>>>> at a reasonably low rate. If it does become a problem we could
>>>>>> consider the following:
>>>>>>
>>>>>> - like when the GC locker is active, try to extend the eden to
>>>>>> give a bit more time to the marking threads to finish scanning
>>>>>> the survivors
>>>>>> - instead of waiting for the marking threads, a GC can take over
>>>>>> and finish up scanning the remaining survivors (typically, we
>>>>>> have more GC threads than marking threads, so the overhead will
>>>>>> be reduced)
>>>>>> - if we supported region pinning, we could pin all the regions
>>>>>> that were not scanned by the time the GC started so that the
>>>>>> marking threads can resume scanning them after the GC completes
>>>>>>
>>>>>> Implementation notes:
>>>>>>
>>>>>> I introduced the concept of a "snapshot regions" in the
>>>>>> ConcurrentMark which is a set of regions that need to be scanned
>>>>>> at the start of a concurrent cycle. Currently, these can only be
>>>>>> survivors but maybe we can use the same concept for something
>>>>>> else in the future.
>>>>>>
>>>>>> Tony
>>>>>>
>>>>>>
>>>>>
>>>>
>>
More information about the hotspot-gc-dev
mailing list