CRR (M): 7127706: G1: re-enable survivors during the initial-mark pause
Tony Printezis
tony.printezis at oracle.com
Fri Jan 20 17:13:39 UTC 2012
Hi all,
New webrev for this based on comments from John:
http://cr.openjdk.java.net/~tonyp/7127706/webrev.3/
Now, the CMRootRegions::claim_next() method does not check the
ConcurrentMark::has_aborted() flag to know when to abort (and return
NULL) but, instead, a _has_aborted flag I added to the CMRootRegions
class. This is set to true at the start of a Full GC and before the Full
GC waits for the root region scan to finish.
Additionally, I now call _root_regions.reset() from
CM::checkpointRootsInitialPost()instead of calling it explicitly from
do_collection_pause_at_safepoint().
Tony
On 01/19/2012 09:40 AM, Tony Printezis wrote:
> Bengt (and all),
>
> Updated webrev using "root regions" now:
>
> http://cr.openjdk.java.net/~tonyp/7127706/webrev.2/
>
> Tony
>
> On 01/19/2012 09:04 AM, Tony Printezis wrote:
>> Bengt,
>>
>> Inline (again!)
>>
>> On 1/19/2012 4:03 AM, Bengt Rutisson wrote:
>>>
>>> Tony,
>>>
>>> On 2012-01-18 22:31, Tony Printezis wrote:
>>>> Bengt,
>>>>
>>>> Here's a webrev with the renaming:
>>>>
>>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.1/
>>>>
>>>> I have to say I'm not sure I really like the term "initial-mark /
>>>> IM snapshot regions". I'll try to come up with an alternative name
>>>> for them....
>>>
>>> Looked quickly at the new webrev.
>>>
>>> I agree that IM-snapshot might not be optimal. Still I like the fact
>>> that it is not just "snapshot" since I think that can easily be
>>> confused with the SATB terms.
>>
>> Well, it's supposed to be. In SATB, anything that's reachable from
>> the "snapshot" at initial-mark time will be retained. Since we want
>> to avoid explicitly marking the survivors, we make them all
>> implicitly live and part of the "snapshot" which is why we have to
>> scan them (the same way we scan roots during the initial-mark pause).
>> So, "snapshot regions" is not unreasonable given that if we were not
>> using SATB we would not be able to do this.
>>
>>> It is of course part of the SATB snapshot, but not the whole thing.
>>>
>>> Just thinking aloud here, what about not using the word "snapshot"
>>> at all? How about "to_be_scanned_regions", "root_regions" or
>>> "concurrent_roots"?
>>
>> I like "root regions". Let's go with that (it's shorter too!).
>>
>> Tony
>>
>>> Really not sure what a good name is here...I kind of like the log
>>> message "Concurrent root scanning took 0.000x ms".
>>>
>>> Bengt
>>>
>>>>
>>>> Tony
>>>>
>>>> Tony Printezis wrote:
>>>>> Hi Bengt,
>>>>>
>>>>> Thanks for looking at this so quickly! Inline.
>>>>>
>>>>> Bengt Rutisson wrote:
>>>>>>
>>>>>> Tony,
>>>>>>
>>>>>> Overall this looks really good. Thanks for fixing it.
>>>>>>
>>>>>> Some comments:
>>>>>>
>>>>>> First, a general question regarding naming and logging. We now
>>>>>> talk about "snapshot" a lot. It is a pretty good name, but maybe
>>>>>> it needs some more context to be understandable in the code and
>>>>>> the GC log. I don't have any really good names, but maybe
>>>>>> "survivor_snapshot"
>>>>>
>>>>> I'd rather not mention "survivors" given that we might add
>>>>> non-survivor regions in the future.
>>>>>
>>>>>> or "initial_mark_snapshot"?
>>>>>
>>>>> I like "initial-mark snapshot" better. Having said that
>>>>> CMInitialMarkSnapshotRegions and _initial_mark_snapshot_regions
>>>>> are kinda long. :-) I'll abbreviate to CMIMSnapshotRegions and
>>>>> _im_snapshot_regions if that's OK.
>>>>>
>>>>>> concurrentMark.inline.hpp
>>>>>>
>>>>>> if (hr == NULL) {
>>>>>> hr = _g1h->heap_region_containing_raw(addr);
>>>>>> // Given that we're looking for a region that contains an object
>>>>>> // header it's impossible to get back a HC region.
>>>>>> assert(!hr->continuesHumongous(), "sanity");
>>>>>> } else {
>>>>>> assert(hr->is_in(addr), "pre-condition");
>>>>>> }
>>>>>>
>>>>>> The first assert should probably hold even for regions that are
>>>>>> passed in to grayRoot() right? So, maybe something like:
>>>>>>
>>>>>> if (hr == NULL) {
>>>>>> hr = _g1h->heap_region_containing_raw(addr);
>>>>>> } else {
>>>>>> assert(hr->is_in(addr), "pre-condition");
>>>>>> }
>>>>>> // Given that we need a region that contains an object
>>>>>> // header it's impossible for it to be a HC region.
>>>>>> assert(!hr->continuesHumongous(), "sanity");
>>>>>
>>>>> Good observation! I changed to the above.
>>>>>
>>>>>> concurrentMarkThread.cpp
>>>>>>
>>>>>> ConcurrentMarkThread::run()
>>>>>>
>>>>>> Why do we do the explicit time/date stamping?
>>>>>>
>>>>>> gclog_or_tty->date_stamp(PrintGCDateStamps);
>>>>>> gclog_or_tty->stamp(PrintGCTimeStamps);
>>>>>> gclog_or_tty->print_cr("[GC
>>>>>> concurrent-snapshot-scan-start]");
>>>>>>
>>>>>> why is it not enough with the normal -XX:+PrintGCTimeStamps
>>>>>> information?
>>>>>
>>>>> Not quite sure what you mean with "is it not enough with the
>>>>> normal ... information". Each log record needs either a GC time
>>>>> stamp or a GC date stamp and we have to print either or both
>>>>> depending on the two -XX parameters. Unfortunately, the logging
>>>>> code has not been well abstracted and/or refactored so we have
>>>>> this unfortunate replication throughout the GCs.
>>>>>
>>>>>> This is probably correct since I see this pattern in other
>>>>>> places. But I would like to understand why we do it.
>>>>>>
>>>>>>
>>>>>> g1CollectedHeap.cpp:
>>>>>>
>>>>>> G1CollectedHeap::do_collection()
>>>>>>
>>>>>> Is it worth logging how long we had to wait in
>>>>>> _cm->snapshot_regions()->wait_until_scan_finished(), the same way
>>>>>> that we do in G1CollectedHeap::do_collection_pause_at_safepoint()?
>>>>>
>>>>> Currently, the GC log records for the evacuation pauses have a lot
>>>>> of extra information when +PrintGCDetails is set and it was
>>>>> reasonable to add an extra record with the wait time. And it's
>>>>> more important to know how the wait for snapshot region scanning
>>>>> affects evacuation pauses, which are more critical. The Full GC
>>>>> log records are currently one line and I don't think we want to
>>>>> extend them further (at least, not before we put a decent GC
>>>>> logging framework in place). On the other hand, the snapshot
>>>>> region scanning aborts when a marking cycle is aborted due to a
>>>>> Full GC. So, this wait time should not be long. How about I add a
>>>>> comment in the code saying that, when we introduce a more
>>>>> extensible logging framework, we could add the wait time to the
>>>>> Full GC log records? Something like:
>>>>>
>>>>> // Note: When we have a more flexible GC logging framework that
>>>>> // allows us to add optional attributes to a GC log record we
>>>>> // could consider timing and reporting how long we wait in the
>>>>> // following two methods.
>>>>> wait_while_free_regions_coming();
>>>>> // ...
>>>>> _cm->snapshot_regions()->wait_until_scan_finished();
>>>>>
>>>>>
>>>>>> Finally, just some food for thought. Could this be generalized to
>>>>>> more roots? I mean take a snapshot and scan it concurrently.
>>>>>
>>>>> By scanning the IM snapshot regions we say "these guys are all
>>>>> roots, instead of scanning them during the GC we will scan them
>>>>> concurrently". And we can do that for any object / region a) as
>>>>> long as we know they will not move while they are being scanned
>>>>> and b) because we have the pre-barrier. If any references on the
>>>>> snapshot objects are updated, the pre-barrier ensures that their
>>>>> values at the start of marking will be enqueued and processed.
>>>>>
>>>>> For external arbitrary roots we don't have a write barrier (and we
>>>>> shouldn't as it'd be too expensive). So, we cannot do that for
>>>>> non-object roots without a "pre-barrier"-type mechanism.
>>>>>
>>>>> Tony
>>>>>
>>>>>
>>>>>> Bengt
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2012-01-18 00:48, Tony Printezis wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Can I have a couple of code reviews for this change that
>>>>>>> re-enables the use of survivor regions during the initial-mark
>>>>>>> pause?
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~tonyp/7127706/webrev.0/
>>>>>>>
>>>>>>> From the CR:
>>>>>>>
>>>>>>> We could scan the survivors as we're copying them, however this
>>>>>>> will require more work during the initial-mark GCs (and in
>>>>>>> particular: special-case code in the fast path).
>>>>>>>
>>>>>>> A better approach is to let the concurrent marking threads scan
>>>>>>> the survivors and mark everything reachable from them a) before
>>>>>>> any more concurrent marking work is done (so that we can just
>>>>>>> mark the objects, without needing to push them on a stack, and
>>>>>>> let the "finger" algorithm discover them) and b) before the next
>>>>>>> GC starts (since, if we copy them, we won't know which of the
>>>>>>> new survivors are the ones we need to scan).
>>>>>>>
>>>>>>> This approach has the advantage that it does not require any
>>>>>>> extra work during the initial-mark GCs and all the work is done
>>>>>>> by the concurrent marking threads. However, it has the
>>>>>>> disadvantage that the survivor scanning might hold up the next
>>>>>>> GC. In most cases this should not be an issue as GCs take place
>>>>>>> at a reasonably low rate. If it does become a problem we could
>>>>>>> consider the following:
>>>>>>>
>>>>>>> - like when the GC locker is active, try to extend the eden to
>>>>>>> give a bit more time to the marking threads to finish scanning
>>>>>>> the survivors
>>>>>>> - instead of waiting for the marking threads, a GC can take over
>>>>>>> and finish up scanning the remaining survivors (typically, we
>>>>>>> have more GC threads than marking threads, so the overhead will
>>>>>>> be reduced)
>>>>>>> - if we supported region pinning, we could pin all the regions
>>>>>>> that were not scanned by the time the GC started so that the
>>>>>>> marking threads can resume scanning them after the GC completes
>>>>>>>
>>>>>>> Implementation notes:
>>>>>>>
>>>>>>> I introduced the concept of a "snapshot regions" in the
>>>>>>> ConcurrentMark which is a set of regions that need to be scanned
>>>>>>> at the start of a concurrent cycle. Currently, these can only be
>>>>>>> survivors but maybe we can use the same concept for something
>>>>>>> else in the future.
>>>>>>>
>>>>>>> Tony
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
More information about the hotspot-gc-dev
mailing list