CRR (L / updated): 6888336: G1: avoid explicitly marking and pushing objects in survivor spaces
Tony Printezis
tony.printezis at oracle.com
Tue Jan 10 23:57:21 UTC 2012
Bengt,
Hi, thanks for looking at it! See inline.
On 01/10/2012 04:41 PM, Bengt Rutisson wrote:
>
> Hi Tony,
>
> I think this looks good. Ship it!
>
> A couple of coding style questions:
>
> In g1CollectorPolicy.cpp, lines 908-920 there is this if statement:
>
> if (!during_initial_mark_pause()) {
> ...
> } else {
> ...
> }
>
> I realize that "!during_initial_mark_pause()" is the more common case,
> but to me it seems more natural to avoid having a negation in the
> test, so I'd swap the if and else blocks. What's your opinion on this?
>
As you pointed out I tend to put the common case in the if-block even if
I have to negate the condition. This is a nice unambiguous way to
structure the if-statement (well, assuming we can unambiguously decide
what the "common" case is!). I'd like to leave this as is. If it makes
you feel any better: I already have the changes to re-enable survivors
so that test will disappear soon-ish. :-)
> In g1OopClosures.hpp you swapped the lines 151 and 152, which makes
> it look like this:
>
> 149 G1ParCopyClosure(G1CollectedHeap* g1, G1ParScanThreadState*
> par_scan_state,
> 150 ReferenceProcessor* rp) :
> 151 G1ParCopyHelper(g1, par_scan_state, &_scanner),
> 152 _scanner(g1, par_scan_state, rp) {
>
> I guess you want the call to the super class constructor before other
> initialization. To me it looks strange that the _scanner is passed to
> the super class, but is now actually not initialized until after the
> call to the super constructor. I would have preferred the order as it
> was before. I guess the new order works fine as long as the super
> constructor never tries to use the _scanner.
You're right. I wanted the constructor to be called first but I was
clearly a bit careless and I missed the dependency. Thanks for looking
at this carefully! Even though the dependency is benign here (the ref to
_scanner is only stored locally in the super class) I think we should
avoid and future surprises. So I'll undo the change.
I'll push this asap.
Tony
> Anyway, just details. All in all it looks great!
>
> Bengt
>
>
> On 2012-01-07 05:43, Tony Printezis wrote:
>> Hi again,
>>
>> Here's an updated webrev after I merged my changes with John's latest
>> push:
>>
>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.3/
>>
>> The code is basically the same, I just had to move some of it to
>> different places.
>>
>> Testing update: I've been testing the changes continuously on three
>> machines over the holidays and I haven't seen any failures since the
>> single failure over Xmas which was caused by the race in the array
>> chunking changes which has now been resolved. I did additional
>> testing with a patch from John (thanks again!) which artificially
>> forces evacuation failures to stress that code and, again, I saw no
>> issues with that either.
>>
>> Tony
>>
>> On 01/05/2012 03:17 PM, Tony Printezis wrote:
>>> Hi all,
>>>
>>> Updated webrev after making some changes based on comments from John
>>> (thanks John!):
>>>
>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.2/
>>>
>>> I'd like to clarify something: this change relies on the array
>>> chunking changes (7121623) but the webrev does not include those
>>> changes (despite what the index page says). So, if you want to try
>>> this patch out you'll need to apply the array chunking changes first.
>>>
>>> Tony
>>>
>>> On 12/27/2011 01:05 PM, Tony Printezis wrote:
>>>> Hi all,
>>>>
>>>> Here's an updated webrev for this change that takes into account
>>>> the new approach of chunking object arrays (see previous e-mails on
>>>> 7121623):
>>>>
>>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.1/
>>>>
>>>> If anything else the new approach simplified the code a bit since
>>>> now we can always read an object's size from its from-image instead
>>>> of having to check one or the other depending on whether it's a
>>>> chunked array or not. I also moved the body of some methods from
>>>> heapRegion.hpp to the .inline.hpp and .cpp files (as they were
>>>> getting a bit large to keep in the .hpp file).
>>>>
>>>> Tony
>>>>
>>>> On 12/21/2011 05:37 PM, Tony Printezis wrote:
>>>>> Hi all,
>>>>>
>>>>> I'd like a couple of code reviews for the following non-trivial
>>>>> changes (large, not necessary in lines of code modified but more
>>>>> due to the fact that the evacuation pause / concurrent marking
>>>>> interaction is changed quite dramatically):
>>>>>
>>>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.0/
>>>>>
>>>>> Here's some background, motivation, and a summary of the changes
>>>>> (I felt that it was important to write a longer then usual
>>>>> explanation):
>>>>>
>>>>> * Background / Motivation
>>>>>
>>>>> Each G1 heap region has a field top-at-mark-start (aka TAMS) which
>>>>> denotes where the top of the region was when marking started. An
>>>>> object is considered implicitly live if it's over TAMS (i.e., it
>>>>> was allocated since marking started) or explicitly live if it's
>>>>> below TAMS (i.e., it was allocated before marking started) and
>>>>> marked on the bitmap. (It follows that it's unnecessary to
>>>>> explicitly mark objects over TAMS.)
>>>>>
>>>>> In fact, we have two copies of the above marking information:
>>>>> "Next TAMS / Next Bitmap" and "Prev TAMS / Prev Bitmap". Prev is
>>>>> the copy that was obtained by the last marking cycle that was
>>>>> successfully completed (so, it is consistent: all live objects
>>>>> should appear as live in the prev marking information). Next is
>>>>> the copy that will be obtained / is currently being obtained and
>>>>> it's not consistent because it's not guaranteed to be complete.
>>>>>
>>>>> G1 uses SATB marking which has the advantage not to require
>>>>> objects allocated since the start of marking to be visited at all
>>>>> by the marking threads (they are implicitly live and they do not
>>>>> need to be scanned). So, the active marking cycle can totally
>>>>> ignore objects over NTAMS (since they have been allocated since
>>>>> marking started).
>>>>>
>>>>> The current interaction between evacuation pauses (let's call
>>>>> these "GCs" from now on) and concurrent marking is very tricky.
>>>>> Even though marking ignores all objects over NTAMS (currently: all
>>>>> objects in Eden regions) it still has to visit and mark objects in
>>>>> the Survivors regions. But those will be moved by subsequent GCs.
>>>>> So, a GC needs to be aware that it's moving objects that have been
>>>>> marked by the marking threads and not only propagate those marks
>>>>> but also notify the marking threads that said objects have been
>>>>> moved. For that we use several data structures: pushes to the
>>>>> global marking stack and also to what's referred to as the "region
>>>>> stack" which is only used by the GC to push a group of objects
>>>>> instead of pushing them individually ("region" here is a mem
>>>>> region and smaller than a G1 region).
>>>>>
>>>>> Additionally, because the marking threads could come across
>>>>> objects that could potentially move we have to make sure that we
>>>>> don't leave references to regions that have been evacuated on any
>>>>> marking data structure. To do that we treat as roots all entries
>>>>> on the taskqueues / global stack and drained all SATB buffers
>>>>> (both active buffers and also enqueued buffers).
>>>>>
>>>>> The first issue with the above interaction is that it has
>>>>> performance issues. Draining all SATB buffers and scanning the
>>>>> mark stack and taskqueues has been shown to be very time-consuming
>>>>> in some cases. Also, having to check whether objects are marked
>>>>> and propagate the marks appropriately during GC is an extra overhead.
>>>>>
>>>>> The second issue is that it has been shown to be very fragile. We
>>>>> have discovered and fixed many issues over time which were subtle
>>>>> and hard to reproduce.
>>>>>
>>>>> We really need to simplify the GC/marking interaction to both
>>>>> improve performance of GCs during marking, as well as improve our
>>>>> reliability. This changeset does exactly that.
>>>>>
>>>>> * Explanation of the changes
>>>>>
>>>>> The goal is to ensure that all the objects that are copied by the
>>>>> GC do not need to be visited by the marking threads and as a
>>>>> result do not need to be explicitly marked, pushed, etc.
>>>>>
>>>>> The first observation is that most objects copied during a GC are
>>>>> allocated after marking starts and are therefore implicitly live.
>>>>> This is the case for all objects on Eden regions, as well as most
>>>>> objects on Survivor regions. The only exception are objects on the
>>>>> Survivor regions during the initial-mark pause. Unfortunately,
>>>>> it's not easy to track those separately as they will get mixed in
>>>>> with future Survivors. The first decision to deal with this is to
>>>>> turn off Survivors during the initial-mark pause. This ensures
>>>>> that all objects copied during each subsequent GC will only visit
>>>>> objects that have been allocated since marking started and are
>>>>> therefore implicitly live (i.e., over NTAMS). This allows us to
>>>>> totally eliminate that code that propagates marks during the GC.
>>>>> We just have to make sure that all copied objects are over NTAMS.
>>>>> Turning off Survivors during an initial-mark pause is a bit of a
>>>>> "big hammer" approach, but it will suffice for now. We have ideas
>>>>> on how to re-enable them in the future and we'll explore a couple
>>>>> of alternatives.
>>>>>
>>>>> Given that the GC only copies objects that are implicitly marked
>>>>> it follows that none of the objects that are copied during any GC
>>>>> should appear on either the taskqueues nor the global marking
>>>>> stack. Also remember that we filter SATB buffers before enqueueing
>>>>> them which will filter out all implicitly marked objects. It
>>>>> follows that no enqueued SATB buffer should have references to
>>>>> objects that are being moved. This leaves the currently active
>>>>> SATB buffers given that the code that populates them is
>>>>> unconditional. But if we run the filtering on those during each GC
>>>>> such "offending" references are also quickly eliminated. So,
>>>>> instead of having to scan all stacks and all SATB buffers we only
>>>>> have to filter the active SATB buffers, which should be much, much
>>>>> faster.
>>>>>
>>>>> * Implementation Notes
>>>>>
>>>>> The actual changes are not too extensive as they basically mostly
>>>>> disable functionality in the GC code. The tricky part was to get
>>>>> the TAMS fields correct at various phases (start of copying, start
>>>>> of marking, etc.) and especially when an evacuation failure
>>>>> occurs. I put all that functionality in methods on HeapRegion
>>>>> which do the right thing when a GC starts, a marking starts, etc.
>>>>>
>>>>> The most important changes are in the "main" GC code, i.e.
>>>>> G1ParCopyHelper::do_oop_work() and
>>>>> G1ParCopyHelper::copy_to_survivor_space(). Instead of having to
>>>>> propagate marks we only now need to mark objects directly
>>>>> reachable from roots during the initial-mark pause. The resulting
>>>>> code is much simplified (and hopefully more performant!).
>>>>>
>>>>> I also added a method verify_no_cset_oops() which checks that
>>>>> indeed all the marking data structures do not point to regions
>>>>> that are being GCed at the start / end of each GC. (BTW, I'm
>>>>> considering adding a develop flag to enable this on demand.)
>>>>>
>>>>> I should point out that this changeset will leave a lot of dead
>>>>> code. However, I took the decision to keep the changes to a
>>>>> minimum in order not overwhelm the code reviewers and make the
>>>>> important changes clearer. (I also discussed this with a couple of
>>>>> potential code reviewers and they agreed that this is a good
>>>>> approach.) I temporarily added guarantees to ensure that methods
>>>>> that should not be called are not called. I will remove all dead
>>>>> code with a future push.
>>>>>
>>>>> I also have to apologize to John Cuthbertson for removing a lot of
>>>>> code he's added to deal with various bugs we had in the GC/marking
>>>>> interaction. Hopefully the new code will be less fragile compared
>>>>> to what we've had so far and John will be able to concentrate on
>>>>> more interesting features than trying to track down
>>>>> hard-to-reproduce failures!
>>>>>
>>>>> Tony
>>>>>
>
More information about the hotspot-gc-dev
mailing list