CRR (L / updated): 6888336: G1: avoid explicitly marking and pushing objects in survivor spaces

Tue Jan 10 23:57:21 UTC 2012

Bengt,

Hi, thanks for looking at it! See inline.

On 01/10/2012 04:41 PM, Bengt Rutisson wrote:
>
> Hi Tony,
>
> I think this looks good. Ship it!
>
> A couple of coding style questions:
>
> In g1CollectorPolicy.cpp, lines 908-920 there is this if statement:
>
>   if (!during_initial_mark_pause()) {
>     ...
>   } else {
>     ...
>   }
>
> I realize that "!during_initial_mark_pause()" is the more common case, 
> but to me it seems more natural to avoid having a negation in the 
> test, so I'd swap the if and else blocks. What's your opinion on this?
>

As you pointed out I tend to put the common case in the if-block even if 
I have to negate the condition. This is a nice unambiguous way to 
structure the if-statement (well, assuming we can unambiguously decide 
what the "common" case is!). I'd like to leave this as is. If it makes 
you feel any better: I already have the changes to re-enable survivors 
so that test will disappear soon-ish. :-)

> In  g1OopClosures.hpp you swapped the lines 151 and 152, which makes 
> it look like this:
>
>  149   G1ParCopyClosure(G1CollectedHeap* g1, G1ParScanThreadState* 
> par_scan_state,
>  150                    ReferenceProcessor* rp) :
>  151       G1ParCopyHelper(g1, par_scan_state, &_scanner),
>  152       _scanner(g1, par_scan_state, rp) {
>
> I guess you want the call to the super class constructor before other 
> initialization. To me it looks strange that the _scanner is passed to 
> the super class, but is now actually not initialized until after the 
> call to the super constructor. I would have preferred the order as it 
> was before. I guess the new order works fine as long as the super 
> constructor never tries to use the _scanner.

You're right. I wanted the constructor to be called first but I was 
clearly a bit careless and I missed the dependency. Thanks for looking 
at this carefully! Even though the dependency is benign here (the ref to 
_scanner is only stored locally in the super class) I think we should 
avoid and future surprises. So I'll undo the change.

I'll push this asap.

Tony

> Anyway, just details. All in all it looks great!
>
> Bengt
>
>
> On 2012-01-07 05:43, Tony Printezis wrote:
>> Hi again,
>>
>> Here's an updated webrev after I merged my changes with John's latest 
>> push:
>>
>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.3/
>>
>> The code is basically the same, I just had to move some of it to 
>> different places.
>>
>> Testing update: I've been testing the changes continuously on three 
>> machines over the holidays and I haven't seen any failures since the 
>> single failure over Xmas which was caused by the race in the array 
>> chunking changes which has now been resolved. I did additional 
>> testing with a patch from John (thanks again!) which artificially 
>> forces evacuation failures to stress that code and, again, I saw no 
>> issues with that either.
>>
>> Tony
>>
>> On 01/05/2012 03:17 PM, Tony Printezis wrote:
>>> Hi all,
>>>
>>> Updated webrev after making some changes based on comments from John 
>>> (thanks John!):
>>>
>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.2/
>>>
>>> I'd like to clarify something: this change relies on the array 
>>> chunking changes (7121623) but the webrev does not include those 
>>> changes (despite what the index page says). So, if you want to try 
>>> this patch out you'll need to apply the array chunking changes first.
>>>
>>> Tony
>>>
>>> On 12/27/2011 01:05 PM, Tony Printezis wrote:
>>>> Hi all,
>>>>
>>>> Here's an updated webrev for this change that takes into account 
>>>> the new approach of chunking object arrays (see previous e-mails on 
>>>> 7121623):
>>>>
>>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.1/
>>>>
>>>> If anything else the new approach simplified the code a bit since 
>>>> now we can always read an object's size from its from-image instead 
>>>> of having to check one or the other depending on whether it's a 
>>>> chunked array or not. I also moved the body of some methods from 
>>>> heapRegion.hpp to the .inline.hpp and .cpp files (as they were 
>>>> getting a bit large to keep in the .hpp file).
>>>>
>>>> Tony
>>>>
>>>> On 12/21/2011 05:37 PM, Tony Printezis wrote:
>>>>> Hi all,
>>>>>
>>>>> I'd like a couple of code reviews for the following non-trivial 
>>>>> changes (large, not necessary in lines of code modified but more 
>>>>> due to the fact that the evacuation pause / concurrent marking 
>>>>> interaction is changed quite dramatically):
>>>>>
>>>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.0/
>>>>>
>>>>> Here's some background, motivation, and a summary of the changes 
>>>>> (I felt that it was important to write a longer then usual 
>>>>> explanation):
>>>>>
>>>>> * Background / Motivation
>>>>>
>>>>> Each G1 heap region has a field top-at-mark-start (aka TAMS) which 
>>>>> denotes where the top of the region was when marking started. An 
>>>>> object is considered implicitly live if it's over TAMS (i.e., it 
>>>>> was allocated since marking started) or explicitly live if it's 
>>>>> below TAMS (i.e., it was allocated before marking started) and 
>>>>> marked on the bitmap. (It follows that it's unnecessary to 
>>>>> explicitly mark objects over TAMS.)
>>>>>
>>>>> In fact, we have two copies of the above marking information: 
>>>>> "Next TAMS / Next Bitmap" and "Prev TAMS / Prev Bitmap". Prev is 
>>>>> the copy that was obtained by the last marking cycle that was 
>>>>> successfully completed (so, it is consistent: all live objects 
>>>>> should appear as live in the prev marking information). Next is 
>>>>> the copy that will be obtained / is currently being obtained and 
>>>>> it's not consistent because it's not guaranteed to be complete.
>>>>>
>>>>> G1 uses SATB marking which has the advantage not to require 
>>>>> objects allocated since the start of marking to be visited at all 
>>>>> by the marking threads (they are implicitly live and they do not 
>>>>> need to be scanned). So, the active marking cycle can totally 
>>>>> ignore objects over NTAMS (since they have been allocated since 
>>>>> marking started).
>>>>>
>>>>> The current interaction between evacuation pauses (let's call 
>>>>> these "GCs" from now on) and concurrent marking is very tricky. 
>>>>> Even though marking ignores all objects over NTAMS (currently: all 
>>>>> objects in Eden regions) it still has to visit and mark objects in 
>>>>> the Survivors regions. But those will be moved by subsequent GCs. 
>>>>> So, a GC needs to be aware that it's moving objects that have been 
>>>>> marked by the marking threads and not only propagate those marks 
>>>>> but also notify the marking threads that said objects have been 
>>>>> moved. For that we use several data structures: pushes to the 
>>>>> global marking stack and also to what's referred to as the "region 
>>>>> stack" which is only used by the GC to push a group of objects 
>>>>> instead of pushing them individually  ("region" here is a mem 
>>>>> region and smaller than a G1 region).
>>>>>
>>>>> Additionally, because the marking threads could come across 
>>>>> objects that could potentially move we have to make sure that we 
>>>>> don't leave references to regions that have been evacuated on any 
>>>>> marking data structure. To do that we treat as roots all entries 
>>>>> on the taskqueues / global stack and drained all SATB buffers 
>>>>> (both active buffers and also enqueued buffers).
>>>>>
>>>>> The first issue with the above interaction is that it has 
>>>>> performance issues. Draining all SATB buffers and scanning the 
>>>>> mark stack and taskqueues has been shown to be very time-consuming 
>>>>> in some cases. Also, having to check whether objects are marked 
>>>>> and propagate the marks appropriately during GC is an extra overhead.
>>>>>
>>>>> The second issue is that it has been shown to be very fragile. We 
>>>>> have discovered and fixed many issues over time which were subtle 
>>>>> and hard to reproduce.
>>>>>
>>>>> We really need to simplify the GC/marking interaction to both 
>>>>> improve performance of GCs during marking, as well as improve our 
>>>>> reliability. This changeset does exactly that.
>>>>>
>>>>> * Explanation of the changes
>>>>>
>>>>> The goal is to ensure that all the objects that are copied by the 
>>>>> GC do not need to be visited by the marking threads and as a 
>>>>> result do not need to be explicitly marked, pushed, etc.
>>>>>
>>>>> The first observation is that most objects copied during a GC are 
>>>>> allocated after marking starts and are therefore implicitly live. 
>>>>> This is the case for all objects on Eden regions, as well as most 
>>>>> objects on Survivor regions. The only exception are objects on the 
>>>>> Survivor regions during the initial-mark pause. Unfortunately, 
>>>>> it's not easy to track those separately as they will get mixed in 
>>>>> with future Survivors. The first decision to deal with this is to 
>>>>> turn off Survivors during the initial-mark pause. This ensures 
>>>>> that all objects copied during each subsequent GC will only visit 
>>>>> objects that have been allocated since marking started and are 
>>>>> therefore implicitly live (i.e., over NTAMS). This allows us to 
>>>>> totally eliminate that code that propagates marks during the GC. 
>>>>> We just have to make sure that all copied objects are over NTAMS. 
>>>>> Turning off Survivors during an initial-mark pause is a bit of a 
>>>>> "big hammer" approach, but it will suffice for now. We have ideas 
>>>>> on how to re-enable them in the future and we'll explore a couple 
>>>>> of alternatives.
>>>>>
>>>>> Given that the GC only copies objects that are implicitly marked 
>>>>> it follows that none of the objects that are copied during any GC 
>>>>> should appear on either the taskqueues nor the global marking 
>>>>> stack. Also remember that we filter SATB buffers before enqueueing 
>>>>> them which will filter out all implicitly marked objects. It 
>>>>> follows that no enqueued SATB buffer should have references to 
>>>>> objects that are being moved. This leaves the currently active 
>>>>> SATB buffers given that the code that populates them is 
>>>>> unconditional. But if we run the filtering on those during each GC 
>>>>> such "offending" references are also quickly eliminated. So, 
>>>>> instead of having to scan all stacks and all SATB buffers we only 
>>>>> have to filter the active SATB buffers, which should be much, much 
>>>>> faster.
>>>>>
>>>>> * Implementation Notes
>>>>>
>>>>> The actual changes are not too extensive as they basically mostly 
>>>>> disable functionality in the GC code. The tricky part was to get 
>>>>> the TAMS fields correct at various phases (start of copying, start 
>>>>> of marking, etc.) and especially when an evacuation failure 
>>>>> occurs. I put all that functionality in methods on HeapRegion 
>>>>> which do the right thing when a GC starts, a marking starts, etc.
>>>>>
>>>>> The most important changes are in the "main" GC code, i.e. 
>>>>> G1ParCopyHelper::do_oop_work() and 
>>>>> G1ParCopyHelper::copy_to_survivor_space(). Instead of having to 
>>>>> propagate marks we only now need to mark objects directly 
>>>>> reachable from roots during the initial-mark pause. The resulting 
>>>>> code is much simplified (and hopefully more performant!).
>>>>>
>>>>> I also added a method verify_no_cset_oops() which checks that 
>>>>> indeed all the marking data structures do not point to regions 
>>>>> that are being GCed at the start / end of each GC. (BTW, I'm 
>>>>> considering adding a develop flag to enable this on demand.)
>>>>>
>>>>> I should point out that this changeset will leave a lot of dead 
>>>>> code. However, I took the decision to keep the changes to a 
>>>>> minimum in order not overwhelm the code reviewers and make the 
>>>>> important changes clearer. (I also discussed this with a couple of 
>>>>> potential code reviewers and they agreed that this is a good 
>>>>> approach.) I temporarily added guarantees to ensure that methods 
>>>>> that should not be called are not called. I will remove all dead 
>>>>> code with a future push.
>>>>>
>>>>> I also have to apologize to John Cuthbertson for removing a lot of 
>>>>> code he's added to deal with various bugs we had in the GC/marking 
>>>>> interaction. Hopefully the new code will be less fragile compared 
>>>>> to what we've had so far and John will be able to concentrate on 
>>>>> more interesting features than trying to track down 
>>>>> hard-to-reproduce failures!
>>>>>
>>>>> Tony
>>>>>
>