CRR (L / updated): 6888336: G1: avoid explicitly marking and pushing objects in survivor spaces

Tue Jan 10 21:41:01 UTC 2012

Hi Tony,

I think this looks good. Ship it!

A couple of coding style questions:

In g1CollectorPolicy.cpp, lines 908-920 there is this if statement:

   if (!during_initial_mark_pause()) {
     ...
   } else {
     ...
   }

I realize that "!during_initial_mark_pause()" is the more common case, 
but to me it seems more natural to avoid having a negation in the test, 
so I'd swap the if and else blocks. What's your opinion on this?

In  g1OopClosures.hpp you swapped the lines 151 and 152, which makes it 
look like this:

  149   G1ParCopyClosure(G1CollectedHeap* g1, G1ParScanThreadState* 
par_scan_state,
  150                    ReferenceProcessor* rp) :
  151       G1ParCopyHelper(g1, par_scan_state, &_scanner),
  152       _scanner(g1, par_scan_state, rp) {

I guess you want the call to the super class constructor before other 
initialization. To me it looks strange that the _scanner is passed to 
the super class, but is now actually not initialized until after the 
call to the super constructor. I would have preferred the order as it 
was before. I guess the new order works fine as long as the super 
constructor never tries to use the _scanner.

Anyway, just details. All in all it looks great!

Bengt

On 2012-01-07 05:43, Tony Printezis wrote:
> Hi again,
>
> Here's an updated webrev after I merged my changes with John's latest 
> push:
>
> http://cr.openjdk.java.net/~tonyp/6888336/webrev.3/
>
> The code is basically the same, I just had to move some of it to 
> different places.
>
> Testing update: I've been testing the changes continuously on three 
> machines over the holidays and I haven't seen any failures since the 
> single failure over Xmas which was caused by the race in the array 
> chunking changes which has now been resolved. I did additional testing 
> with a patch from John (thanks again!) which artificially forces 
> evacuation failures to stress that code and, again, I saw no issues 
> with that either.
>
> Tony
>
> On 01/05/2012 03:17 PM, Tony Printezis wrote:
>> Hi all,
>>
>> Updated webrev after making some changes based on comments from John 
>> (thanks John!):
>>
>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.2/
>>
>> I'd like to clarify something: this change relies on the array 
>> chunking changes (7121623) but the webrev does not include those 
>> changes (despite what the index page says). So, if you want to try 
>> this patch out you'll need to apply the array chunking changes first.
>>
>> Tony
>>
>> On 12/27/2011 01:05 PM, Tony Printezis wrote:
>>> Hi all,
>>>
>>> Here's an updated webrev for this change that takes into account the 
>>> new approach of chunking object arrays (see previous e-mails on 
>>> 7121623):
>>>
>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.1/
>>>
>>> If anything else the new approach simplified the code a bit since 
>>> now we can always read an object's size from its from-image instead 
>>> of having to check one or the other depending on whether it's a 
>>> chunked array or not. I also moved the body of some methods from 
>>> heapRegion.hpp to the .inline.hpp and .cpp files (as they were 
>>> getting a bit large to keep in the .hpp file).
>>>
>>> Tony
>>>
>>> On 12/21/2011 05:37 PM, Tony Printezis wrote:
>>>> Hi all,
>>>>
>>>> I'd like a couple of code reviews for the following non-trivial 
>>>> changes (large, not necessary in lines of code modified but more 
>>>> due to the fact that the evacuation pause / concurrent marking 
>>>> interaction is changed quite dramatically):
>>>>
>>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.0/
>>>>
>>>> Here's some background, motivation, and a summary of the changes (I 
>>>> felt that it was important to write a longer then usual explanation):
>>>>
>>>> * Background / Motivation
>>>>
>>>> Each G1 heap region has a field top-at-mark-start (aka TAMS) which 
>>>> denotes where the top of the region was when marking started. An 
>>>> object is considered implicitly live if it's over TAMS (i.e., it 
>>>> was allocated since marking started) or explicitly live if it's 
>>>> below TAMS (i.e., it was allocated before marking started) and 
>>>> marked on the bitmap. (It follows that it's unnecessary to 
>>>> explicitly mark objects over TAMS.)
>>>>
>>>> In fact, we have two copies of the above marking information: "Next 
>>>> TAMS / Next Bitmap" and "Prev TAMS / Prev Bitmap". Prev is the copy 
>>>> that was obtained by the last marking cycle that was successfully 
>>>> completed (so, it is consistent: all live objects should appear as 
>>>> live in the prev marking information). Next is the copy that will 
>>>> be obtained / is currently being obtained and it's not consistent 
>>>> because it's not guaranteed to be complete.
>>>>
>>>> G1 uses SATB marking which has the advantage not to require objects 
>>>> allocated since the start of marking to be visited at all by the 
>>>> marking threads (they are implicitly live and they do not need to 
>>>> be scanned). So, the active marking cycle can totally ignore 
>>>> objects over NTAMS (since they have been allocated since marking 
>>>> started).
>>>>
>>>> The current interaction between evacuation pauses (let's call these 
>>>> "GCs" from now on) and concurrent marking is very tricky. Even 
>>>> though marking ignores all objects over NTAMS (currently: all 
>>>> objects in Eden regions) it still has to visit and mark objects in 
>>>> the Survivors regions. But those will be moved by subsequent GCs. 
>>>> So, a GC needs to be aware that it's moving objects that have been 
>>>> marked by the marking threads and not only propagate those marks 
>>>> but also notify the marking threads that said objects have been 
>>>> moved. For that we use several data structures: pushes to the 
>>>> global marking stack and also to what's referred to as the "region 
>>>> stack" which is only used by the GC to push a group of objects 
>>>> instead of pushing them individually  ("region" here is a mem 
>>>> region and smaller than a G1 region).
>>>>
>>>> Additionally, because the marking threads could come across objects 
>>>> that could potentially move we have to make sure that we don't 
>>>> leave references to regions that have been evacuated on any marking 
>>>> data structure. To do that we treat as roots all entries on the 
>>>> taskqueues / global stack and drained all SATB buffers (both active 
>>>> buffers and also enqueued buffers).
>>>>
>>>> The first issue with the above interaction is that it has 
>>>> performance issues. Draining all SATB buffers and scanning the mark 
>>>> stack and taskqueues has been shown to be very time-consuming in 
>>>> some cases. Also, having to check whether objects are marked and 
>>>> propagate the marks appropriately during GC is an extra overhead.
>>>>
>>>> The second issue is that it has been shown to be very fragile. We 
>>>> have discovered and fixed many issues over time which were subtle 
>>>> and hard to reproduce.
>>>>
>>>> We really need to simplify the GC/marking interaction to both 
>>>> improve performance of GCs during marking, as well as improve our 
>>>> reliability. This changeset does exactly that.
>>>>
>>>> * Explanation of the changes
>>>>
>>>> The goal is to ensure that all the objects that are copied by the 
>>>> GC do not need to be visited by the marking threads and as a result 
>>>> do not need to be explicitly marked, pushed, etc.
>>>>
>>>> The first observation is that most objects copied during a GC are 
>>>> allocated after marking starts and are therefore implicitly live. 
>>>> This is the case for all objects on Eden regions, as well as most 
>>>> objects on Survivor regions. The only exception are objects on the 
>>>> Survivor regions during the initial-mark pause. Unfortunately, it's 
>>>> not easy to track those separately as they will get mixed in with 
>>>> future Survivors. The first decision to deal with this is to turn 
>>>> off Survivors during the initial-mark pause. This ensures that all 
>>>> objects copied during each subsequent GC will only visit objects 
>>>> that have been allocated since marking started and are therefore 
>>>> implicitly live (i.e., over NTAMS). This allows us to totally 
>>>> eliminate that code that propagates marks during the GC. We just 
>>>> have to make sure that all copied objects are over NTAMS. Turning 
>>>> off Survivors during an initial-mark pause is a bit of a "big 
>>>> hammer" approach, but it will suffice for now. We have ideas on how 
>>>> to re-enable them in the future and we'll explore a couple of 
>>>> alternatives.
>>>>
>>>> Given that the GC only copies objects that are implicitly marked it 
>>>> follows that none of the objects that are copied during any GC 
>>>> should appear on either the taskqueues nor the global marking 
>>>> stack. Also remember that we filter SATB buffers before enqueueing 
>>>> them which will filter out all implicitly marked objects. It 
>>>> follows that no enqueued SATB buffer should have references to 
>>>> objects that are being moved. This leaves the currently active SATB 
>>>> buffers given that the code that populates them is unconditional. 
>>>> But if we run the filtering on those during each GC such 
>>>> "offending" references are also quickly eliminated. So, instead of 
>>>> having to scan all stacks and all SATB buffers we only have to 
>>>> filter the active SATB buffers, which should be much, much faster.
>>>>
>>>> * Implementation Notes
>>>>
>>>> The actual changes are not too extensive as they basically mostly 
>>>> disable functionality in the GC code. The tricky part was to get 
>>>> the TAMS fields correct at various phases (start of copying, start 
>>>> of marking, etc.) and especially when an evacuation failure occurs. 
>>>> I put all that functionality in methods on HeapRegion which do the 
>>>> right thing when a GC starts, a marking starts, etc.
>>>>
>>>> The most important changes are in the "main" GC code, i.e. 
>>>> G1ParCopyHelper::do_oop_work() and 
>>>> G1ParCopyHelper::copy_to_survivor_space(). Instead of having to 
>>>> propagate marks we only now need to mark objects directly reachable 
>>>> from roots during the initial-mark pause. The resulting code is 
>>>> much simplified (and hopefully more performant!).
>>>>
>>>> I also added a method verify_no_cset_oops() which checks that 
>>>> indeed all the marking data structures do not point to regions that 
>>>> are being GCed at the start / end of each GC. (BTW, I'm considering 
>>>> adding a develop flag to enable this on demand.)
>>>>
>>>> I should point out that this changeset will leave a lot of dead 
>>>> code. However, I took the decision to keep the changes to a minimum 
>>>> in order not overwhelm the code reviewers and make the important 
>>>> changes clearer. (I also discussed this with a couple of potential 
>>>> code reviewers and they agreed that this is a good approach.) I 
>>>> temporarily added guarantees to ensure that methods that should not 
>>>> be called are not called. I will remove all dead code with a future 
>>>> push.
>>>>
>>>> I also have to apologize to John Cuthbertson for removing a lot of 
>>>> code he's added to deal with various bugs we had in the GC/marking 
>>>> interaction. Hopefully the new code will be less fragile compared 
>>>> to what we've had so far and John will be able to concentrate on 
>>>> more interesting features than trying to track down 
>>>> hard-to-reproduce failures!
>>>>
>>>> Tony
>>>>