CRR (L / updated): 6888336: G1: avoid explicitly marking and pushing objects in survivor spaces

Sat Jan 7 04:43:34 UTC 2012

Hi again,

Here's an updated webrev after I merged my changes with John's latest push:

http://cr.openjdk.java.net/~tonyp/6888336/webrev.3/

The code is basically the same, I just had to move some of it to 
different places.

Testing update: I've been testing the changes continuously on three 
machines over the holidays and I haven't seen any failures since the 
single failure over Xmas, which was caused by a race in the array 
chunking changes (that race has now been resolved). I did additional 
testing with a patch from John (thanks again!) which artificially 
forces evacuation failures to stress that code, and I saw no issues 
there either.

Tony

On 01/05/2012 03:17 PM, Tony Printezis wrote:
> Hi all,
>
> Updated webrev after making some changes based on comments from John 
> (thanks John!):
>
> http://cr.openjdk.java.net/~tonyp/6888336/webrev.2/
>
> I'd like to clarify something: this change relies on the array 
> chunking changes (7121623) but the webrev does not include those 
> changes (despite what the index page says). So, if you want to try 
> this patch out you'll need to apply the array chunking changes first.
>
> Tony
>
> On 12/27/2011 01:05 PM, Tony Printezis wrote:
>> Hi all,
>>
>> Here's an updated webrev for this change that takes into account the 
>> new approach of chunking object arrays (see previous e-mails on 
>> 7121623):
>>
>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.1/
>>
>> If anything, the new approach simplified the code a bit, since now 
>> we can always read an object's size from its from-image instead of 
>> having to check one or the other depending on whether it's a chunked 
>> array or not. I also moved the bodies of some methods from 
>> heapRegion.hpp to the .inline.hpp and .cpp files (they were getting 
>> a bit large to keep in the .hpp file).
>>
>> Tony
>>
>> On 12/21/2011 05:37 PM, Tony Printezis wrote:
>>> Hi all,
>>>
>>> I'd like a couple of code reviews for the following non-trivial 
>>> changes (large, not necessarily in lines of code modified but 
>>> because the evacuation pause / concurrent marking interaction is 
>>> changed quite dramatically):
>>>
>>> http://cr.openjdk.java.net/~tonyp/6888336/webrev.0/
>>>
>>> Here's some background, motivation, and a summary of the changes (I 
>>> felt that it was important to write a longer than usual explanation):
>>>
>>> * Background / Motivation
>>>
>>> Each G1 heap region has a field top-at-mark-start (aka TAMS) which 
>>> denotes where the top of the region was when marking started. An 
>>> object is considered implicitly live if it's over TAMS (i.e., it was 
>>> allocated since marking started) or explicitly live if it's below 
>>> TAMS (i.e., it was allocated before marking started) and marked on 
>>> the bitmap. (It follows that it's unnecessary to explicitly mark 
>>> objects over TAMS.)
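A minimal sketch of that liveness rule, assuming a simplified model in which addresses are word indices and the marking bitmap is a `std::set` of marked start addresses. `RegionMarkInfo` and its members are hypothetical names for illustration, not HotSpot's actual classes:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical, simplified per-region marking information.
// Addresses are word indices; the bitmap holds the start addresses of
// explicitly marked objects.
struct RegionMarkInfo {
  std::size_t bottom;
  std::size_t top;
  std::size_t tams;              // top-at-mark-start
  std::set<std::size_t> bitmap;

  // Allocated since marking started: live without a bitmap bit.
  bool implicitly_live(std::size_t addr) const { return addr >= tams; }

  // Allocated before marking started: live only if marked.
  bool explicitly_live(std::size_t addr) const {
    return addr < tams && bitmap.count(addr) != 0;
  }

  bool is_live(std::size_t addr) const {
    return implicitly_live(addr) || explicitly_live(addr);
  }

  // Marking an object over TAMS would be redundant, so it is skipped.
  void mark(std::size_t addr) {
    assert(addr >= bottom && addr < top);
    if (addr < tams) bitmap.insert(addr);
  }
};
```

In this model an object at address 70 in a region with TAMS 60 counts as live without any bitmap bit, and calling `mark(70)` leaves the bitmap untouched.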
>>>
>>> In fact, we have two copies of the above marking information: "Next 
>>> TAMS / Next Bitmap" and "Prev TAMS / Prev Bitmap". Prev is the copy 
>>> that was obtained by the last marking cycle that was successfully 
>>> completed (so, it is consistent: all live objects should appear as 
>>> live in the prev marking information). Next is the copy that is 
>>> being obtained by the current marking cycle (or will be, by the next 
>>> one); it's not consistent because it's not guaranteed to be complete.
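The prev/next split can be sketched as two copies of (TAMS, bitmap) per region, where "next" becomes "prev" only once a cycle completes. This is a hypothetical model (`MarkingInfo`, `start_marking()`, `complete_marking()` are illustrative names); it assumes that before the first cycle completes, prev TAMS equals bottom, so every object counts as live:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical model of the two copies of marking information.
struct MarkingInfo {
  std::size_t tams = 0;           // top-at-mark-start
  std::set<std::size_t> bitmap;   // explicitly marked objects (below tams)

  bool is_live(std::size_t addr) const {
    return addr >= tams || bitmap.count(addr) != 0;
  }
};

struct Region {
  std::size_t bottom = 0;
  std::size_t top = 0;
  MarkingInfo prev;  // from the last completed cycle: consistent
  MarkingInfo next;  // being built by the current cycle: may be incomplete

  void start_marking() {
    next.tams = top;  // anything allocated from now on is implicitly live
    next.bitmap.clear();
  }

  // Only a completed "next" is known to be consistent, so it becomes "prev".
  void complete_marking() {
    assert(next.tams >= bottom && next.tams <= top);
    std::swap(prev, next);
    next.tams = bottom;
    next.bitmap.clear();
  }

  // Queries that must never report a live object as dead use "prev".
  bool is_obj_dead(std::size_t addr) const { return !prev.is_live(addr); }
};
```

Liveness queries that must be safe consult only `prev`; the incomplete `next` copy is never trusted until the swap.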
>>>
>>> G1 uses SATB marking, which has the advantage of not requiring 
>>> objects allocated since the start of marking to be visited at all by 
>>> the marking threads (they are implicitly live and they do not need to 
>>> be scanned). So, the active marking cycle can totally ignore objects 
>>> over NTAMS (since they have been allocated since marking started).
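To make the SATB idea concrete, here is a heavily simplified sketch of a pre-write barrier plus the "ignore objects over NTAMS" test. It assumes a single thread-local buffer, integer addresses, and 0 meaning null; `SatbState`, `write_ref_field()` and `marking_must_visit()` are hypothetical names, not G1's actual barrier code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: objects are identified by address; 0 is null.
struct SatbState {
  bool marking_active = false;
  std::vector<std::size_t> buffer;  // thread-local SATB buffer (simplified)
};

// Pre-write barrier: while marking is active, record the old value of a
// reference field before overwriting it, so the snapshot taken at the
// start of marking is preserved.
void write_ref_field(std::size_t* field, std::size_t new_value,
                     SatbState& satb) {
  assert(field != nullptr);
  if (satb.marking_active && *field != 0) {
    satb.buffer.push_back(*field);
  }
  *field = new_value;
}

// Marking only has to visit objects below NTAMS; anything above it was
// allocated after marking started and is implicitly live.
bool marking_must_visit(std::size_t addr, std::size_t ntams) {
  return addr < ntams;
}
```

The barrier records old values unconditionally while marking is active; deciding which of them actually need tracing is left to the marking side.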
>>>
>>> The current interaction between evacuation pauses (let's call these 
>>> "GCs" from now on) and concurrent marking is very tricky. Even 
>>> though marking ignores all objects over NTAMS (currently: all 
>>> objects in Eden regions), it still has to visit and mark objects in 
>>> the Survivor regions. But those will be moved by subsequent GCs. 
>>> So, a GC needs to be aware that it's moving objects that have been 
>>> marked by the marking threads, and it must not only propagate those 
>>> marks but also notify the marking threads that said objects have 
>>> been moved. For that we push entries onto two data structures: the 
>>> global marking stack and what's referred to as the "region stack", 
>>> which is only used by the GC to push a group of objects instead of 
>>> pushing them individually ("region" here is a mem region, smaller 
>>> than a G1 region).
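Roughly, the extra work described above looks like this each time the GC copies an object. This is a hypothetical sketch of the old scheme (the one this change removes), not the actual HotSpot code; it assumes integer addresses, ignores the region stack and parallelism, and `MarkState` / `propagate_mark_on_copy()` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical marking state shared with the concurrent marking threads.
struct MarkState {
  std::set<std::size_t> next_bitmap;
  std::vector<std::size_t> global_mark_stack;
};

// Called by the GC after copying an object from `from` to `to`; each
// address is paired with the NTAMS of the region that contains it.
void propagate_mark_on_copy(std::size_t from, std::size_t from_ntams,
                            std::size_t to, std::size_t to_ntams,
                            MarkState& ms) {
  assert(from != to);
  bool was_marked = from < from_ntams && ms.next_bitmap.count(from) != 0;
  if (!was_marked) {
    return;  // marking never saw this object, nothing to propagate
  }
  // Propagate the mark to the new copy (only needed below NTAMS) ...
  if (to < to_ntams) {
    ms.next_bitmap.insert(to);
  }
  // ... and tell the marking threads about the move, since they may not
  // have scanned the object's fields yet.
  ms.global_mark_stack.push_back(to);
}
```

Every copy pays for a bitmap lookup, and every marked copy adds work to a shared stack, which is the overhead the rest of this message sets out to eliminate.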
>>>
>>> Additionally, because the marking threads could come across objects 
>>> that could potentially move, we have to make sure that we don't 
>>> leave references to regions that have been evacuated on any marking 
>>> data structure. To do that we treat all entries on the taskqueues / 
>>> global stack as roots and drain all SATB buffers (both active 
>>> buffers and enqueued buffers).
>>>
>>> The first issue with the above interaction is performance. Draining 
>>> all SATB buffers and scanning the mark stack and taskqueues has been 
>>> shown to be very time-consuming in some cases. Also, having to check 
>>> whether objects are marked and propagate the marks appropriately 
>>> during GC is extra overhead.
>>>
>>> The second issue is that it has been shown to be very fragile. Over 
>>> time we have discovered and fixed many subtle, hard-to-reproduce 
>>> issues.
>>>
>>> We really need to simplify the GC/marking interaction, both to 
>>> improve the performance of GCs during marking and to improve 
>>> reliability. This changeset does exactly that.
>>>
>>> * Explanation of the changes
>>>
>>> The goal is to ensure that all the objects that are copied by the GC 
>>> do not need to be visited by the marking threads and as a result do 
>>> not need to be explicitly marked, pushed, etc.
>>>
>>> The first observation is that most objects copied during a GC were 
>>> allocated after marking started and are therefore implicitly live. 
>>> This is the case for all objects in Eden regions, as well as most 
>>> objects in Survivor regions. The only exceptions are the objects 
>>> that are in the Survivor regions during the initial-mark pause. 
>>> Unfortunately, it's not easy to track those separately as they will 
>>> get mixed in with future Survivors. The first decision to deal with 
>>> this is to turn off Survivors during the initial-mark pause. This 
>>> ensures that each subsequent GC will only copy objects that have 
>>> been allocated since marking started and are therefore implicitly 
>>> live (i.e., over NTAMS). This allows us to totally eliminate the 
>>> code that propagates marks during the GC. We just have to make sure 
>>> that all copied objects are over NTAMS. Turning off Survivors during 
>>> an initial-mark pause is a bit of a "big hammer" approach, but it 
>>> will suffice for now. We have ideas on how to re-enable them in the 
>>> future and we'll explore a couple of alternatives.
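The reasoning behind turning off Survivors can be checked with a toy model. This sketch uses hypothetical names, assumes every young object survives, and ignores region boundaries. Each object records its allocation time; after an initial-mark pause that tenures everything, the young generation only ever holds objects allocated after marking started:

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of an object's history.
struct Obj {
  long birth;  // allocation time
  int age;     // number of GCs survived
};

struct Heap {
  std::vector<Obj> young;  // Eden + Survivor objects: always evacuated
  std::vector<Obj> old;    // not evacuated by young GCs during marking
};

// A young GC in which every young object survives. During the
// initial-mark pause Survivors are "turned off", so everything is tenured.
void young_gc(Heap& h, bool initial_mark, int tenuring_threshold) {
  assert(tenuring_threshold > 0);
  std::vector<Obj> survivors;
  for (Obj o : h.young) {
    o.age += 1;
    if (initial_mark || o.age >= tenuring_threshold) {
      h.old.push_back(o);      // copied into an old region
    } else {
      survivors.push_back(o);  // copied into a Survivor region
    }
  }
  h.young.swap(survivors);
}

// The property the change relies on: every object that a later GC will
// copy was allocated after marking started.
bool young_all_born_after(const Heap& h, long mark_start) {
  for (const Obj& o : h.young) {
    if (o.birth < mark_start) return false;
  }
  return true;
}
```

Without the forced tenuring at initial-mark, objects allocated before marking started could linger in Survivor regions and be copied by later GCs, which is exactly the case the old mark-propagation code had to handle.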
>>>
>>> Given that the GC only copies objects that are implicitly marked, it 
>>> follows that none of the objects that are copied during any GC 
>>> should appear on either the taskqueues or the global marking stack. 
>>> Also remember that we filter SATB buffers before enqueueing them, 
>>> which removes all implicitly marked objects. It follows that no 
>>> enqueued SATB buffer should have references to objects that are 
>>> being moved. This leaves the currently active SATB buffers, given 
>>> that the code that populates them is unconditional. But if we run 
>>> the filtering on those during each GC, such "offending" references 
>>> are also quickly eliminated. So, instead of having to scan all 
>>> stacks and all SATB buffers, we only have to filter the active SATB 
>>> buffers, which should be much, much faster.
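A minimal sketch of that filtering step, assuming the heap is an array of fixed-size regions with one NTAMS value per region. `kRegionWords`, `implicitly_live()` and `filter_satb_buffer()` are hypothetical names, and the real filter may drop additional entries (e.g. nulls) that this model omits:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: region i covers addresses
// [i * kRegionWords, (i + 1) * kRegionWords) and ntams[i] is its NTAMS.
constexpr std::size_t kRegionWords = 100;

bool implicitly_live(std::size_t addr, const std::vector<std::size_t>& ntams) {
  std::size_t idx = addr / kRegionWords;
  assert(idx < ntams.size());
  return addr >= ntams[idx];
}

// Drop every entry the marking threads would ignore anyway. Since every
// object the GC moves is over NTAMS, a filtered buffer cannot refer to
// anything in the collection set.
void filter_satb_buffer(std::vector<std::size_t>& buffer,
                        const std::vector<std::size_t>& ntams) {
  buffer.erase(std::remove_if(buffer.begin(), buffer.end(),
                              [&](std::size_t addr) {
                                return implicitly_live(addr, ntams);
                              }),
               buffer.end());
}
```

For example, with `ntams = {50, 100}` (region 1 behaving like an Eden region, NTAMS at bottom), filtering `{10, 60, 120, 40}` keeps only `{10, 40}`. The cost is linear in the size of the active buffers rather than in the size of all marking stacks.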
>>>
>>> * Implementation Notes
>>>
>>> The actual changes are not too extensive as they mostly disable 
>>> functionality in the GC code. The tricky part was to get the TAMS 
>>> fields correct at various phases (start of copying, start of 
>>> marking, etc.) and especially when an evacuation failure occurs. I 
>>> put all that functionality in methods on HeapRegion which do the 
>>> right thing when a GC starts, a marking starts, etc.
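The kind of per-region hooks described above can be sketched as follows. The method names and their exact behavior are assumptions for illustration (the actual methods in the webrev may differ), and the model deliberately does not handle evacuation failure. It assumes the initial-mark pause sets NTAMS at the start of marking, copies objects into old regions, and then raises NTAMS over those copies so marking treats them as pre-existing:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified region carrying only the fields needed here.
struct HeapRegionModel {
  std::size_t bottom;
  std::size_t top;
  std::size_t next_tams;

  // A region freshly allocated as GC to-space (e.g. a Survivor region):
  // NTAMS == bottom, so everything copied into it is over NTAMS.
  static HeapRegionModel fresh(std::size_t bottom) {
    return {bottom, bottom, bottom};
  }

  // Marking starts: everything currently in the region predates it.
  void note_start_of_marking() { next_tams = top; }

  void note_start_of_copying(bool during_initial_mark) {
    if (!during_initial_mark) {
      // Copies are allocated at top, which must not be below NTAMS,
      // so they will be implicitly live.
      assert(top >= next_tams);
    }
  }

  void note_end_of_copying(bool during_initial_mark) {
    if (during_initial_mark) {
      // Objects tenured during the initial-mark pause predate marking:
      // raise NTAMS so that marking treats them as below NTAMS.
      next_tams = top;
    }
  }
};
```

Keeping these transitions in one place means the copying code itself never has to reason about TAMS values; it only relies on the invariant that non-initial-mark copies land over NTAMS.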
>>>
>>> The most important changes are in the "main" GC code, i.e. 
>>> G1ParCopyHelper::do_oop_work() and 
>>> G1ParCopyHelper::copy_to_survivor_space(). Instead of having to 
>>> propagate marks, we now only need to mark objects directly reachable 
>>> from roots during the initial-mark pause. The resulting code is much 
>>> simpler (and hopefully more performant!).
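A simplified sketch of the copy path after the change, borrowing the two method names from the text but not their real signatures. `CopyContext`, the bump-pointer allocation, and the `in_cset` / `from_root` flags are assumptions of this model; forwarding of already-copied objects and the actual content copy are omitted:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical per-worker state for the sketch.
struct CopyContext {
  bool during_initial_mark;
  std::set<std::size_t> next_bitmap;
  std::size_t next_free;  // bump pointer into the to-space (simplified)
};

// Allocate room for the copy; copying the contents and installing a
// forwarding pointer are omitted from this model.
std::size_t copy_to_survivor_space(std::size_t from, std::size_t size,
                                   CopyContext& ctx) {
  assert(size > 0);
  (void)from;  // the contents of `from` would be copied here
  std::size_t to = ctx.next_free;
  ctx.next_free += size;
  return to;
}

// Process one reference. `in_cset`: the referent is being evacuated.
// `from_root`: the reference comes directly from a root.
std::size_t do_oop_work(std::size_t obj, std::size_t size, bool in_cset,
                        bool from_root, CopyContext& ctx) {
  std::size_t new_obj =
      in_cset ? copy_to_survivor_space(obj, size, ctx) : obj;
  // The only marking work left: during the initial-mark pause, mark
  // objects directly reachable from the roots. No mark checks, no mark
  // propagation, no pushes onto marking stacks.
  if (ctx.during_initial_mark && from_root) {
    ctx.next_bitmap.insert(new_obj);
  }
  return new_obj;
}
```

Outside the initial-mark pause the copy path touches no marking data at all, which is where the hoped-for speedup during marking comes from.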
>>>
>>> I also added a method verify_no_cset_oops() which checks, at the 
>>> start and end of each GC, that none of the marking data structures 
>>> point to regions that are being GCed. (BTW, I'm considering adding a 
>>> develop flag to enable this on demand.)
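The check amounts to scanning every marking data structure for references into the collection set. A hypothetical sketch (the parameter list and the `InCSet` predicate are assumptions, not the real method's signature); it returns a bool so a caller can decide whether to assert or report:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical predicate: does this address lie in a region being GCed?
using InCSet = std::function<bool(std::size_t)>;

bool none_in_cset(const std::vector<std::size_t>& entries,
                  const InCSet& in_cset) {
  return std::none_of(entries.begin(), entries.end(), in_cset);
}

// Returns true if no marking data structure refers to the collection set.
bool verify_no_cset_oops(
    const std::vector<std::size_t>& global_mark_stack,
    const std::vector<std::vector<std::size_t>>& task_queues,
    const std::vector<std::vector<std::size_t>>& satb_buffers,
    const InCSet& in_cset) {
  assert(in_cset);
  if (!none_in_cset(global_mark_stack, in_cset)) return false;
  for (const auto& q : task_queues) {
    if (!none_in_cset(q, in_cset)) return false;
  }
  for (const auto& b : satb_buffers) {
    if (!none_in_cset(b, in_cset)) return false;
  }
  return true;
}
```

Because it walks every entry, a check like this is only suitable for verification builds or on demand, which fits the idea of guarding it with a develop flag.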
>>>
>>> I should point out that this changeset will leave a lot of dead 
>>> code. However, I decided to keep the changes to a minimum in order 
>>> not to overwhelm the code reviewers and to make the important 
>>> changes clearer. (I also discussed this with a couple of potential 
>>> code reviewers and they agreed that this is a good approach.) I 
>>> temporarily added guarantees to ensure that methods that should not 
>>> be called are not called. I will remove all dead code in a future 
>>> push.
>>>
>>> I also have to apologize to John Cuthbertson for removing a lot of 
>>> code he added to deal with various bugs we had in the GC/marking 
>>> interaction. Hopefully the new code will be less fragile than what 
>>> we've had so far and John will be able to concentrate on more 
>>> interesting features than trying to track down hard-to-reproduce 
>>> failures!
>>>
>>> Tony
>>>