RFR(L): 6484965: G1: piggy-back liveness accounting phase on marking

Tue Jan 24 18:53:18 UTC 2012

Hi Everyone,

I have a new webrev for these changes that can be found at: 
http://cr.openjdk.java.net/~johnc/6484965/webrev.5/

This version includes changes based upon code review comments by Tony, 
including:
* a helper routine to calculate the index in the card bitmap(s) for a 
given address
* a clean up of the code that is used to set/clear bits in the card 
bitmap(s): for small ranges I use a simple loop which the compilers seem 
to be doing a reasonable job optimizing and for larger ranges I use 
[par_]set_range() with suitable parameters to set the bit in the range 
inclusively. The single-bit case is handle by the small range code; nor 
have I seen the OOB assertion failure during testing.
* Moved the clearing/initialization of the liveness counting data 
structures from the initial mark pause to the ConcurrentMark constructor 
and wherever the Next marking bitmap is cleared (at the end of the pause 
or when the marking is aborted as result of a full GC). At the end of 
the marking cycle, this clearing is concurrent.
* Changing the type of _queue_num in G1ParScanThreadState to a uint.
* Various cleanups, typos, and formatting changes.

I found and fixed a small bug in the aggregation code that was exposed 
by a change based upon one of the comments by Tony.

Testing: GC test suite with marking verification on and low marking 
thresholds (2% and 10%). At Tony's request, I also ran with some 
prototype marking bitmap verification code.

Thanks,

JohnC

On 01/12/12 00:26, John Cuthbertson wrote:
> Hi Everyone,
>
> The latest incarnation of these changes can be found at: 
> http://cr.openjdk.java.net/~johnc/6484965/webrev.3/
>
> The changes in this version include:
> * Conditionally using a lock so that the output of the verification 
> closure executed by different threads does not interfere with each 
> other (suggested by Bengt).
> * Merging up to the latest hotspot-gc tip (including Tony's marking 
> changes). This involved changing the evacuation failure code and 
> adding a suitable mark/count routine for use in 
> ConcurrentMark::grayRoot(). I also removed the counting changes from 
> code that has been made obsolete as a result of Tony's marking changes.
>
> Testing: a few runs of the GC test suite with low marking thresholds 
> (2 and 10%) with and without verification, and jprt.
>
> Thanks,
>
> JohnC
>
> On 12/21/2011 9:08 AM, John Cuthbertson wrote:
>> Hi Bengt,
>>
>> That's a good observation. I guess it is possible but I haven't seen 
>> it in practice (though I was typically only using 4 threads when 
>> debugging a verification failure). It won't do any harm so I'll add 
>> the locking.
>>
>> Thanks,
>>
>> JohnC
>>
>>
>>
>> On 12/21/2011 2:05 AM, Bengt Rutisson wrote:
>>>
>>> Hi John,
>>>
>>> Thanks for updating your fix! Looks good.
>>>
>>> One quesiton:
>>> In concurrentMark.cpp it seems to me that the 
>>> VerifyLiveObjectDataHRClosure could get the same kind of messed up 
>>> output that Tony just fixed with 7123165 for the VerifyLiveClosure 
>>> in heapRegion.cpp. There are several workers simultaneously doing 
>>> the verification, right? Is it worth adding the same kind of locking 
>>> that Tony added?
>>>
>>> Bengt
>>>
>>> On 2011-12-20 20:16, John Cuthbertson wrote:
>>>> Hi Bengt,
>>>>
>>>> As I mentioned earlier - thanks for the code review. I've applied 
>>>> your suggestions, merged with the the latest changeset in 
>>>> hsx/hotspot-gc/hotspot (resolving any conflicts), fixed the int <-> 
>>>> size_t issue you also mentioned, and retested using the GC test 
>>>> suite. A new webrev can be found at: 
>>>> http://cr.openjdk.java.net/~johnc/6484965/webrev.2/
>>>>
>>>> Specific replies are inline.
>>>>
>>>> On 11/28/11 05:07, Bengt Rutisson wrote:
>>>>>
>>>>> John,
>>>>>
>>>>> A little late, but here are some comments on this webrev. I know 
>>>>> you have some more improvements to this change coming, but overall 
>>>>> I think it looks good. Most of my comments are just minor coding 
>>>>> style comments.
>>>>>
>>>>> Bengt
>>>>>
>>>>> concurrentMark.hpp
>>>>>
>>>>> Rename ConcurrentMark::clear() to  ConcurrentMark::clear_mark() or 
>>>>> ConcurrentMark::unmark()? The commment you added is definitely 
>>>>> needed to understand what this method does. But it would be even 
>>>>> better if it was possible to get that from the method name itself.
>>>>
>>>> Done.
>>>>
>>>>> It seems like everywhere we use count_marked_bytes_for(int 
>>>>> worker_i) we almost directly use the array returned to index with 
>>>>> the heap region that we are interested in. How about wrapping all 
>>>>> of this is in something like count_set_marked_bytes_for(int 
>>>>> worker_i, int hrs_index) and count_get_marked_bytes_for(int 
>>>>> worker_i, int hrs_index) ? That way the data structure does not 
>>>>> have to be exposed outside ConcurrentMark. It would mean that 
>>>>> ConcurrentMark::count_region() would have to take a worker_i value 
>>>>> instead of a marked_bytes_array.
>>>>
>>>> I did not do this. I embed the marked_bytes array for a worker into 
>>>> the CMTask for that worker to save a de-reference. This was one of 
>>>> the requests from the original code walk-through. Avoiding the 
>>>> de-reference in the CMTask::do_marking_step() shaves a couple of 
>>>> points off the marking time. I think your suggestion would 
>>>> reinstate the de-reference again and we would lose those few 
>>>> percentage points again.
>>>>
>>>>> If you don't agree with the suggestion above I would suggest to 
>>>>> change the name from count_marked_bytes_for() to 
>>>>> count_marked_bytes_array_for() since in every place that it is 
>>>>> being called the resulting value is stored in a local variable 
>>>>> called marked_bytes_array, which seems like a more informative 
>>>>> name to me.
>>>>
>>>> Done. I agree - the new name sounds better.
>>>>
>>>>> I think this comment:
>>>>>
>>>>>   // As above - but we don't know the heap region containing the
>>>>>   // object and so have to supply it.
>>>>>   inline bool par_mark_and_count(oop obj, int worker_i);
>>>>>
>>>>> should be something like "we don't know the heap region containing 
>>>>> the object so we will have to look it up".
>>>>>
>>>>> Same thing here:
>>>>>
>>>>>   // As above - but we don't have the heap region containing the
>>>>>   // object, so we have to supply it.
>>>>>   // Should *not* be called from parallel code.
>>>>>   inline bool mark_and_count(oop obj);
>>>>>
>>>>>
>>>>
>>>> Comments were changed to:
>>>>
>>>>
>>>>> concurrentMark.cpp
>>>>>
>>>>> Since you are changing CalcLiveObjectsClosure::doHeapRegion() 
>>>>> anyway, could you please remove this unused code (1393-1397):
>>>>>
>>>>>         /*
>>>>>         gclog_or_tty->print_cr("Setting bits from %d/%d.",
>>>>>                                obj_card_num - _bottom_card_num,
>>>>>                                obj_last_card_num - _bottom_card_num);
>>>>>         */
>>>>>
>>>>>
>>>>
>>>> Done.
>>>>
>>>>> What about the destructor ConcurrentMark::~ConcurrentMark() ? I 
>>>>> remember Tony mentioning that it won't be called. Do you still 
>>>>> want to keep the code?
>>>>
>>>> I removed the entire destructor - I don't see it being called in 
>>>> the experiments I've run.
>>>>
>>>>> FinalCountDataUpdateClosure::set_bit_for_region()
>>>>> Probably not worth it, but would it make sense to add information 
>>>>> in a startsHumongous HeapRegion to be able to give you the last 
>>>>> continuesHumongous region? Since we know this when we set the 
>>>>> regions up it seems like a waste to have to iterate over the 
>>>>> region list to find it.
>>>>
>>>> If you read the original comment - the original author did not want 
>>>> to make any assumptions about the internal field values of the 
>>>> HeapRegions spanned by a humongous object and so used the loop 
>>>> technique. I think you are correct and I now use the information in 
>>>> the startsHumongous region to find the index of the last 
>>>> continuesHumongous region spaned by the H-obj.
>>>>
>>>>> G1ParFinalCountTask
>>>>> To me it is a bit surprising that we mix in the verify code inside 
>>>>> this closure. Would it be possible to extract this code out somehow?
>>>>
>>>> I did it this way to avoid another iteration over the heap regions. 
>>>> But it probably does make more sense to separate them and use 
>>>> another iteration to do the verify. Done.
>>>>
>>>>> Line 3378: "// Use fill_to_bytes". Is this something you plan on 
>>>>> doing?
>>>>
>>>> I removed the comment. I was thinking of doing this as 
>>>> fill_to_bytes is typically implemented using (a possibly 
>>>> specialized version of) memset. But it's probably not worth it in 
>>>> this case.
>>>>
>>>>> G1ParFinalCountTask::work()
>>>>> Just for the record. I don't really like the way we have to set up 
>>>>> both a VerifyLiveObjectDataHRClosure and a Mux2HRClosure even 
>>>>> though we will only use them if we have VerifyDuringGC enabled. I 
>>>>> realize it is due to the scoping, but I still think it obstucts 
>>>>> the code flow and introduces unnecessary work. Unfortunately I 
>>>>> don't have a good suggestion for how to work around it.
>>>>>
>>>>> Since both VerifyLiveObjectDataHRClosure and a Mux2HRClosure are 
>>>>> StackObjs I assume it is not possible to get around the issue with 
>>>>> a ResourceMark.
>>>>
>>>> Now that the verification is performed in a separate iteration of 
>>>> the heap regions there's no need to create the 
>>>> VerifyLiveObjectDataHRClosure  and Mux2HRClosure instances here. 
>>>> Done. I have also removed the now-redundant Mux2HRClosure.
>>>>
>>>> Hopefully the new webrev addresses these comments.
>>>>
>>>> Thanks again for looking.
>>>>
>>>> JohnC
>>>>
>>>
>>
>