RFR(L): 6484965: G1: piggy-back liveness accounting phase on marking

Thu Jan 12 08:26:25 UTC 2012

Hi Everyone,

The latest incarnation of these changes can be found at: 
http://cr.openjdk.java.net/~johnc/6484965/webrev.3/

The changes in this version include:
* Conditionally using a lock so that the output of the verification 
closure executed by different threads does not interfere with each other 
(suggested by Bengt).
* Merging up to the latest hotspot-gc tip (including Tony's marking 
changes). This involved changing the evacuation failure code and adding 
a suitable mark/count routine for use in ConcurrentMark::grayRoot(). I 
also removed the counting changes from code that has been made obsolete 
as a result of Tony's marking changes.

Testing: a few runs of the GC test suite with low marking thresholds (2 
and 10%) with and without verification, and jprt.

Thanks,

JohnC

On 12/21/2011 9:08 AM, John Cuthbertson wrote:
> Hi Bengt,
>
> That's a good observation. I guess it is possible but I haven't seen 
> it in practice (though I was typically only using 4 threads when 
> debugging a verification failure). It won't do any harm so I'll add 
> the locking.
>
> Thanks,
>
> JohnC
>
>
>
> On 12/21/2011 2:05 AM, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> Thanks for updating your fix! Looks good.
>>
>> One quesiton:
>> In concurrentMark.cpp it seems to me that the 
>> VerifyLiveObjectDataHRClosure could get the same kind of messed up 
>> output that Tony just fixed with 7123165 for the VerifyLiveClosure in 
>> heapRegion.cpp. There are several workers simultaneously doing the 
>> verification, right? Is it worth adding the same kind of locking that 
>> Tony added?
>>
>> Bengt
>>
>> On 2011-12-20 20:16, John Cuthbertson wrote:
>>> Hi Bengt,
>>>
>>> As I mentioned earlier - thanks for the code review. I've applied 
>>> your suggestions, merged with the the latest changeset in 
>>> hsx/hotspot-gc/hotspot (resolving any conflicts), fixed the int <-> 
>>> size_t issue you also mentioned, and retested using the GC test 
>>> suite. A new webrev can be found at: 
>>> http://cr.openjdk.java.net/~johnc/6484965/webrev.2/
>>>
>>> Specific replies are inline.
>>>
>>> On 11/28/11 05:07, Bengt Rutisson wrote:
>>>>
>>>> John,
>>>>
>>>> A little late, but here are some comments on this webrev. I know 
>>>> you have some more improvements to this change coming, but overall 
>>>> I think it looks good. Most of my comments are just minor coding 
>>>> style comments.
>>>>
>>>> Bengt
>>>>
>>>> concurrentMark.hpp
>>>>
>>>> Rename ConcurrentMark::clear() to  ConcurrentMark::clear_mark() or 
>>>> ConcurrentMark::unmark()? The commment you added is definitely 
>>>> needed to understand what this method does. But it would be even 
>>>> better if it was possible to get that from the method name itself.
>>>
>>> Done.
>>>
>>>> It seems like everywhere we use count_marked_bytes_for(int 
>>>> worker_i) we almost directly use the array returned to index with 
>>>> the heap region that we are interested in. How about wrapping all 
>>>> of this is in something like count_set_marked_bytes_for(int 
>>>> worker_i, int hrs_index) and count_get_marked_bytes_for(int 
>>>> worker_i, int hrs_index) ? That way the data structure does not 
>>>> have to be exposed outside ConcurrentMark. It would mean that 
>>>> ConcurrentMark::count_region() would have to take a worker_i value 
>>>> instead of a marked_bytes_array.
>>>
>>> I did not do this. I embed the marked_bytes array for a worker into 
>>> the CMTask for that worker to save a de-reference. This was one of 
>>> the requests from the original code walk-through. Avoiding the 
>>> de-reference in the CMTask::do_marking_step() shaves a couple of 
>>> points off the marking time. I think your suggestion would reinstate 
>>> the de-reference again and we would lose those few percentage points 
>>> again.
>>>
>>>> If you don't agree with the suggestion above I would suggest to 
>>>> change the name from count_marked_bytes_for() to 
>>>> count_marked_bytes_array_for() since in every place that it is 
>>>> being called the resulting value is stored in a local variable 
>>>> called marked_bytes_array, which seems like a more informative name 
>>>> to me.
>>>
>>> Done. I agree - the new name sounds better.
>>>
>>>> I think this comment:
>>>>
>>>>   // As above - but we don't know the heap region containing the
>>>>   // object and so have to supply it.
>>>>   inline bool par_mark_and_count(oop obj, int worker_i);
>>>>
>>>> should be something like "we don't know the heap region containing 
>>>> the object so we will have to look it up".
>>>>
>>>> Same thing here:
>>>>
>>>>   // As above - but we don't have the heap region containing the
>>>>   // object, so we have to supply it.
>>>>   // Should *not* be called from parallel code.
>>>>   inline bool mark_and_count(oop obj);
>>>>
>>>>
>>>
>>> Comments were changed to:
>>>
>>>
>>>> concurrentMark.cpp
>>>>
>>>> Since you are changing CalcLiveObjectsClosure::doHeapRegion() 
>>>> anyway, could you please remove this unused code (1393-1397):
>>>>
>>>>         /*
>>>>         gclog_or_tty->print_cr("Setting bits from %d/%d.",
>>>>                                obj_card_num - _bottom_card_num,
>>>>                                obj_last_card_num - _bottom_card_num);
>>>>         */
>>>>
>>>>
>>>
>>> Done.
>>>
>>>> What about the destructor ConcurrentMark::~ConcurrentMark() ? I 
>>>> remember Tony mentioning that it won't be called. Do you still want 
>>>> to keep the code?
>>>
>>> I removed the entire destructor - I don't see it being called in the 
>>> experiments I've run.
>>>
>>>> FinalCountDataUpdateClosure::set_bit_for_region()
>>>> Probably not worth it, but would it make sense to add information 
>>>> in a startsHumongous HeapRegion to be able to give you the last 
>>>> continuesHumongous region? Since we know this when we set the 
>>>> regions up it seems like a waste to have to iterate over the region 
>>>> list to find it.
>>>
>>> If you read the original comment - the original author did not want 
>>> to make any assumptions about the internal field values of the 
>>> HeapRegions spanned by a humongous object and so used the loop 
>>> technique. I think you are correct and I now use the information in 
>>> the startsHumongous region to find the index of the last 
>>> continuesHumongous region spaned by the H-obj.
>>>
>>>> G1ParFinalCountTask
>>>> To me it is a bit surprising that we mix in the verify code inside 
>>>> this closure. Would it be possible to extract this code out somehow?
>>>
>>> I did it this way to avoid another iteration over the heap regions. 
>>> But it probably does make more sense to separate them and use 
>>> another iteration to do the verify. Done.
>>>
>>>> Line 3378: "// Use fill_to_bytes". Is this something you plan on 
>>>> doing?
>>>
>>> I removed the comment. I was thinking of doing this as fill_to_bytes 
>>> is typically implemented using (a possibly specialized version of) 
>>> memset. But it's probably not worth it in this case.
>>>
>>>> G1ParFinalCountTask::work()
>>>> Just for the record. I don't really like the way we have to set up 
>>>> both a VerifyLiveObjectDataHRClosure and a Mux2HRClosure even 
>>>> though we will only use them if we have VerifyDuringGC enabled. I 
>>>> realize it is due to the scoping, but I still think it obstucts the 
>>>> code flow and introduces unnecessary work. Unfortunately I don't 
>>>> have a good suggestion for how to work around it.
>>>>
>>>> Since both VerifyLiveObjectDataHRClosure and a Mux2HRClosure are 
>>>> StackObjs I assume it is not possible to get around the issue with 
>>>> a ResourceMark.
>>>
>>> Now that the verification is performed in a separate iteration of 
>>> the heap regions there's no need to create the 
>>> VerifyLiveObjectDataHRClosure  and Mux2HRClosure instances here. 
>>> Done. I have also removed the now-redundant Mux2HRClosure.
>>>
>>> Hopefully the new webrev addresses these comments.
>>>
>>> Thanks again for looking.
>>>
>>> JohnC
>>>
>>
>