RFR(L): 6484965: G1: piggy-back liveness accounting phase on marking

Wed Dec 21 17:08:14 UTC 2011

Hi Bengt,

That's a good observation. I guess it is possible but I haven't seen it 
in practice (though I was typically only using 4 threads when debugging 
a verification failure). It won't do any harm so I'll add the locking.

Thanks,

JohnC

On 12/21/2011 2:05 AM, Bengt Rutisson wrote:
>
> Hi John,
>
> Thanks for updating your fix! Looks good.
>
> One quesiton:
> In concurrentMark.cpp it seems to me that the 
> VerifyLiveObjectDataHRClosure could get the same kind of messed up 
> output that Tony just fixed with 7123165 for the VerifyLiveClosure in 
> heapRegion.cpp. There are several workers simultaneously doing the 
> verification, right? Is it worth adding the same kind of locking that 
> Tony added?
>
> Bengt
>
> On 2011-12-20 20:16, John Cuthbertson wrote:
>> Hi Bengt,
>>
>> As I mentioned earlier - thanks for the code review. I've applied 
>> your suggestions, merged with the the latest changeset in 
>> hsx/hotspot-gc/hotspot (resolving any conflicts), fixed the int <-> 
>> size_t issue you also mentioned, and retested using the GC test 
>> suite. A new webrev can be found at: 
>> http://cr.openjdk.java.net/~johnc/6484965/webrev.2/
>>
>> Specific replies are inline.
>>
>> On 11/28/11 05:07, Bengt Rutisson wrote:
>>>
>>> John,
>>>
>>> A little late, but here are some comments on this webrev. I know you 
>>> have some more improvements to this change coming, but overall I 
>>> think it looks good. Most of my comments are just minor coding style 
>>> comments.
>>>
>>> Bengt
>>>
>>> concurrentMark.hpp
>>>
>>> Rename ConcurrentMark::clear() to  ConcurrentMark::clear_mark() or 
>>> ConcurrentMark::unmark()? The commment you added is definitely 
>>> needed to understand what this method does. But it would be even 
>>> better if it was possible to get that from the method name itself.
>>
>> Done.
>>
>>> It seems like everywhere we use count_marked_bytes_for(int worker_i) 
>>> we almost directly use the array returned to index with the heap 
>>> region that we are interested in. How about wrapping all of this is 
>>> in something like count_set_marked_bytes_for(int worker_i, int 
>>> hrs_index) and count_get_marked_bytes_for(int worker_i, int 
>>> hrs_index) ? That way the data structure does not have to be exposed 
>>> outside ConcurrentMark. It would mean that 
>>> ConcurrentMark::count_region() would have to take a worker_i value 
>>> instead of a marked_bytes_array.
>>
>> I did not do this. I embed the marked_bytes array for a worker into 
>> the CMTask for that worker to save a de-reference. This was one of 
>> the requests from the original code walk-through. Avoiding the 
>> de-reference in the CMTask::do_marking_step() shaves a couple of 
>> points off the marking time. I think your suggestion would reinstate 
>> the de-reference again and we would lose those few percentage points 
>> again.
>>
>>> If you don't agree with the suggestion above I would suggest to 
>>> change the name from count_marked_bytes_for() to 
>>> count_marked_bytes_array_for() since in every place that it is being 
>>> called the resulting value is stored in a local variable called 
>>> marked_bytes_array, which seems like a more informative name to me.
>>
>> Done. I agree - the new name sounds better.
>>
>>> I think this comment:
>>>
>>>   // As above - but we don't know the heap region containing the
>>>   // object and so have to supply it.
>>>   inline bool par_mark_and_count(oop obj, int worker_i);
>>>
>>> should be something like "we don't know the heap region containing 
>>> the object so we will have to look it up".
>>>
>>> Same thing here:
>>>
>>>   // As above - but we don't have the heap region containing the
>>>   // object, so we have to supply it.
>>>   // Should *not* be called from parallel code.
>>>   inline bool mark_and_count(oop obj);
>>>
>>>
>>
>> Comments were changed to:
>>
>>
>>> concurrentMark.cpp
>>>
>>> Since you are changing CalcLiveObjectsClosure::doHeapRegion() 
>>> anyway, could you please remove this unused code (1393-1397):
>>>
>>>         /*
>>>         gclog_or_tty->print_cr("Setting bits from %d/%d.",
>>>                                obj_card_num - _bottom_card_num,
>>>                                obj_last_card_num - _bottom_card_num);
>>>         */
>>>
>>>
>>
>> Done.
>>
>>> What about the destructor ConcurrentMark::~ConcurrentMark() ? I 
>>> remember Tony mentioning that it won't be called. Do you still want 
>>> to keep the code?
>>
>> I removed the entire destructor - I don't see it being called in the 
>> experiments I've run.
>>
>>> FinalCountDataUpdateClosure::set_bit_for_region()
>>> Probably not worth it, but would it make sense to add information in 
>>> a startsHumongous HeapRegion to be able to give you the last 
>>> continuesHumongous region? Since we know this when we set the 
>>> regions up it seems like a waste to have to iterate over the region 
>>> list to find it.
>>
>> If you read the original comment - the original author did not want 
>> to make any assumptions about the internal field values of the 
>> HeapRegions spanned by a humongous object and so used the loop 
>> technique. I think you are correct and I now use the information in 
>> the startsHumongous region to find the index of the last 
>> continuesHumongous region spaned by the H-obj.
>>
>>> G1ParFinalCountTask
>>> To me it is a bit surprising that we mix in the verify code inside 
>>> this closure. Would it be possible to extract this code out somehow?
>>
>> I did it this way to avoid another iteration over the heap regions. 
>> But it probably does make more sense to separate them and use another 
>> iteration to do the verify. Done.
>>
>>> Line 3378: "// Use fill_to_bytes". Is this something you plan on doing?
>>
>> I removed the comment. I was thinking of doing this as fill_to_bytes 
>> is typically implemented using (a possibly specialized version of) 
>> memset. But it's probably not worth it in this case.
>>
>>> G1ParFinalCountTask::work()
>>> Just for the record. I don't really like the way we have to set up 
>>> both a VerifyLiveObjectDataHRClosure and a Mux2HRClosure even though 
>>> we will only use them if we have VerifyDuringGC enabled. I realize 
>>> it is due to the scoping, but I still think it obstucts the code 
>>> flow and introduces unnecessary work. Unfortunately I don't have a 
>>> good suggestion for how to work around it.
>>>
>>> Since both VerifyLiveObjectDataHRClosure and a Mux2HRClosure are 
>>> StackObjs I assume it is not possible to get around the issue with a 
>>> ResourceMark.
>>
>> Now that the verification is performed in a separate iteration of the 
>> heap regions there's no need to create the 
>> VerifyLiveObjectDataHRClosure  and Mux2HRClosure instances here. 
>> Done. I have also removed the now-redundant Mux2HRClosure.
>>
>> Hopefully the new webrev addresses these comments.
>>
>> Thanks again for looking.
>>
>> JohnC
>>
>