RFR(L): 6484965: G1: piggy-back liveness accounting phase on marking

Wed Jan 18 19:19:36 UTC 2012

John,

On 2012-01-18 18:53, John Cuthbertson wrote:
> Hi Bengt,
>
> Not a problem. I'll take out the locking and combine the print 
> statements. If you don't mind I'll also change the prints in the 
> verification code to use HR_FORMAT and HR_FORMAT_PARAMS to make them a 
> bit more consistent with others.

Sounds great. Ship it!

Thanks,
Bengt

>
> Thanks again.
>
> JohnC
>
> On 1/18/2012 2:07 AM, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> I had a quick look through the changes. Looks good to me.
>>
>> I looked a little closer at the log output and locking in 
>> VerifyLiveObjectDataHRClosure::doHeapRegion(). I know I suggested to 
>> take the ParGCRareEvent_lock to avoid interleaved output from 
>> different threads, but I am having second thoughts about this.
>>
>> All your output is prefixed with "Region %d" and logged with 
>> print_cr(), so even if several threads do logging at the same time it 
>> should be easy enough to parse the log files. There is one exception 
>> to this and that is this block:
>>
>> 1612     if (failures > 0 && _verbose)  {
>> 1613       gclog_or_tty->print("Region %d: bottom: "PTR_FORMAT", 
>> ntams: "
>> 1614                           PTR_FORMAT", top: "PTR_FORMAT", end: 
>> "PTR_FORMAT,
>> 1615                           hr->hrs_index(), hr->bottom(), 
>> hr->next_top_at_mark_start(),
>> 1616                           hr->top(), hr->end());
>> 1617       gclog_or_tty->print_cr(", marked_bytes: calc/actual 
>> "SIZE_FORMAT"/"SIZE_FORMAT,
>> 1618                              _calc_cl.region_marked_bytes(),
>> 1619                              hr->next_marked_bytes());
>> 1620     }
>>
>> I guess this is what triggered my suggestion to use the lock in the 
>> last review. But now that I look at it I think it would be better to 
>> just merge the print() and print_cr() statements into one single 
>> print_cr() statement. If you do that I don't see the need for taking 
>> the lock.
>>
>> Sorry for going back and forth on this topic.
>>
>> The rest of the changes look good to me!
>> Bengt
>>
>> On 2012-01-12 09:26, John Cuthbertson wrote:
>>> Hi Everyone,
>>>
>>> The latest incarnation of these changes can be found at: 
>>> http://cr.openjdk.java.net/~johnc/6484965/webrev.3/
>>>
>>> The changes in this version include:
>>> * Conditionally using a lock so that the output of the verification 
>>> closure executed by different threads does not interfere with each 
>>> other (suggested by Bengt).
>>> * Merging up to the latest hotspot-gc tip (including Tony's marking 
>>> changes). This involved changing the evacuation failure code and 
>>> adding a suitable mark/count routine for use in 
>>> ConcurrentMark::grayRoot(). I also removed the counting changes from 
>>> code that has been made obsolete as a result of Tony's marking changes.
>>>
>>> Testing: a few runs of the GC test suite with low marking thresholds 
>>> (2 and 10%) with and without verification, and jprt.
>>>
>>> Thanks,
>>>
>>> JohnC
>>>
>>> On 12/21/2011 9:08 AM, John Cuthbertson wrote:
>>>> Hi Bengt,
>>>>
>>>> That's a good observation. I guess it is possible but I haven't 
>>>> seen it in practice (though I was typically only using 4 threads 
>>>> when debugging a verification failure). It won't do any harm so 
>>>> I'll add the locking.
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>>>
>>>>
>>>>
>>>> On 12/21/2011 2:05 AM, Bengt Rutisson wrote:
>>>>>
>>>>> Hi John,
>>>>>
>>>>> Thanks for updating your fix! Looks good.
>>>>>
>>>>> One quesiton:
>>>>> In concurrentMark.cpp it seems to me that the 
>>>>> VerifyLiveObjectDataHRClosure could get the same kind of messed up 
>>>>> output that Tony just fixed with 7123165 for the VerifyLiveClosure 
>>>>> in heapRegion.cpp. There are several workers simultaneously doing 
>>>>> the verification, right? Is it worth adding the same kind of 
>>>>> locking that Tony added?
>>>>>
>>>>> Bengt
>>>>>
>>>>> On 2011-12-20 20:16, John Cuthbertson wrote:
>>>>>> Hi Bengt,
>>>>>>
>>>>>> As I mentioned earlier - thanks for the code review. I've applied 
>>>>>> your suggestions, merged with the the latest changeset in 
>>>>>> hsx/hotspot-gc/hotspot (resolving any conflicts), fixed the int 
>>>>>> <-> size_t issue you also mentioned, and retested using the GC 
>>>>>> test suite. A new webrev can be found at: 
>>>>>> http://cr.openjdk.java.net/~johnc/6484965/webrev.2/
>>>>>>
>>>>>> Specific replies are inline.
>>>>>>
>>>>>> On 11/28/11 05:07, Bengt Rutisson wrote:
>>>>>>>
>>>>>>> John,
>>>>>>>
>>>>>>> A little late, but here are some comments on this webrev. I know 
>>>>>>> you have some more improvements to this change coming, but 
>>>>>>> overall I think it looks good. Most of my comments are just 
>>>>>>> minor coding style comments.
>>>>>>>
>>>>>>> Bengt
>>>>>>>
>>>>>>> concurrentMark.hpp
>>>>>>>
>>>>>>> Rename ConcurrentMark::clear() to  ConcurrentMark::clear_mark() 
>>>>>>> or ConcurrentMark::unmark()? The commment you added is 
>>>>>>> definitely needed to understand what this method does. But it 
>>>>>>> would be even better if it was possible to get that from the 
>>>>>>> method name itself.
>>>>>>
>>>>>> Done.
>>>>>>
>>>>>>> It seems like everywhere we use count_marked_bytes_for(int 
>>>>>>> worker_i) we almost directly use the array returned to index 
>>>>>>> with the heap region that we are interested in. How about 
>>>>>>> wrapping all of this is in something like 
>>>>>>> count_set_marked_bytes_for(int worker_i, int hrs_index) and 
>>>>>>> count_get_marked_bytes_for(int worker_i, int hrs_index) ? That 
>>>>>>> way the data structure does not have to be exposed outside 
>>>>>>> ConcurrentMark. It would mean that 
>>>>>>> ConcurrentMark::count_region() would have to take a worker_i 
>>>>>>> value instead of a marked_bytes_array.
>>>>>>
>>>>>> I did not do this. I embed the marked_bytes array for a worker 
>>>>>> into the CMTask for that worker to save a de-reference. This was 
>>>>>> one of the requests from the original code walk-through. Avoiding 
>>>>>> the de-reference in the CMTask::do_marking_step() shaves a couple 
>>>>>> of points off the marking time. I think your suggestion would 
>>>>>> reinstate the de-reference again and we would lose those few 
>>>>>> percentage points again.
>>>>>>
>>>>>>> If you don't agree with the suggestion above I would suggest to 
>>>>>>> change the name from count_marked_bytes_for() to 
>>>>>>> count_marked_bytes_array_for() since in every place that it is 
>>>>>>> being called the resulting value is stored in a local variable 
>>>>>>> called marked_bytes_array, which seems like a more informative 
>>>>>>> name to me.
>>>>>>
>>>>>> Done. I agree - the new name sounds better.
>>>>>>
>>>>>>> I think this comment:
>>>>>>>
>>>>>>>   // As above - but we don't know the heap region containing the
>>>>>>>   // object and so have to supply it.
>>>>>>>   inline bool par_mark_and_count(oop obj, int worker_i);
>>>>>>>
>>>>>>> should be something like "we don't know the heap region 
>>>>>>> containing the object so we will have to look it up".
>>>>>>>
>>>>>>> Same thing here:
>>>>>>>
>>>>>>>   // As above - but we don't have the heap region containing the
>>>>>>>   // object, so we have to supply it.
>>>>>>>   // Should *not* be called from parallel code.
>>>>>>>   inline bool mark_and_count(oop obj);
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Comments were changed to:
>>>>>>
>>>>>>
>>>>>>> concurrentMark.cpp
>>>>>>>
>>>>>>> Since you are changing CalcLiveObjectsClosure::doHeapRegion() 
>>>>>>> anyway, could you please remove this unused code (1393-1397):
>>>>>>>
>>>>>>>         /*
>>>>>>>         gclog_or_tty->print_cr("Setting bits from %d/%d.",
>>>>>>>                                obj_card_num - _bottom_card_num,
>>>>>>>                                obj_last_card_num - 
>>>>>>> _bottom_card_num);
>>>>>>>         */
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Done.
>>>>>>
>>>>>>> What about the destructor ConcurrentMark::~ConcurrentMark() ? I 
>>>>>>> remember Tony mentioning that it won't be called. Do you still 
>>>>>>> want to keep the code?
>>>>>>
>>>>>> I removed the entire destructor - I don't see it being called in 
>>>>>> the experiments I've run.
>>>>>>
>>>>>>> FinalCountDataUpdateClosure::set_bit_for_region()
>>>>>>> Probably not worth it, but would it make sense to add 
>>>>>>> information in a startsHumongous HeapRegion to be able to give 
>>>>>>> you the last continuesHumongous region? Since we know this when 
>>>>>>> we set the regions up it seems like a waste to have to iterate 
>>>>>>> over the region list to find it.
>>>>>>
>>>>>> If you read the original comment - the original author did not 
>>>>>> want to make any assumptions about the internal field values of 
>>>>>> the HeapRegions spanned by a humongous object and so used the 
>>>>>> loop technique. I think you are correct and I now use the 
>>>>>> information in the startsHumongous region to find the index of 
>>>>>> the last continuesHumongous region spaned by the H-obj.
>>>>>>
>>>>>>> G1ParFinalCountTask
>>>>>>> To me it is a bit surprising that we mix in the verify code 
>>>>>>> inside this closure. Would it be possible to extract this code 
>>>>>>> out somehow?
>>>>>>
>>>>>> I did it this way to avoid another iteration over the heap 
>>>>>> regions. But it probably does make more sense to separate them 
>>>>>> and use another iteration to do the verify. Done.
>>>>>>
>>>>>>> Line 3378: "// Use fill_to_bytes". Is this something you plan on 
>>>>>>> doing?
>>>>>>
>>>>>> I removed the comment. I was thinking of doing this as 
>>>>>> fill_to_bytes is typically implemented using (a possibly 
>>>>>> specialized version of) memset. But it's probably not worth it in 
>>>>>> this case.
>>>>>>
>>>>>>> G1ParFinalCountTask::work()
>>>>>>> Just for the record. I don't really like the way we have to set 
>>>>>>> up both a VerifyLiveObjectDataHRClosure and a Mux2HRClosure even 
>>>>>>> though we will only use them if we have VerifyDuringGC enabled. 
>>>>>>> I realize it is due to the scoping, but I still think it 
>>>>>>> obstucts the code flow and introduces unnecessary work. 
>>>>>>> Unfortunately I don't have a good suggestion for how to work 
>>>>>>> around it.
>>>>>>>
>>>>>>> Since both VerifyLiveObjectDataHRClosure and a Mux2HRClosure are 
>>>>>>> StackObjs I assume it is not possible to get around the issue 
>>>>>>> with a ResourceMark.
>>>>>>
>>>>>> Now that the verification is performed in a separate iteration of 
>>>>>> the heap regions there's no need to create the 
>>>>>> VerifyLiveObjectDataHRClosure  and Mux2HRClosure instances here. 
>>>>>> Done. I have also removed the now-redundant Mux2HRClosure.
>>>>>>
>>>>>> Hopefully the new webrev addresses these comments.
>>>>>>
>>>>>> Thanks again for looking.
>>>>>>
>>>>>> JohnC
>>>>>>
>>>>>
>>>>
>>>
>>
>