RFR (M): 8019342: G1: High "Other" time most likely due to card redirtying

Mon Apr 14 13:27:58 UTC 2014

Hi Thomas,

Looks good.

A couple of minor things:

You made apply_closure_to_all_completed_buffers() and 
par_apply_closure_to_all_completed_buffers() take the closure they 
should apply as a parameter instead of looking it up in the _closure 
instance variable. I like this a lot better, but why did you keep the 
_closure instance variable for other methods 
(iterate_closure_all_threads() and mut_process_buffer()). It would have 
been nice to have the same pattern for all methods. If it is difficult 
to change the others to take a closure as parameter I think I would 
prefer to revert the apply methods to also use the instance variable.

Not strictly related to your change, but it would make it easier to 
understand you change:
In G1CollectedHeap::check_ct_logs_at_safepoint() we start out by doing:
   DirtyCardQueueSet& dcqs = JavaThread::dirty_card_queue_set();
But in that method we still use JavaThread::dirty_card_queue_set() 
explicitly twice. It would be nice to replace these calls with dcqs to 
make it clear that it is the same instance.

These two formatting changes are unrelated to your other changes:

dirtyCardQueue.cpp:  165 BufferNode* 
DirtyCardQueueSet::get_completed_buffer(int stop_at) {
g1CollectedHeap.cpp:  173  YoungList::YoungList(G1CollectedHeap* g1h) :

I would prefer to leave those out.

Thanks,
Bengt

On 2014-04-11 13:33, Thomas Schatzl wrote:
> Hi all,
>
>    just adding the CRs for the proposed follow-up changes:
>
> On Fri, 2014-04-11 at 12:53 +0200, Thomas Schatzl wrote:
>> Hi all,
>>
>>    can I have a few reviews for the following change that parallelizes
>> card redirtying and improves log output?
>>
>> On a few applications card redirtying time is very large, taking tens of
>> ms. Investigation showed that there are no really clever ways to avoid
>> this work when keeping the current way of handling not-redirtied cards
>> except parallelizing it.
>>
>> I.e. the ideas referred to in the CR
>> (https://bugs.openjdk.java.net/browse/JDK-8019342?focusedCommentId=13346944&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13346944) do not yield any good results to make it worth implementing these suggestions.
>>
>> This CR implements parallelization by putting this work into an
>> AbstractGangTask and do the work in parallel; for that I added a
>> par_apply_closure_to_all_completed_buffers() method in
>> DirtyCardQueueSet, claiming work on a per chunk basis. This yields
>> (almost) linear speedup.
>>
>> Logging code has been improved to show the number of cards processed per
>> thread too.
>>
>> Note that there is still some code duplication with iterating over the
>> DCQS during verification: that will be fixed in a followup CR.
> https://bugs.openjdk.java.net/browse/JDK-8040002
>
>> Also, at some point I will merge the various "Other" gangtasks into a
>> single one to avoid startup/shutdown costs in another CR.
> https://bugs.openjdk.java.net/browse/JDK-8040006
>
> Thanks,
>    Thomas
>