RFR: 8231153: Improve concurrent refinement statistics

Wed Sep 25 09:40:21 UTC 2019

Hi Kim,

On 24.09.19 04:58, Kim Barrett wrote:
> Please review this change to collection additional statistics about
> the generation and processing of dirty cards.  These are needed for
> future improvements to the control of the concurrent refinement
> threads (JDK-8137022).
> 
> This change adds tracking and prediction of the rate at which
> concurrent refinement threads process cards.  It also adds tracking
> and prediction of the rate at which mutator threads generate cards.
> 
> As part of that, there is now tracking of total cards processed by the
> concurrent refinement threads and the mutator threads.  This is now
> used by G1RemsetSummary instead of the buffer counts previously reported.
> 
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8231153
> 
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8231153/open.00/
> 
> Testing:
> mach5 tier1-5
> some local by-hand testing to look at the rate tracking.
> 

   I think the change in functionality is good. I can't really judge the 
appropriateness of the input values to the predictors since I do not 
know their actual use and haven't gotten a feeling about whether there 
is a problem with that :)

My experience is that sometimes it is better to be slightly inaccurate 
in the values passed to reduce impact of outliers; in this case this may 
e.g. be more appropriate to use concurrently scanned cards instead of 
the actual concurrently refined cards as the difference are cards 
filtered out during refinement, and actual work is only done on 
*scanned* cards anyway. That may not be the case here, just an example - 
I can't tell.
E.g. there may be a huge change in refined cards, but scanned cards do 
not change, and you are actually really interested in the latter.

When reading the change I had the following thoughts to improve readability:

- maybe some comment somewhere what "scanned" really means compared to 
"refined". Initially I was surprised with the change at 
G1RemSet::_num_conc_scanned_cards, but some thinking made me aware of 
the difference.

- the change reuses the "processed" term for counted cards in a few 
places, and it is unclear to me what the difference to just "refined" 
cards would be in some cases.

I.e. I think the additional term for apparently the same thing seems 
unnecessary, and I would recommend changing it to just "refined".

E.g. G1DirtyCardQueueSet::refine_buffer(, size_t* cards_processed) -> 
...(, size_t* num_cards_refined).

There is already a distinction between concurrently and mutator 
processed cards by adding the appropriate term to the names, so 
"processed" seems superfluous.

Also seen in G1ConcurrentRefineThread::run_service().

- I would also suggest to add a "num_" prefix to numbers/counts of values.

E.g. in the above method declaration it is unclear whether 
"cards_processed" is a pointer to an array of cards or simply the 
number/count of cards given only the name.	

Other occurrences:

G1ConcurrentRefineThread::refined_cards -> num_refined_cards
G1DirtyCardQueueSet::mutator_refined_cards -> num_mutator_refined_cards

I saw that the change actually removes a few of these prefixes in 
G1RemSetSummary, which I am not really happy about; this extra 
redundancy does help with readability imho, particularly when you pass 
them around (also when you start dealing with both at the same time :))

- in G1Policy::_pending_cards should be renamed to 
"_pending_cards_at_start_of_gc" since we also now have a 
"_pending_cards_after_last_gc" to distinguish their use a little better?

- pre-existing: probably rename G1RemSet::_num_conc_scanned_cards and 
G1RemSetSummary::_conc_scanned_cards to "_concurrent_scanned_cards" to 
match the "_concurrent_refined_cards".

- I would remove the two "Bleh" in g1ConcurrentRefine.cpp - while I can 
see their point I would probably rather file an enhancement request :)

- not sure, but I think exposing size() and start() and in G1FreeIdSet 
seems unnecessary: the only user is G1DirtyCardQueueSet anyway, and it 
is already owner of G1FreeIdSet. I.e. it knows these values already (and 
passes it to the initializer of the G1FreeIdSet instance, and already 
has a getter for the size() value), so getting it back from G1FreeIdSet 
seems a bit strange to me, but I am okay with current code.

Thanks,
   Thomas