RFR: 8231153: Improve concurrent refinement statistics

Thu Sep 26 01:50:52 UTC 2019

> On Sep 25, 2019, at 5:40 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> Hi Kim,
> 
> On 24.09.19 04:58, Kim Barrett wrote:
>> […]CR:
>> https://bugs.openjdk.java.net/browse/JDK-8231153
>> Webrev:
>> https://cr.openjdk.java.net/~kbarrett/8231153/open.00/
>> Testing:
>> mach5 tier1-5
>> some local by-hand testing to look at the rate tracking.
> 
>  I think the change in functionality is good. I can't really judge the appropriateness of the input values to the predictors since I do not know their actual use and haven't gotten a feeling about whether there is a problem with that :)

Thanks for looking at it.

The current algorithm for controlling refinement thread activation has
problems because its inputs have little to do with what it should be
trying to achieve.  These new predictors provide information that I
currently think will be useful to making improvements there, but since
I haven’t fully worked out the details there might be further adjustments
coming along later.

I’m only responding to some of your comments here; I’ll be responding
to others later, along with a new webrev.  But in some cases I have some
questions needing answers before proceeding with some of the rework.

> My experience is that sometimes it is better to be slightly inaccurate in the values passed to reduce impact of outliers; in this case this may e.g. be more appropriate to use concurrently scanned cards instead of the actual concurrently refined cards as the difference are cards filtered out during refinement, and actual work is only done on *scanned* cards anyway. That may not be the case here, just an example - I can't tell.
> E.g. there may be a huge change in refined cards, but scanned cards do not change, and you are actually really interested in the latter.

I considered changing RemSet::refine_card_concurrently() to return a
bool value indicating whether the card was scanned or not. But I'm not
sure that's really worthwhile, since most of the the reasons for not
scanning are (I think) relatively rare.  (Stale cards and failure to
be filtered earlier for timing reasons.)  The exception is the HCC.
However, doing that would really just introduce another rate, e.g. the
expected fraction of enqueued cards that need scanning.

I have seen large variability in that fraction. It's been a while
since I was looking at that, but I think it had to do with the
difference between young-only and mixed GCs. It might be that separate
statistics need to be maintained for those cases. But I also think the
recent changes to the redirtying process may ameliorate that
difference. I'll be looking into that as part of future changes to
make use of this new data.

> - I would also suggest to add a "num_" prefix to numbers/counts of values.

I guess our tastes differ, because I strongly disliked those num_ prefixes.
They smack of Hungarian Notation and the like (of which I'm not a fan).

> - pre-existing: probably rename G1RemSet::_num_conc_scanned_cards and G1RemSetSummary::_conc_scanned_cards to "_concurrent_scanned_cards" to match the "_concurrent_refined_cards”.

RemSet::_num_conc_scanned_cards is only used for logging, and has a
poor implementation (racy increments). Given its usage and being in a
somewhat performance critical code path, I'm reluctant to fix the
implementation. I considered just removing it completely, since the
value isn't critical and could be misleading. What do you think?

> - I would remove the two "Bleh" in g1ConcurrentRefine.cpp - while I can see their point I would probably rather file an enhancement request :)

Oops, forgot to clean up those comments.

I don’t think there’s a useful RFR here.  The problem is fundamental to how
we do higher order functions in this code base.  I’ve tried being different in
some places and mostly got push-back to conform.

> - not sure, but I think exposing size() and start() and in G1FreeIdSet seems unnecessary: the only user is G1DirtyCardQueueSet anyway, and it is already owner of G1FreeIdSet. I.e. it knows these values already (and passes it to the initializer of the G1FreeIdSet instance, and already has a getter for the size() value), so getting it back from G1FreeIdSet seems a bit strange to me, but I am okay with current code.

The start() function is used in DCQS::mut_process_buffer().  However,
the start offset now appears to be a premature generalization.  In the
RFR for JDK-8216258 I said

  Also generalized to allow the id set to start with a non-zero id,
  because that looks like it might simplify other changes I'm working on. 

Nothing seems to have happened with that, and I no longer even recall what
changes I was thinking about there.  Maybe I should back out the new start()
and size() and eliminate the support for a start offset entirely.

I could do that as part of this change, or save it for another RFE.