RFR (M): 8069330: Adjustment of concurrent refinement thresholds does not take hot card cache into account
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Sep 30 19:42:27 UTC 2015
Hi Jon,
On Wed, 2015-09-30 at 11:50 -0700, Jon Masamitsu wrote:
> Thomas,
>
> http://cr.openjdk.java.net/~tschatzl/8069330/webrev/src/share/vm/gc/g1/g1CollectorPolicy.cpp.frames.html
>
> Should you be protecting against the difference at 1054 being < 0?
> Perhaps generally that will
> not happen but as times get very small, the calculations could go bad.
The update RS time is the time it takes to execute
G1CollectedHeap::iterate_dirty_card_closure(). Since ScanHCC is part of
that, ScanHCC cannot get larger than the UpdateRS time.
Note that in this case both numbers used are actual averages from the
current GC, not estimates that include previous GCs. I do not know why
that is. In any case, iirc, the cost_per_card metric is typically quite
stable.
I will think about adding some protection against really small times,
though.
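Such protection could be as simple as clamping the difference at zero
before dividing. This is just a minimal sketch of the idea, not the
actual HotSpot code; the names and the clamping approach are mine:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch: compute the per-card cost from the UpdateRS and
// ScanHCC phase times, clamping the difference so that measurement
// jitter at very small times can never yield a negative cost.
double cost_per_card_ms(double update_rs_ms, double scan_hcc_ms,
                        size_t pending_cards) {
  if (pending_cards == 0) {
    return 0.0;  // no cards processed: no per-card cost information
  }
  double diff_ms = std::max(0.0, update_rs_ms - scan_hcc_ms);
  return diff_ms / (double) pending_cards;
}
```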
> 1052 double cost_scan_hcc =
> phase_times()->average_time_ms(G1GCPhaseTimes::ScanHCC);
> 1053 if (_pending_cards > 0) {
> 1054 cost_per_card_ms =
> (phase_times()->average_time_ms(G1GCPhaseTimes::UpdateRS) -
> cost_scan_hcc) / (double) _pending_cards;
> 1055 _cost_per_card_ms_seq->add(cost_per_card_ms);
> 1056 }
> 1057 _cost_scan_hcc_seq->add(cost_scan_hcc);
>
> Also, not part of your change but if _pending_cards is 0, why isn't 0
> added to _cost_per_card_ms_seq?
"No work to do" does not add any information about how much time the
processing of a card took, so we do not add it.
The reason why we add times to _cost_scan_hcc_seq anyway is that
_pending_cards only counts cards (actually it only gives a very
conservative estimate. That might be problematic for small numbers where
that estimate has not a lot to do with the actual values) in the DCQ
buffers. The HCC is extra.
What do you think about moving the HCC timing out of
G1CollectedHeap::iterate_dirty_card_closure() into G1RemSet::updateRS,
to make the distinction clearer?
Now one could imagine that if there are few _pending_cards, the HCC
would be somewhat empty too. However, since we use an estimate of the
time the entire HCC takes, this does not matter, unless there is a huge
variation in HCC use - which is unlikely, because the HCC is often quite
full: by default it is very small, i.e. 1k entries. You cannot really
make it much larger (like 1M = 2^20 entries), because in applications
that actually use the HCC (in practice, ones that just generate a lot of
cards), this will just make G1 exceed the update RS pause time goal.
Iirc something like 2^14 entries already takes ~10ms on our 32-thread
machines... (which is really slow imo).
Which is btw another point we should fine-tune a little: I think the
G1UpdatePauseTimePercent default of 10% is just too small for larger
applications. G1 behaves much better on those with a significantly
larger value.
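For reference, if I remember the spelling correctly, the product flag
that controls this budget in HotSpot is G1RSetUpdatingPauseTimePercent
(default 10). Raising it for a larger application would look something
like:

```shell
# Give G1 up to 20% of the pause time goal for RS update work
# (default is 10; the exact best value is application-dependent).
java -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=20 -jar app.jar
```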
Back to the question: we should probably do more filtering of obviously
incorrect values (like outliers) in all these kinds of calculations.
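One possible shape for such filtering, sketched below with illustrative
names (this is not HotSpot's actual TruncatedSeq API): reject a new
sample if it lies more than k standard deviations from the current mean,
once enough samples exist for the statistics to mean anything.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical outlier-filtering sequence for cost samples.
class FilteredSeq {
  std::vector<double> _vals;
  double _k;  // rejection threshold in standard deviations

  double mean() const {
    double s = 0.0;
    for (double v : _vals) s += v;
    return s / _vals.size();
  }
  double sd() const {
    double m = mean(), s = 0.0;
    for (double v : _vals) s += (v - m) * (v - m);
    return std::sqrt(s / _vals.size());
  }

public:
  explicit FilteredSeq(double k) : _k(k) {}

  // Returns true if the sample was accepted into the sequence.
  bool add(double v) {
    // Only filter once there are enough samples and some spread.
    if (_vals.size() >= 5 && sd() > 0.0 &&
        std::fabs(v - mean()) > _k * sd()) {
      return false;  // obvious outlier: drop it
    }
    _vals.push_back(v);
    return true;
  }
  size_t size() const { return _vals.size(); }
};
```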
Not sure if the above answers your question. Just ask me to clarify
again if needed.
> 1155 double recent_scan_hcc_time_ms =
> phase_times()->average_time_ms(G1GCPhaseTimes::ScanHCC);
>
> "recent" in the name recent_scan_hcc_time_ms made me want to go look for
> scan_hcc_time_ms (i.e., I wondered if there was another scan hcc time).
> Change the name to scan_hcc_time_ms?
Okay. I will fix that.
> Can you point me at where G1GCPhaseTimes::UpdateRS is recorded? How the
> value in average_time_ms(G1GCPhaseTimes::UpdateRS)is updated. I want to
> understand how the ScanHCC time is included in the UpdateRS time.
In G1RemSet::updateRS(), which is also the only place that calls
G1CollectedHeap::iterate_dirty_card_closure().
Thanks for your comments,
Thomas