RFR(S): 7185699: G1: Prediction model discrepancies

Wed Jul 25 09:14:20 UTC 2012

Hi,

On Tue, 2012-07-24 at 11:15 -0700, John Cuthbertson wrote:
> Hi Everyone,
> 
> Can I have a couple of volunteers to review the changes for this CR? The 
> webrev can be found at: http://cr.openjdk.java.net/~johnc/7185699/webrev.0/.
> 

 here are some notes, although I am not an official reviewer...

> Background:
> While I was going through the code for the current mixed GC policy and 
> the code that adds non-young regions to the collection set, I found a 
> couple of minor bugs associated with the prediction code. The first was 
> the calculation of the number of unprocessed dirty card at the start of 
> the GC - this was using routines that return a number of bytes rather 
> than the number of entries.

G1CollectedHeap::max_pending_card_num() just below of the fixed
pending_card_num() seems to have the same issue.

I also think G1CollectorPolicy::predict_pending_cards() and
G1CollectorPolicy::predict_pending_card_diff() can be removed, they do
not seem to be referenced anywhere in the code.

> As a result we were vastly overestimating
> the base pause time. The second was in the code that calculates the GC 

I think the effects mostly cancelled themselves out long term due to the
way the prediction is calculated: the time prediction is based on a
prediction of pending_cards times a cost prediction per card.
Pending_cards is too high by the oopSize as per your patch, but the cost
prediction per card has been derived from a value based on the rset
update time divided by the number of pending cards, which was too high
by that same factor.

(I haven't measured anything)

> efficiency of a non-young region. The GC efficiency of a region is 
> calculated at the end of marking. The code, however, was using the wrong 
> routine to predict the number of cards for a given RSet length. It was 
> using the data collected for young GCs when it should have been using 
> the data collected for mixed GCs.

I have one question about a probable use of predictors of different
types of collection. In particular, during mixed collections, the
predict_region_elapsed_time_ms() method is also called to determine
young generation regions evacuation time predictions, with the "young"
parameter set to true always.

Looking at the code (copied below for convenience), during that time the
number of cards prediction for young gen regions is based on the
prediction of mixed collection card lengths (as expected), but at the
same time, the "other" time prediction is based on the time prediction
for the young collections.

This seems odd, because I'd expect both prediction components to be
based on the same prior decision at all times.

Maybe you could improve these predictions for the young gen regions by
always base them on the young collection predictors. This assumes that
young gen region timing is the same independent on the type of
collection - but still the same for both card length and "other" time.

double
G1CollectorPolicy::predict_region_elapsed_time_ms(HeapRegion* hr,
                                                  bool young) {
  size_t rs_length = hr->rem_set()->occupied();
  size_t card_num;
  if (gcs_are_young()) {
    card_num = predict_young_card_num(rs_length);
  } else {
    card_num = predict_non_young_card_num(rs_length);
  }
  size_t bytes_to_copy = predict_bytes_to_copy(hr);

  double region_elapsed_time_ms =
    predict_rs_scan_time_ms(card_num) +
    predict_object_copy_time_ms(bytes_to_copy);

  if (young)
    region_elapsed_time_ms += predict_young_other_time_ms(1);
  else
    region_elapsed_time_ms += predict_non_young_other_time_ms(1);

  return region_elapsed_time_ms;
}

Wouldn't it be better to base the selection of predictions exclusively
on the "young" parameter, and eventually pass gcs_are_young() as value
for the "young" parameter appropriately?

This would also automatically remove the need for the inlined code in
HeapRegion::calc_gc_efficiency, which seems just a source for errors. I
doubt anybody reads the "synchronize me" comments in heapRegion.cpp when
changing the prediction code in g1CollectorPolicy.cpp. Additionally this
calculation seems to be some policy  that would seem perfect for putting
the method into G1CollectorPolicy.
At least then the "synchronize me" comment could be in the same file, or
even factored out into a single method.

Thomas