RFR (M): 8137082: Factor out G1 prediction code from G1CollectorPolicy and clean up

Kim Barrett kim.barrett at oracle.com
Tue Oct 13 22:57:19 UTC 2015


On Oct 1, 2015, at 10:22 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>> It seems strange to have a confidence value that decreases with more
>> samples.  One would think that means that it is intended to be used as
>> a divisor rather than a multiplier.  But then I see the following:
> 
> It's not a confidence value that is typically expressed in percent, but
> a factor as the name implies.
> Its use explains why it is decaying with the amount of samples
> available.
> 
>> 
>>  52   double get_new_prediction(TruncatedSeq const* seq) const{
>>  53     return MAX2(seq->davg() + _sigma * seq->dsd(),
>>  54                 seq->davg() * confidence_factor(seq->num()));
>>  55   }
>> 
>> and I'm confused.  So having few values increases the prediction?
>> That seems like an odd policy.
> 
> It makes at least some sense to me. Let's break down the values of the
> two terms and their use:
> […]

Thanks for the explanation, and for restructuring and renaming the
confidence_factor. That made it clearer what's going on.

On Oct 13, 2015, at 8:26 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~tschatzl/8137082/webrev.2/ (full)
> http://cr.openjdk.java.net/~tschatzl/8137082/webrev.0_to_2/ (diff) 

------------------------------------------------------------------------------

I'd prefer that significant numbers of formatting changes were done
separately from other kinds of changes. For example, I'd rather have
the signature reformattings in src/share/vm/gc/g1/survRateGroup.cpp
done as a separate change, to make that easier to review and to avoid
cluttering functional / refactoring changes with unrelated stuff.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1Predictions.hpp 
  42   double stddev_estimate_for_small_sample_sets(TruncatedSeq const* seq) const {
  43     int const samples = seq->num();
  44     if (samples > 4) {
  45       return 0.0;
  46     } else {
  47       return seq->davg() * (5 - samples) / 2.0;
  48     }
  49   }

  56   double get_new_prediction(TruncatedSeq const* seq) const {
  57     return MAX2(seq->davg() + _sigma * seq->dsd(),
  58                 seq->davg() + _sigma * stddev_estimate_for_small_sample_sets(seq));
  59   }

I think a negative _sigma would produce unexpected results.  I suspect
it's not supposed to ever be negative though.  If so, an assert in the
constructor would help.

I think a negative seq->davg() would produce unexpected results when
the sample size is small, since the estimated stddev would be
negative.  I don't know of any reason why the entries in seq couldn't
have negative values, possibly leading to a negative davg.  I'm not
sure what should be done in that situation.

And I had the same question as Sangheon about seq possibly being
empty.  I think that's a shouldn't happen, since there would be
nothing from which to produce davg either.  But perhaps an assert is
called for.

Ignoring those issues for the moment, I think it would be clearer as
something like

  double stddev_estimate(TruncatedSeq const* seq) const {
    double estimate = seq->dsd();
    int const samples = seq->num();
    // Estimate for small sample sets.
    if (samples < 5) {
      estimate = MAX2(estimate, seq->davg() * (5 - samples) / 2.0);
    }
    return estimate;
  }

  double get_new_predication(TruncatedSeq const* seq) const {
    return seq->davg() + _sigma * stddev_estimate(seq);
  }

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1Predictions.hpp 
src/share/vm/gc/g1/g1Predictions.cpp 
 61   static void test();

Declaration and definition should be NOT_PRODUCT-only.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1CollectorPolicy.hpp
 168   double get_new_prediction(TruncatedSeq* const seq) const;

Parameter should be "TruncatedSeq const*".

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1CollectorPolicy.hpp
 214   enum PredictionConstants {
 215     TruncatedSeqLength = 10,
 216     NumPrevPausesForHeuristics = 10
 217   };

Pre-existing stylistic issue - feel free to ignore.

I don't think enums like this serve any useful purpose. The values are
not actually related in any interesting semantic way; as evidence,
note that the enum tag is never used. Also, the types of the
enumerators are limited and determined automatically by the language,
and in this case are int, even though an unsigned type is likely more
appropriate for these values that are used as sizes or counts. Of
course, in this case the place where these constants are used is also
of type int (and with a similarly non-useful enum), even though
negative values are inappropriate there too.

I would use static constants of an appropriate type instead of an enum
here, e.g. something like

  static const unsigned TruncatedSeqLength = 10;

But such a simple change may get one into a warnings mess over mixed
signed / unsigned operations that would be far out of scope for the
change under review.

------------------------------------------------------------------------------  
src/share/vm/gc/g1/g1CollectorPolicy.cpp 
 468   if ((2.0 /* magic */ * _predictor.sigma()) * (double) bytes_to_copy > (double) free_bytes) {

1309   return (double) pending_cards * predict_cost_per_card_ms() + predict_scan_hcc_ms();

1325   return (size_t) ((double) rs_length * predict_young_cards_per_entry_ratio());
1329   return (size_t)((double) rs_length * predict_mixed_cards_per_entry_ratio());
... and a bunch more following ...

Pre-existing stylistic issue - feel free to ignore.

I find the explicit casts (esp. to double) to be unnecessary clutter.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1CollectorPolicy.cpp 
1388   if (seq->num() == 0) {
1389     gclog_or_tty->print("BARF! age is %d", age);
1390   }
1391   guarantee(seq->num() > 0, "invariant" );

Pre-existing.

Remove the conditional logging and replace the guarantee's error message
with the previously logged message.  (Though maybe remove “BARF!” :-)

------------------------------------------------------------------------------
src/share/vm/gc/g1/survRateGroup.cpp
src/share/vm/gc/g1/survRateGroup.hpp

The policy object in SurvRateGroup::_g1p is only used to obtain the
predictor.  Perhaps _g1p should be changed to be a reference to the
predictor instead? 

------------------------------------------------------------------------------




More information about the hotspot-gc-dev mailing list