RFR (M): 8137082: Factor out G1 prediction code from G1CollectorPolicy and clean up

Wed Oct 14 15:44:33 UTC 2015

Hi Kim,

  thanks for the review.

On Tue, 2015-10-13 at 18:57 -0400, Kim Barrett wrote:
> On Oct 13, 2015, at 8:26 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> > 
> > http://cr.openjdk.java.net/~tschatzl/8137082/webrev.2/ (full)
> > http://cr.openjdk.java.net/~tschatzl/8137082/webrev.0_to_2/ (diff) 
> 
> ------------------------------------------------------------------------------
> 
> I'd prefer that significant numbers of formatting changes were done
> separately from other kinds of changes. For example, I'd rather have
> the signature reformattings in src/share/vm/gc/g1/survRateGroup.cpp
> done as a separate change, to make that easier to review and to avoid
> cluttering functional / refactoring changes with unrelated stuff.

They did not feel like a significant overhead to me, but okay, I will
create a new CR/webrev (JDK-8139583) for them.

> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/g1Predictions.hpp 
>   42   double stddev_estimate_for_small_sample_sets(TruncatedSeq const* seq) const {
>   43     int const samples = seq->num();
>   44     if (samples > 4) {
>   45       return 0.0;
>   46     } else {
>   47       return seq->davg() * (5 - samples) / 2.0;
>   48     }
>   49   }
> 
>   56   double get_new_prediction(TruncatedSeq const* seq) const {
>   57     return MAX2(seq->davg() + _sigma * seq->dsd(),
>   58                 seq->davg() + _sigma * stddev_estimate_for_small_sample_sets(seq));
>   59   }
> 
> I think a negative _sigma would produce unexpected results.  I suspect
> it's not supposed to ever be negative though.  If so, an assert in the
> constructor would help.

Added. Note that the confidence factor passed to the constructor
(G1ConfidencePercent) is always positive.

> I think a negative seq->davg() would produce unexpected results when
> the sample size is small, since the estimated stddev would be
> negative.  I don't know of any reason why the entries in seq couldn't
> have negative values, possibly leading to a negative davg.  I'm not
> sure what should be done in that situation.

We would get negative predictions, which may confuse the estimations.
This would be a bug in the sampling then.

I think it would be more appropriate to do this checking when adding the
values to the sequences (like adding a TimeSequence or such that only
allows adding positive samples).

> And I had the same question as Sangheon about seq possibly being
> empty.  I think that's a shouldn't happen, since there would be
> nothing from which to produce davg either.  But perhaps an assert is
> called for.

davg will be 0.0 as per default value of the sequence member, which
makes the entire term zero.

I temporarily added the assert, but as I suspected there is code that
(correctly) expects that the return value for this case is zero. It is
a non-trivial effort to fix this existing issue, so I created a new CR
(JDK-8139594).

> Ignoring those issues for the moment, I think it would be clearer as
> something like
> 
>   double stddev_estimate(TruncatedSeq const* seq) const {
>     double estimate = seq->dsd();
>     int const samples = seq->num();
>     // Estimate for small sample sets.
>     if (samples < 5) {
>       estimate = MAX2(estimate, seq->davg() * (5 - samples) / 2.0);
>     }
>     return estimate;
>   }
> 
>   double get_new_predication(TruncatedSeq const* seq) const {
>     return seq->davg() + _sigma * stddev_estimate(seq);
>   }

Done.

> 
> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/g1Predictions.hpp 
> src/share/vm/gc/g1/g1Predictions.cpp 
>  61   static void test();
> 
> Declaration and definition should be NOT_PRODUCT-only.
> 

Done

> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/g1CollectorPolicy.hpp
>  168   double get_new_prediction(TruncatedSeq* const seq) const;
> 
> Parameter should be "TruncatedSeq const*".

Done

> 
> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/g1CollectorPolicy.hpp
>  214   enum PredictionConstants {
>  215     TruncatedSeqLength = 10,
>  216     NumPrevPausesForHeuristics = 10
>  217   };
> 
> Pre-existing stylistic issue - feel free to ignore.
> 
> I don't think enums like this serve any useful purpose. The values are
> not actually related in any interesting semantic way; as evidence,
> note that the enum tag is never used. Also, the types of the
> enumerators are limited and determined automatically by the language,
> and in this case are int, even though an unsigned type is likely more
> appropriate for these values that are used as sizes or counts. Of
> course, in this case the place where these constants are used is also
> of type int (and with a similarly non-useful enum), even though
> negative values are inappropriate there too.
> 
> I would use static constants of an appropriate type instead of an enum
> here, e.g. something like
> 
>   static const unsigned TruncatedSeqLength = 10;
> 
> But such a simple change may get one into a warnings mess over mixed
> signed / unsigned operations that would be far out of scope for the
> change under review.

Since this is a common pattern in the existing code, I will keep it as
is, as a future cleanup.

I do agree with the rest of your comments.

> 
> ------------------------------------------------------------------------------  
> src/share/vm/gc/g1/g1CollectorPolicy.cpp 
>  468   if ((2.0 /* magic */ * _predictor.sigma()) * (double) bytes_to_copy > (double) free_bytes) {
> 
> 1309   return (double) pending_cards * predict_cost_per_card_ms() + predict_scan_hcc_ms();
> 
> 1325   return (size_t) ((double) rs_length * predict_young_cards_per_entry_ratio());
> 1329   return (size_t)((double) rs_length * predict_mixed_cards_per_entry_ratio());
> ... and a bunch more following ...
> 
> Pre-existing stylistic issue - feel free to ignore.
> 
> I find the explicit casts (esp. to double) to be unnecessary clutter.

Done.

> 
> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/g1CollectorPolicy.cpp 
> 1388   if (seq->num() == 0) {
> 1389     gclog_or_tty->print("BARF! age is %d", age);
> 1390   }
> 1391   guarantee(seq->num() > 0, "invariant" );
> 
> Pre-existing.
> 
> Remove the conditional logging and replace the guarantee's error message
> with the previously logged message.  (Though maybe remove “BARF!” :-)

Done.

> ------------------------------------------------------------------------------
> src/share/vm/gc/g1/survRateGroup.cpp
> src/share/vm/gc/g1/survRateGroup.hpp
> 
> The policy object in SurvRateGroup::_g1p is only used to obtain the
> predictor.  Perhaps _g1p should be changed to be a reference to the
> predictor instead? 
> 
> ------------------------------------------------------------------------------

Done.

New webrev at

http://cr.openjdk.java.net/~tschatzl/8137082/webrev.3/ (full)
http://cr.openjdk.java.net/~tschatzl/8137082/webrev.2_to_3/ (diff) 

Thanks,
  Thomas