RFR (M): 8137082: Factor out G1 prediction code from G1CollectorPolicy and clean up

Wed Oct 14 18:54:36 UTC 2015

On Oct 14, 2015, at 11:44 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> On Tue, 2015-10-13 at 18:57 -0400, Kim Barrett wrote:
>> On Oct 13, 2015, at 8:26 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>> 
>>> http://cr.openjdk.java.net/~tschatzl/8137082/webrev.2/ (full)
>>> http://cr.openjdk.java.net/~tschatzl/8137082/webrev.0_to_2/ (diff) 
>> ------------------------------------------------------------------------------
>> src/share/vm/gc/g1/g1Predictions.hpp 
>>  42   double stddev_estimate_for_small_sample_sets(TruncatedSeq const* seq) const {
>>  43     int const samples = seq->num();
>>  44     if (samples > 4) {
>>  45       return 0.0;
>>  46     } else {
>>  47       return seq->davg() * (5 - samples) / 2.0;
>>  48     }
>>  49   }
>> 
>>  56   double get_new_prediction(TruncatedSeq const* seq) const {
>>  57     return MAX2(seq->davg() + _sigma * seq->dsd(),
>>  58                 seq->davg() + _sigma * stddev_estimate_for_small_sample_sets(seq));
>>  59   }
>> 
>> I think a negative _sigma would produce unexpected results.  I suspect
>> it's not supposed to ever be negative though.  If so, an assert in the
>> constructor would help.
> 
> Added. Note that the confidence factor passed to the constructor
> (G1ConfidencePercent) is always positive.

Thanks.

>> I think a negative seq->davg() would produce unexpected results when
>> the sample size is small, since the estimated stddev would be
>> negative.  I don't know of any reason why the entries in seq couldn't
>> have negative values, possibly leading to a negative davg.  I'm not
>> sure what should be done in that situation.
> 
> We would get negative predictions, which may confuse the estimations.
> This would be a bug in the sampling then.
> 
> I think it would be more appropriate to do this checking when adding the
> values to the sequences (like adding a TimeSequence or such that only
> allows adding positive samples).

OK.

>> And I had the same question as Sangheon about seq possibly being
>> empty.  I think that's a shouldn't happen, since there would be
>> nothing from which to produce davg either.  But perhaps an assert is
>> called for.
> 
> davg will be 0.0 as per default value of the sequence member, which
> makes the entire term zero.
> 
> I temporarily added the assert, but as I suspected there is code that
> (correctly) expects that the return value for this case is zero. It is
> a non-trivial effort to fix this existing issue, so I created a new CR
> (JDK-8139594).

OK.

>> src/share/vm/gc/g1/g1CollectorPolicy.hpp
>> 214   enum PredictionConstants {
>> 215     TruncatedSeqLength = 10,
>> 216     NumPrevPausesForHeuristics = 10
>> 217   };
>> 
>> Pre-existing stylistic issue - feel free to ignore.
>> 
>> I don't think enums like this serve any useful purpose. The values are
>> not actually related in any interesting semantic way; as evidence,
>> note that the enum tag is never used. Also, the types of the
>> enumerators are limited and determined automatically by the language,
>> and in this case are int, even though an unsigned type is likely more
>> appropriate for these values that are used as sizes or counts. Of
>> course, in this case the place where these constants are used is also
>> of type int (and with a similarly non-useful enum), even though
>> negative values are inappropriate there too.
>> 
>> I would use static constants of an appropriate type instead of an enum
>> here, e.g. something like
>> 
>>  static const unsigned TruncatedSeqLength = 10;
>> 
>> But such a simple change may get one into a warnings mess over mixed
>> signed / unsigned operations that would be far out of scope for the
>> change under review.
> 
> Since this is a common pattern in the existing code, I will keep it as
> is, as a future cleanup.
> 
> I do agree with the rest of your comments.

A little bit of investigation seems to indicate that older Sun compilers had problems
with static data member initializations, and enums are the usual workaround for such. 
So it seems there’s good historical reason for this pattern.

> New webrev at
> 
> http://cr.openjdk.java.net/~tschatzl/8137082/webrev.3/ (full)
> http://cr.openjdk.java.net/~tschatzl/8137082/webrev.2_to_3/ (diff) 

Looks good, other than a couple minor issues.  I don't think I need a
new webrev for any of these.

------------------------------------------------------------------------------  
src/share/vm/gc/g1/g1CollectorPolicy.cpp 
 155   // SurvRateGroups below must be initialized after '_sigma' because they
 156   // indirectly access '_sigma' through this object passed to their constructor.

Pre-existing.  Maybe I'm wrong, but I don't see any sign of this being
true either before or after this set of changes.  Maybe there have
been other changes since that comment was added 8 months ago (by me!)
that made it no longer so?

------------------------------------------------------------------------------  
src/share/vm/gc/g1/g1CollectorPolicy.hpp 

Are there any remaining uses of G1CollectorPolicy::sigma or _sigma?
It looks like these might be dead after your other changes.  And if
sigma() *is* still needed, it probably should be obtained from the
_predictor rather than duplicating information there.

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1Predictions.hpp
  52     assert(sigma >= 0.0, "Confidence must be larger than zero");

"larger than zero" => "at least zero" or "non-negative" or something.

------------------------------------------------------------------------------