RFR (S/M): 8136678: Implement adaptive sizing algorithm for IHOP

Per Liden per.liden at oracle.com
Wed Nov 25 10:19:24 UTC 2015


Hi Thomas,

Sorry for being late to the review party, but I think I might have found 
some issues with these calculations.

On 2015-11-23 11:28, Thomas Schatzl wrote:
[...]
>>>> g1IHOPControl.cpp
>>>>
>>>>     119   _marking_times_s(10, 0.95),
>>>>     120   _allocation_rate_s(10, 0.95),
>>>> Would you mind making these named constants? The 10 is the same as the
>>>> truncated seq length in G1CollectorPolicy but the 0.95 is not the
>>>> standard "alpha" value, would it make sense to describe what 0.95 comes
>>>> from?
>>>
>>> Internal testing showed that it is good to strongly favor most recent
>>> samples. I will make this an experimental flag.
>>
>> I think a comment would suffice instead of a flag.
>>
>> // 0.95 favors more recent samples
>>   >>    119   _marking_times_s(10, 0.95),
>>   >>    120   _allocation_rate_s(10, 0.95),

I think 0.95 should be (1.0 - 0.95), i.e. 0.05, for it to mean what you 
want here. In TruncatedSeq, an alpha of 0.95 gives the latest sample a 
weight of 0.05. The calculation of the decaying average in AbsSeq looks 
like this, where val is the latest sample:

     _davg = (1.0 - _alpha) * val + _alpha * _davg;

Just to be sure I didn't misread the code, I tried this:

   TruncatedSeq seq(3, 0.95);
   seq.add(2.0);
   seq.add(3.0);
   seq.add(4.0);
   seq.dump();

which gives this:

	 _num = 3, _sum =   9.000, _sum_of_squares =  29.000
	 _davg =   2.147, _dvariance =   0.214, _alpha =   0.950
		 _length = 3, _next = 0

		[0]=  2.000	[1]=  3.000	[2]=  4.000

and this:

   TruncatedSeq seq(3, 0.05);
   seq.add(2.0);
   seq.add(3.0);
   seq.add(4.0);
   seq.dump();

gives this:

	 _num = 3, _sum =   9.000, _sum_of_squares =  29.000
	 _davg =   3.947, _dvariance =   0.003, _alpha =   0.050
		 _length = 3, _next = 0

		[0]=  2.000	[1]=  3.000	[2]=  4.000

and that seems to confirm that you're not favoring the latest sample as 
the comment in the code suggests.
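
To make the weighting explicit, here's a minimal stand-alone sketch 
that reproduces the _davg values above. The DecayingAvg struct is just 
something I wrote up for illustration, not the real AbsSeq:

   #include <cstdio>

   // Stand-in for AbsSeq's decaying average (illustration only, not the
   // real HotSpot class): the first sample initializes davg, after that
   // each new sample is weighted by (1.0 - alpha).
   struct DecayingAvg {
     double alpha, davg;
     int num;
     explicit DecayingAvg(double a) : alpha(a), davg(0.0), num(0) {}
     void add(double val) {
       davg = (num++ == 0) ? val : (1.0 - alpha) * val + alpha * davg;
     }
   };

   int main() {
     const double samples[] = { 2.0, 3.0, 4.0 };
     DecayingAvg hi(0.95), lo(0.05);
     for (double v : samples) { hi.add(v); lo.add(v); }
     printf("alpha=0.95: davg = %.4f\n", hi.davg); // 2.1475, old samples dominate
     printf("alpha=0.05: davg = %.4f\n", lo.davg); // 3.9475, newest sample dominates
     return 0;
   }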

> New webrev at
> http://cr.openjdk.java.net/~tschatzl/8136678/webrev.2_to_3/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8136678/webrev.3/ (full)

Another thing that looks odd here is how G1ConfidencePercent plays into 
these calculations.

The predictor is initialized with:

   _predictor(G1ConfidencePercent / 100.0),

Let's assume G1ConfidencePercent=99, which (to me at least) means that 
our predictions should be correct in 99% of the cases. This means 
_predictor->_sigma becomes 0.99 and a prediction is calculated as:

   double get_new_prediction(TruncatedSeq const* seq) const {
     return seq->davg() + _sigma * stddev_estimate(seq);
   }

However, since get_new_prediction() computes a one-sided upper bound, a 
99% confidence level requires multiplying the standard deviation by 
~2.33 (~2.58 for a two-sided interval). Multiplying by 0.99 only gives 
you a one-sided confidence level of ~84%, assuming normally distributed 
samples.
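
To put numbers on that (again assuming normally distributed samples), 
here's a quick sketch that computes the one-sided confidence implied by 
a given multiplier via the standard normal CDF:

   #include <cmath>
   #include <cstdio>

   // Phi(z) = P(X <= mean + z * stddev) for a normal distribution, i.e.
   // the one-sided confidence of the upper bound that
   // get_new_prediction() produces when _sigma == z.
   static double phi(double z) {
     return 0.5 * (1.0 + std::erf(z / std::sqrt(2.0)));
   }

   int main() {
     printf("z = 0.99  -> %.1f%% one-sided\n", 100.0 * phi(0.99));  // ~83.9%
     printf("z = 2.326 -> %.1f%% one-sided\n", 100.0 * phi(2.326)); // ~99.0%
     printf("z = 2.576 -> %.1f%% one-sided\n", 100.0 * phi(2.576)); // ~99.5% (= 99% two-sided)
     return 0;
   }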

(Btw, I think _sigma in G1Predictor is misnamed, since that name 
usually denotes the standard deviation, which is what stddev_estimate() 
returns. The _sigma here should probably be called _confidence_factor 
or something similar.)

cheers,
/Per


