RFR: Do not reset learning cycles after resizing

Wed Jan 18 00:06:29 UTC 2023

On Tue, 17 Jan 2023 23:27:24 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:

>> Also, require more than 10 GC cycles before the trigger can resize generations. The behavior on the Extremem 'phased' workload looks much better:
>> 
>> [1334.381s][info][gc,stats       ]    66 Successful Concurrent GCs
>> [1334.381s][info][gc,stats       ]       0 invoked explicitly
>> [1334.381s][info][gc,stats       ]       0 invoked implicitly
>> [1334.381s][info][gc,stats       ] 
>> [1334.381s][info][gc,stats       ]     7 Completed Old GCs
>> [1334.381s][info][gc,stats       ]       0 mixed
>> [1334.381s][info][gc,stats       ]       0 interruptions
>> [1334.381s][info][gc,stats       ] 
>> [1334.381s][info][gc,stats       ]     1 Degenerated GCs
>> [1334.381s][info][gc,stats       ]       1 caused by allocation failure
>> [1334.381s][info][gc,stats       ]         1 happened at Mark
>> [1334.381s][info][gc,stats       ]       1 upgraded to Full GC
>> [1334.381s][info][gc,stats       ] 
>> [1334.381s][info][gc,stats       ]     0 Abbreviated GCs
>> [1334.381s][info][gc,stats       ] 
>> [1334.381s][info][gc,stats       ]     1 Full GCs
>> [1334.381s][info][gc,stats       ]       0 invoked explicitly
>> [1334.381s][info][gc,stats       ]       0 invoked implicitly
>> [1334.381s][info][gc,stats       ]       0 caused by allocation failure
>> [1334.381s][info][gc,stats       ]       1 upgraded from Degenerated GC
>> 
>> The full cycle here was the first cycle after the last of the initial learning cycles.
>
> LGTM, reviewed. But I'm curious whether you noticed any difference wrt, e.g., specjbb.
> 
> Some more thoughts:
> I wonder whether the number of cycles to wait should be proportional to the long-run ratio of minor to major collection cycles.
> 
> I agree, though, that waiting about 10 cycles between resizing decisions would have the salubrious effect of smoothing out any temporary spikes. Somewhat relatedly, and something I paid only vague attention to before: what is the default decay factor for the MMU decaying average, and what constitutes an MMU sample? Is it the occurrence of a GC (minor or major), or just a synchronous 5-second sample of both? In the latter case the average might decay very quickly to 100%, losing almost all the information in the signal after 6-7 samples (i.e. 30-35 seconds here), unless GCs were happening at a fast clip.

@ysramakrishna, I didn't notice the trouble with these learning cycles originally because, on specjbb, the heuristic quickly maxes out the young generation size and keeps it there. The decay factor is set by `ShenandoahAdaptiveDecayFactor` (default 0.5), and the MMU is updated every `GCPauseIntervalMillis` (default 5 seconds).
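To put numbers on the concern about losing the signal, here is a small arithmetic sketch. It assumes the MMU average behaves like a plain exponential decaying average, `avg = decay * avg + (1 - decay) * sample`, with `decay` standing in for `ShenandoahAdaptiveDecayFactor` and one sample per `GCPauseIntervalMillis`. The helper names are hypothetical and this is not the HotSpot implementation. With a factor of 0.5, a sample's weight halves with every newer sample, so after 6 newer samples it carries 0.5^7 (about 0.8%) of the average, i.e. it is essentially gone after roughly 30 seconds at a 5-second interval.

```python
# Hypothetical sketch of an exponential decaying average, NOT HotSpot code.
# `decay` plays the role of ShenandoahAdaptiveDecayFactor (default 0.5);
# one sample is assumed per GCPauseIntervalMillis (default 5 seconds).
# With decay = 0.5, history and the new sample get equal weight, so the
# two common conventions for writing the update coincide.


def update(avg: float, sample: float, decay: float = 0.5) -> float:
    """Blend one new sample into the running average."""
    return decay * avg + (1.0 - decay) * sample


def residual_weight(samples_since: int, decay: float = 0.5) -> float:
    """Weight a sample still carries after `samples_since` newer samples."""
    return (1.0 - decay) * decay ** samples_since


def seconds_until_negligible(threshold: float, interval_s: float = 5.0,
                             decay: float = 0.5) -> float:
    """Time until a sample's weight falls below `threshold`, assuming
    one sample every `interval_s` seconds."""
    k = 0
    while residual_weight(k, decay) >= threshold:
        k += 1
    return k * interval_s
```

For example, if the MMU dropped to 50% during a burst of GC activity and the next six 5-second samples all read 100%, `update` would bring the average back to 99.22%, and `seconds_until_negligible(0.01)` returns 30.0, consistent with the 30-35 second estimate above.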

@kdnilsen, I will look for workloads that defeat the heuristic. It's much better behaved now on the Extremem 'phased' workload. Perhaps heapothesys with a high occupancy rate?

-------------

PR: https://git.openjdk.org/shenandoah/pull/203