Warmup Iterations

Aleksey Shipilev aleksey.shipilev at oracle.com
Thu Jan 9 06:52:27 PST 2014


Hi Richard,

On 01/03/2014 12:47 AM, Richard Warburton wrote:
> From brief experimentation, JMH uses a fixed number of warmup iterations,
> which can be specified as a command-line parameter.  What would be
> preferable from a user's perspective is continuing to run warmup
> iterations until the times stabilise, or at least an option to do so.
> IMHO this is a sensible default for running warmup iterations.
> 
> Similarly for iterations, it would be nice if it were possible to run
> enough iterations for some confidence interval of the claimed difference
> in performance between the results.  Or perhaps just dump out the
> P-value or Bayes factor implied by the number of samples collected?
> 
> Perhaps there is already something along these lines and I've just missed
> it, but I didn't see it in the documentation.

It is a deliberate choice to avoid automatic guidance on
warmup/measurement durations, because you generally cannot tell whether
you have hit a steady state or not. That "stabilization" metric is very
hard, if not impossible, to define reliably.
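
For reference, the run plan is instead pinned down explicitly by the
user; a minimal sketch with current JMH annotations (the class name and
workload below are placeholders):

  import java.util.concurrent.TimeUnit;

  import org.openjdk.jmh.annotations.Benchmark;
  import org.openjdk.jmh.annotations.Fork;
  import org.openjdk.jmh.annotations.Measurement;
  import org.openjdk.jmh.annotations.Warmup;

  public class ExplicitRunBench {

      // Fixed, explicit run plan: 5 warmup + 5 measurement iterations,
      // 1 second each, in a single forked JVM. Nothing adaptive here:
      // the user owns the decision of how long is "long enough".
      @Benchmark
      @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
      @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
      @Fork(1)
      public int work() {
          // Placeholder workload; returning the result keeps the JIT
          // from eliminating it outright.
          int acc = 0;
          for (int i = 0; i < 1_000; i++) {
              acc += i;
          }
          return acc;
      }
  }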

And yes, that also includes the statistical measures. Cutting off the
warmup/measurement once a given "confidence" is reached only works if:
 a) the iteration scores are normally distributed -- usually true only
    for very small throughput benchmarks, thanks to the CLT;
 b) the iteration scores are unimodal -- surprisingly, not true for
    many benchmarks;
 c) there are no outliers.
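
To make the failure mode concrete, here is a sketch of the kind of
cutoff rule being asked for (not JMH code; the class name and the 2%
threshold are made up). It happily "converges" on a bimodal score
stream that is clearly not a single steady state:

  import java.util.ArrayList;
  import java.util.List;

  public class NaiveStopRule {

      // Stop once the 99% confidence half-width drops below 2% of the
      // mean. The z-score is only meaningful if iteration scores are
      // near-normal, unimodal, and outlier-free -- exactly the
      // assumptions that often fail in practice.
      static boolean shouldStop(List<Double> scores) {
          int n = scores.size();
          if (n < 5) return false;
          double mean = scores.stream()
                  .mapToDouble(Double::doubleValue).average().orElse(0);
          double var = scores.stream()
                  .mapToDouble(s -> (s - mean) * (s - mean)).sum() / (n - 1);
          double halfWidth = 2.576 * Math.sqrt(var / n); // 99% CI, normality assumed
          return halfWidth / mean < 0.02;
      }

      public static void main(String[] args) {
          // 200 scores interleaving two modes (say, 100 and 120 ops/ms,
          // e.g. two compilation states): the rule fires even though the
          // benchmark never settled into a single mode.
          List<Double> scores = new ArrayList<>();
          for (int i = 0; i < 200; i++) {
              scores.add(i % 2 == 0 ? 100.0 : 120.0);
          }
          System.out.println("stop? " + shouldStop(scores)); // prints "stop? true"
      }
  }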

If any of these fail, measuring for a static amount of time is the
lesser evil. So, instead of reinforcing users' wishful thinking that
their benchmark is ideal, we force users to run same-length iterations
for all experiments, with the option to make benchmarks quicker *after*
they have gained confidence that their workloads behave properly in
short runs, i.e. that no transients hijack the short runs.
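
Dialing the run down once that confidence is earned stays an explicit,
per-run decision; e.g. via the Runner API (a sketch; the include
pattern is hypothetical, and the same knobs exist as -wi/-i/-f on the
command line):

  import org.openjdk.jmh.runner.Runner;
  import org.openjdk.jmh.runner.RunnerException;
  import org.openjdk.jmh.runner.options.Options;
  import org.openjdk.jmh.runner.options.OptionsBuilder;

  public class QuickRun {
      public static void main(String[] args) throws RunnerException {
          // Deliberately short run: acceptable only after longer runs
          // have shown there are no warmup transients at this scale.
          Options opts = new OptionsBuilder()
                  .include("ExplicitRunBench")   // hypothetical pattern
                  .warmupIterations(2)
                  .measurementIterations(3)
                  .forks(1)
                  .build();
          new Runner(opts).run();
      }
  }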

-Aleksey.

