FR: summary stats to detect multi-modal behaviour of benchmarks
Aleksey Shipilev
aleksey.shipilev at oracle.com
Mon Sep 8 08:37:49 UTC 2014
On 09/08/2014 11:53 AM, Nitsan Wakart wrote:
> Hi, In my work I come across benchmarks that have 'modes' of
> performance. For instance a benchmark exercising code which suffers
> from false sharing usually has a probability of that false sharing
> manifesting. The run-to-run variance is very indicative of these sorts
> of issues when they happen, but the summary of all the runs put
> together hides this behaviour from me. I've hit similar issues around
> unstable compilation results and timing-sensitive benchmarks.
So do we, but are there reliable statistical tests that can detect
multi-modality, even if the baseline distribution is known? One can also
go the other way around and test for normality...
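For illustration only, here is one cheap heuristic over the raw
per-iteration scores: Sarle's bimodality coefficient. Values well above
the uniform-distribution reference of 5/9 (~0.555) hint that the sample
may have more than one mode. This is a sketch, not a formal test (unlike,
say, Hartigan's dip test), and the class name and toy data are made up:

public class BimodalityCheck {

    /**
     * Sarle's bimodality coefficient:
     *   BC = (skew^2 + 1) / (excessKurtosis + 3(n-1)^2 / ((n-2)(n-3)))
     * computed from plain moment-based skewness and excess kurtosis.
     * Requires n >= 4 and a non-degenerate sample.
     */
    public static double bimodalityCoefficient(double[] xs) {
        int n = xs.length;
        if (n < 4) throw new IllegalArgumentException("need at least 4 samples");

        double mean = 0;
        for (double x : xs) mean += x;
        mean /= n;

        double m2 = 0, m3 = 0, m4 = 0;
        for (double x : xs) {
            double d = x - mean;
            m2 += d * d;
            m3 += d * d * d;
            m4 += d * d * d * d;
        }
        m2 /= n; m3 /= n; m4 /= n;
        if (m2 == 0) throw new IllegalArgumentException("sample has no variance");

        double skew = m3 / Math.pow(m2, 1.5);      // moment-based skewness
        double exKurt = m4 / (m2 * m2) - 3.0;      // moment-based excess kurtosis

        double correction = 3.0 * (n - 1) * (n - 1) / ((double) (n - 2) * (n - 3));
        return (skew * skew + 1) / (exKurt + correction);
    }

    public static void main(String[] args) {
        // Toy data: two tight clusters around 10 and 20 -- clearly bimodal.
        double[] sample = new double[20];
        for (int i = 0; i < 10; i++) {
            sample[i]      = 10 + 0.05 * i;   // first "mode"
            sample[10 + i] = 20 + 0.05 * i;   // second "mode"
        }
        // Expect roughly 0.65 here, above the 5/9 reference.
        System.out.println("BC = " + bimodalityCoefficient(sample));
    }
}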
> Currently I parse the output by hand/script to detect these anomalies
> and look out for any large error indicators in the summary, but I was
> wondering if we can make the summary statistics pluggable to better
> detect these behaviours, or if perhaps other people have better
> solutions to this problem.
Well, pluggable how? The Java API already provides access to the raw
statistical data. You can create your own runner on top of the Java API
and do whatever result post-processing you want.
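As a concrete (if hypothetical) sketch of what such a runner could look
like: the benchmark selector "MyBenchmark", the fork count, and the
BimodalityCheck helper from earlier in this thread are assumptions, and
the exact accessor names may differ between JMH versions.

import org.openjdk.jmh.results.BenchmarkResult;
import org.openjdk.jmh.results.IterationResult;
import org.openjdk.jmh.results.RunResult;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class ModeCheckRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include("MyBenchmark")   // hypothetical benchmark selector
                .forks(20)                // many forks: each fork may land in a different "mode"
                .measurementIterations(5)
                .build();

        Collection<RunResult> results = new Runner(opts).run();

        for (RunResult rr : results) {
            // Collect the raw per-iteration scores instead of trusting only the aggregate.
            List<Double> scores = new ArrayList<Double>();
            for (BenchmarkResult br : rr.getBenchmarkResults()) {
                for (IterationResult ir : br.getIterationResults()) {
                    scores.add(ir.getPrimaryResult().getScore());
                }
            }

            double[] raw = new double[scores.size()];
            for (int i = 0; i < raw.length; i++) {
                raw[i] = scores.get(i);
            }

            System.out.printf("%s: n=%d, bimodality coefficient = %.3f%n",
                    rr.getParams().getBenchmark(), raw.length,
                    BimodalityCheck.bimodalityCoefficient(raw));
        }
    }
}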
I would like to see those statistical analysis approaches implemented
using the Java API first; then we can figure out whether something
sensible can be added to mainline JMH.
Thanks,
-Aleksey.