How to assess impact of specializations on warmup

Chris Seaton chris.seaton at oracle.com
Fri Sep 26 10:03:51 UTC 2014


On 26 Sep 2014, at 09:11, Stefan Marr <java at stefan-marr.de> wrote:

> Hi:
> 
> I was wondering whether you guys have a way of assessing the impact of optimizations and/or specializations on warmup.
> 
> I was thinking of a simple approach based on the observed peak performance.
> So, assuming that each benchmark iteration does a constant amount of work (minus optimizations), perhaps taking the index of the first iteration that is within an error margin of the peak performance would be a good proxy for the time it takes to warm up.
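
If I follow you, that is roughly this (just a sketch of how I read your proposal; the class, method name and error-margin parameter are made up for illustration):

    // Sketch of the proposed peak-proximity check: take the fastest iteration
    // as the peak and report the index of the first iteration that comes
    // within an errorMargin (e.g. 0.05 for 5%) of it.
    final class PeakProximity {
        static int firstIterationNearPeak(double[] iterationTimes, double errorMargin) {
            double peak = Double.POSITIVE_INFINITY;
            for (double t : iterationTimes) {
                peak = Math.min(peak, t);
            }
            for (int i = 0; i < iterationTimes.length; i++) {
                if (iterationTimes[i] <= peak * (1 + errorMargin)) {
                    return i; // proxy for the number of iterations needed to warm up
                }
            }
            return -1; // unreachable if the array is non-empty
        }
    }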

My warmup strategy at the moment is to look in a sliding window for the first n iterations where the relative range is less than some constant k. These first n iterations are where I start sampling, so if the relative range is low immediately we could have ‘no warmup’. This seems to work well, in that it does seem to capture all the compilation (both in Graal and in LLVM, etc.).
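
In case a sketch is clearer than prose, the check is roughly this (the names, window size and threshold are illustrative, not the values I actually use, and I am taking relative range as (max - min) / min over the window):

    // Sketch of the sliding-window check described above.
    final class WarmupDetector {
        // Returns the index of the first window of n iterations whose relative
        // range is below k, or -1 if the run never stabilises.
        static int firstStableWindow(double[] iterationTimes, int n, double k) {
            for (int start = 0; start + n <= iterationTimes.length; start++) {
                double min = Double.POSITIVE_INFINITY;
                double max = Double.NEGATIVE_INFINITY;
                for (int i = start; i < start + n; i++) {
                    min = Math.min(min, iterationTimes[i]);
                    max = Math.max(max, iterationTimes[i]);
                }
                if ((max - min) / min < k) {
                    return start; // sampling starts at this iteration
                }
            }
            return -1; // the run never stabilised
        }
    }

If this returns 0 that is the ‘no warmup’ case I mentioned.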

> However, I also see on a number of benchmarks that there are later compilations happening, that do not necessarily contribute to the peak performance.

I used to worry about this as well, but really Graal is a dynamic compiler and it can run whenever it wants to. I no longer look at compilation activity to check for warmup; I just look at iteration times. As long as I’m not running a parallelism benchmark and my warmup test passes, I don’t worry about extra compilation happening in the background on another core.

> Some benchmarks even have one fast run very early after some compilation. Then however further compilation happens that degrades performance again and it takes a while to reach the peak performance again.
> So, well, I am not sure whether just taking the first iteration which is close to peak is a good idea.
> On the other hand, saying peak is reached when it was observed for 20 or so iterations seems also to be a little arbitrary.

I have also seen this behaviour with Ruby. I don’t have a good explanation for it at the moment, beyond that as your program warms up the set of compiled and interpreted methods changes, and the interactions between them, as they call each other and use each other’s frames, come with different runtime costs.

> Is there perhaps literature on the topic? So far I have mostly seen people reporting startup time by measuring a first iteration. However, that isn’t really what I am interested in, since I want to know what the impact of certain specializations is on the warmup behavior.

You’ve probably heard me say this before, but as far as I know the only sensible methodical approach to benchmarking is Kalibera and Jones, http://kar.kent.ac.uk/33611/. I won’t pretend to understand all their statistics work, but they recommend that to check warmup you really need to manually look for patterns in lag and autocorrelation plots. I use my sliding-window warmup technique on benchmark servers, but if I’m publishing a paper I draw these plots and check them manually for the runs I’m using in the publication.
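
For what it’s worth, the numbers behind an autocorrelation plot are easy to produce yourself; something along these lines (just a sketch of the standard sample autocorrelation, not Kalibera and Jones’s own tooling):

    // Sample autocorrelation of iteration times at lags 0..maxLag; plot the
    // result and look for structure by eye, as Kalibera and Jones suggest.
    final class Autocorrelation {
        static double[] autocorrelation(double[] times, int maxLag) {
            int n = times.length;
            double mean = 0;
            for (double t : times) mean += t;
            mean /= n;
            double variance = 0;
            for (double t : times) variance += (t - mean) * (t - mean);
            double[] acf = new double[maxLag + 1];
            for (int lag = 0; lag <= maxLag; lag++) {
                double sum = 0;
                for (int i = 0; i + lag < n; i++) {
                    sum += (times[i] - mean) * (times[i + lag] - mean);
                }
                acf[lag] = sum / variance; // acf[0] is always 1
            }
            return acf;
        }
    }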

So basically - if you’re looking for a good simple solution, I don’t think there is one!

> Thanks
> Stefan
> 
> 
> 
> -- 
> Stefan Marr
> INRIA Lille - Nord Europe
> http://stefan-marr.de/research/
> 
> 
> 


