Mean error bigger than mean?

vyazelenko at yahoo.com
Fri Apr 11 16:28:07 UTC 2014


Hi Kirk and Aleksey,

Thanks for the quick reply.

I would like to provide some background on the original benchmark. I wanted to compare different libraries that implement a Map with a primitive long key and an Object value (there are multiple such libraries: Trove, fastutil, etc.). As a baseline I decided to use HashMap&lt;Long, Object&gt;, to see how those libraries compare with code built on plain JDK classes.
Naturally I was running the benchmark on both JDK 7 and JDK 8, and on JDK 8 the baseline measurement for java.util.HashMap&lt;Long, Object&gt; exploded. At first I suspected the HashMap changes in JDK 8, but then I managed to reproduce the issue with the simplified benchmark I posted in my first email to this list (it simply forces autoboxing on an array of longs). That benchmark consistently shows the following:
- without the additional state (i.e. without populating the "hugeMap" variable), both JDKs show similar results
- with the extra state, which is not used in the measurement, I get the same results on JDK 7 and an "explosion" on JDK 8
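The allocation pressure from autoboxing is worth keeping in mind here: Long.valueOf is only required to cache values in the range -128..127, so boxing an array of arbitrary longs allocates one fresh Long per element. A minimal standalone sketch (not the original benchmark, names are mine) of that behavior:

```java
public class AutoboxingSketch {
    public static void main(String[] args) {
        // Autoboxing goes through Long.valueOf, which caches -128..127,
        // so small values reuse the same instance...
        Long a = 100L, b = 100L;
        System.out.println(a == b);   // true: cached instance

        // ...but values outside the cache range allocate a new Long each time
        // (with default JVM settings).
        Long c = 1000L, d = 1000L;
        System.out.println(c == d);   // false: two distinct objects

        // Boxing a large long[] therefore creates one object per element --
        // exactly the kind of allocation churn that can make a benchmark
        // GC bound.
        long[] keys = new long[1_000_000];
        Object[] boxed = new Object[keys.length];
        for (int i = 0; i < keys.length; i++) {
            boxed[i] = keys[i] + 128;  // >= 128: no cache hit, fresh allocation
        }
        System.out.println(boxed.length);
    }
}
```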

As Aleksey suggested, I'll re-run with an explicit heap size (e.g. 4g) and with JMH's -gc true flag. Unfortunately I won't be able to do that until Monday evening. However, I don't think it will eliminate the issue, since I already tried this with the original (full) benchmark. If I'm right, the next step would be to grab GC logs from both runs and compare them.
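For reference, such a re-run could look roughly like this (the jar name is an assumption; -gc is JMH's force-GC-between-iterations flag, and the -Xloggc options produce the GC logs to compare):

```shell
# Run the JMH benchmark jar with a fixed 4g heap, forced GC between
# iterations, and GC logging enabled (JDK 7/8 HotSpot flags).
java -jar target/benchmarks.jar -gc true \
     -jvmArgs "-Xms4g -Xmx4g -XX:+PrintGCDetails -Xloggc:gc.log"
```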

Regards,
Dmitry

Sent from my iPhone

> On Apr 11, 2014, at 12:02, Kirk Pepperdine <kirk at kodewerk.com> wrote:
> 
> 
>> On Apr 10, 2014, at 8:40 PM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
>> 
>>> On 04/10/2014 10:00 PM, Dmitry Vyazelenko wrote:
>>> I came across interesting issue with one benchmark while testing it
>>> on JDK8. Basically I observe measurement error bigger than measured
>>> value. And I’m not sure if it is somehow JMH issue or JDK8 regression
>>> or something else. I hope you can give me a hint on what is going
>>> on.
>> 
>> Well, the mere fact the error is larger than the mean itself does not
>> strike me as the correctness issue.
> 
> I fear this is a correctness issue. Large variation in results means either that the threads executing the algorithm/workload you’re measuring were being interrupted, or that the unit of work wasn’t constant, which most likely means your result set is a mix of different result sets. Either way, with large error or variation in the measurements you can’t use them until you sort out the source of the variance and see if you can smooth things out.
> 
>> I think your benchmark is GC bound,
> 
> I would concur… as this can be a huge source of variance. Now the question is, should this be a part of your measurement.. or not...
> 
> Regards,
> Kirk
> 

