IMPT: Better dead-code elimination avoidance

Fri May 10 10:52:39 PDT 2013

Hi,

TL;DR: if your benchmarks measure effects of ~100 ns or less, you
need to update JMH and re-validate your experiments.

During ongoing performance work, we noticed an oddity in JMH results
for very fine-grained nano-benchmarks: scores could swing 2-3x after
innocuous changes. We traced it to false sharing on our Blackhole
class, which is responsible for defeating dead-code elimination.
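For readers unfamiliar with the trick, here is a minimal, self-contained sketch of how a "blackhole" sink can keep the JIT from eliminating a computation. It is NOT the JMH implementation (the implementation notes linked in this mail describe the real one); the class SketchBlackhole, its fields, the hypothetical work() function, and the "compare against two volatile fields that can never both match" approach are illustrative assumptions. Because the volatile fields must be re-read on every call, the JIT cannot prove the comparison is always false, so it has to keep computing the consumed value.

```java
// Hypothetical sketch of a DCE-defeating sink; NOT the actual JMH BlackHole code.
public class SketchBlackhole {
    // Two volatile fields with different values: no int can ever equal both,
    // but the JIT must re-read them on every call, so it cannot prove that.
    public volatile int i1 = 1;
    public volatile int i2 = 2;

    // Stays null; it is only dereferenced on the (impossible) matching path.
    public SketchBlackhole nullBait = null;

    public void consume(int value) {
        // Non-short-circuit '&' evaluates both comparisons; the result is always false.
        if (value == i1 & value == i2) {
            nullBait.i1 = value; // never executed in practice
        }
    }

    // Hypothetical workload whose result would be dead without the sink.
    static int work(int x) {
        return x * x + 31 * x;
    }

    public static void main(String[] args) {
        SketchBlackhole bh = new SketchBlackhole();
        for (int i = 0; i < 1_000_000; i++) {
            bh.consume(work(i)); // the JIT has to keep computing work(i)
        }
        System.out.println("consumed 1000000 values");
    }
}
```

Note that the sink itself reads and writes shared memory, which is why its placement relative to other hot data matters for the measurements.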

Even though Blackhole is padded, it can still share a cache line with
other actively updated objects. The padding inside Blackhole only
guarantees that it does not collide with another Blackhole.
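To illustrate why padding only protects what it sits between, here is a small stand-alone demo (all class and field names are hypothetical, not from JMH). Two threads each increment their own volatile field. In Unpadded the two fields very likely share a cache line, so that line bounces between cores; in Padded a run of unused longs sits between them, relying on the assumption that HotSpot lays out superclass fields before subclass fields. This kind of padding only separates fields inside one object: it cannot stop the allocator from placing some other hot object right next to it. Timings depend heavily on the machine, so no expected numbers are given.

```java
// Illustrative false-sharing demo; all class and field names are hypothetical.
public class FalseSharingDemo {

    // Two hot fields declared side by side: very likely on one cache line.
    static class Unpadded {
        volatile long a;
        volatile long b;
    }

    // HotSpot places superclass fields before subclass fields, so the eight
    // unused longs in PadMiddle (64 bytes) should sit between 'a' and 'b'.
    static class PadA { volatile long a; }
    static class PadMiddle extends PadA { long p1, p2, p3, p4, p5, p6, p7, p8; }
    static class Padded extends PadMiddle { volatile long b; }

    // Runs incA and incB 'iters' times each on two threads; returns elapsed nanos.
    static long run(Runnable incA, Runnable incB, int iters) {
        Thread t1 = new Thread(() -> { for (int i = 0; i < iters; i++) incA.run(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iters; i++) incB.run(); });
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iters = 50_000_000;
        Unpadded u = new Unpadded();
        Padded p = new Padded();
        // Each field is written by exactly one thread, so the non-atomic
        // volatile increment is safe here.
        long tu = run(() -> u.a++, () -> u.b++, iters);
        long tp = run(() -> p.a++, () -> p.b++, iters);
        System.out.printf("unpadded: %d ms, padded: %d ms%n",
                tu / 1_000_000, tp / 1_000_000);
    }
}
```

On multi-core hardware the padded run is typically faster, but the gap varies with CPU, JVM version, and whatever else shares the cache.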

With that in mind, I have pushed a reworked version of Blackhole that
is immune to these issues. The implementation notes can be found here:
http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/logic/BlackHole.java

As a bonus, Blackhole now has consistent performance when consuming
any data type:

> Benchmark                                   Thr    Cnt  Sec         Mean   Mean error          Var    Units
> t.g.a.BlackholeBench.baseline                16     10    1        0.373        0.009        0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testArray      16     10    1        1.343        0.008        0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testBoolean    16     10    1        1.318        0.020        0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testByte       16     10    1        1.307        0.014        0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testChar       16     10    1        1.308        0.005        0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testDouble     16     10    1        1.487        0.015        0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testFloat      16     10    1        1.590        0.030        0.001  nsec/op
> t.g.a.BlackholeBench.explicit_testInt        16     10    1        1.354        0.022        0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testLong       16     10    1        1.346        0.023        0.001  nsec/op
> t.g.a.BlackholeBench.explicit_testObject     16     10    1        1.348        0.011        0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testArray      16     10    1        1.457        0.020        0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testBoolean    16     10    1        1.313        0.027        0.001  nsec/op
> t.g.a.BlackholeBench.implicit_testByte       16     10    1        1.305        0.007        0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testChar       16     10    1        1.305        0.003        0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testDouble     16     10    1        1.489        0.011        0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testFloat      16     10    1        1.573        0.005        0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testInt        16     10    1        1.314        0.008        0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testLong       16     10    1        1.336        0.028        0.001  nsec/op
> t.g.a.BlackholeBench.implicit_testObject     16     10    1        1.346        0.015        0.000  nsec/op

Please update and re-validate your experiments! This is especially
important for nano-benchmarks measuring effects of up to 100 ns, and
for virtually all multi-threaded benchmarks.

-Aleksey.