IMPT: Better dead-code elimination avoidance
Aleksey Shipilev
aleksey.shipilev at oracle.com
Fri May 10 10:52:39 PDT 2013
Hi,
TL;DR: if your benchmarks measure effects of ~100ns or less, you
need to update JMH and re-validate your experiments.
During the pending performance work, we noticed an oddity in JMH
results for very fine nano-benchmarks: scores could swing 2-3x
with innocuous changes. We triaged it to a false sharing issue in our
Blackhole class, which is responsible for evading dead code elimination.
Even though Blackhole is padded, it can still share a cache line with
other actively updated objects. The padding within Blackhole can only
guarantee collision avoidance against another Blackhole.
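
To make the padding idea concrete, here is a minimal sketch of one way to guard a hot field on both sides. It is not a copy of the JMH code: the class and field names are hypothetical, 64-byte cache lines are assumed, and it relies on the JVM placing superclass fields before subclass fields, which HotSpot does in practice but the language specification does not promise.

```java
// Hypothetical illustration of two-sided padding; NOT the JMH Blackhole.
// Assumptions: 64-byte cache lines, and a JVM that lays out superclass
// fields before subclass fields (HotSpot behavior, not a spec guarantee).

class LeftPad {
    // 16 longs = 128 bytes separating the hot field from whatever
    // happens to sit in memory just before this object's hot field.
    long l01, l02, l03, l04, l05, l06, l07, l08;
    long l09, l10, l11, l12, l13, l14, l15, l16;
}

class HotState extends LeftPad {
    // The actively written field we want alone on its cache line.
    volatile long sink;
}

class PaddedSink extends HotState {
    // 16 more longs = 128 bytes separating the hot field from whatever
    // is allocated right after this object.
    long r01, r02, r03, r04, r05, r06, r07, r08;
    long r09, r10, r11, r12, r13, r14, r15, r16;

    // A volatile store cannot be dropped by the JIT, so the value
    // passed in has to be computed.
    void consume(long v) {
        sink = v;
    }

    long last() {
        return sink;
    }
}
```

With padding on one side only, an unrelated object placed on the unpadded side could still land on the same cache line as `sink`; padding both sides removes the dependence on what the neighbor is.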
So, with that in mind, I pushed a reworked version of Blackhole that is
immune to these issues. The implementation notes can be found here:
http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/logic/BlackHole.java
As a side benefit, we now also provide consistent performance when
consuming any data type:
> Benchmark                                  Thr  Cnt  Sec   Mean  Mean error    Var  Units
> t.g.a.BlackholeBench.baseline               16   10    1  0.373       0.009  0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testArray     16   10    1  1.343       0.008  0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testBoolean   16   10    1  1.318       0.020  0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testByte      16   10    1  1.307       0.014  0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testChar      16   10    1  1.308       0.005  0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testDouble    16   10    1  1.487       0.015  0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testFloat     16   10    1  1.590       0.030  0.001  nsec/op
> t.g.a.BlackholeBench.explicit_testInt       16   10    1  1.354       0.022  0.000  nsec/op
> t.g.a.BlackholeBench.explicit_testLong      16   10    1  1.346       0.023  0.001  nsec/op
> t.g.a.BlackholeBench.explicit_testObject    16   10    1  1.348       0.011  0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testArray     16   10    1  1.457       0.020  0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testBoolean   16   10    1  1.313       0.027  0.001  nsec/op
> t.g.a.BlackholeBench.implicit_testByte      16   10    1  1.305       0.007  0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testChar      16   10    1  1.305       0.003  0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testDouble    16   10    1  1.489       0.011  0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testFloat     16   10    1  1.573       0.005  0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testInt       16   10    1  1.314       0.008  0.000  nsec/op
> t.g.a.BlackholeBench.implicit_testLong      16   10    1  1.336       0.028  0.001  nsec/op
> t.g.a.BlackholeBench.implicit_testObject    16   10    1  1.346       0.015  0.000  nsec/op
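
For readers unfamiliar with the two naming styles in the table, here is a rough sketch of the difference. It assumes "explicit" means the benchmark body hands the value to a sink itself, and "implicit" means the body returns the value and the harness consumes it. `ToySink`, `ConsumeStyles` and `runImplicit` are hypothetical stand-ins, not JMH classes or methods.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for a value sink; NOT the JMH Blackhole.
final class ToySink {
    private volatile long sink;

    // Volatile store: the JIT may not drop it, so the argument
    // must actually be computed.
    void consume(long v) {
        sink = v;
    }

    long last() {
        return sink;
    }
}

final class ConsumeStyles {
    // Some work whose result would be dead code if nobody used it.
    static long compute(long x) {
        return x * 31 + 7;
    }

    // "Explicit" style (assumed meaning): the body feeds the sink itself.
    static void explicitStyle(ToySink sink, long x) {
        sink.consume(compute(x));
    }

    // "Implicit" style (assumed meaning): the body just returns the value...
    static long implicitStyle(long x) {
        return compute(x);
    }

    // ...and a harness-side wrapper feeds the returned value to the sink.
    static void runImplicit(ToySink sink, LongSupplier body) {
        sink.consume(body.getAsLong());
    }
}
```

A possible usage: `ConsumeStyles.runImplicit(sink, () -> ConsumeStyles.implicitStyle(3L))` consumes the same kind of value that `ConsumeStyles.explicitStyle(sink, 3L)` does, only through the wrapper.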
Please update and re-validate your experiments! This is especially
important for nano-benchmarks measuring effects of up to 100ns, and
for virtually all multi-threaded benchmarks.
-Aleksey.