Blackhole.consume(Object) has different semantics to Blackhole.consume(primitive)
Aleksey Shipilev
aleksey.shipilev at oracle.com
Wed Nov 12 12:49:17 UTC 2014
Hi Nitsan,
On 11/12/2014 01:46 AM, Nitsan Wakart wrote:
> I'm not saying one is better than the other, or that the unlikely
> deoptimization is a massive issue, but the slight semantic
> differences can lead to surprising effects caused by switching from
> one consume method to the other. Can we settle on a method? is there
> a good reason to maintain special treatment?
There is a choice between performance and consistency. Primitive
consumes avoid the writes completely, and that's their benefit.
Reference consumes cannot employ the same trick, and so they do the
second-best option: a PRNG-predicated heap write.
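To make the two strategies concrete, here is a minimal sketch. It is an illustration only, not the JMH Blackhole source: the class name, field names, and initial values are assumptions chosen for the example. The primitive path does two volatile reads and a compare and never writes; the reference path does a plain heap write guarded by a cheap linear congruential generator whose mask widens after every write.

```java
/**
 * Simplified sketch of two ways to "consume" a value so the JIT cannot
 * discard it. Hypothetical names; not the actual JMH Blackhole code.
 */
final class SketchBlackhole {
    // Volatile fields: every call must re-read them, and the JIT cannot
    // assume their values, so the comparison below is not folded away.
    volatile int i1 = 1;
    volatile int i2 = 2;   // always differs from i1

    // State for the PRNG-predicated write.
    int tlr = 1;           // linear congruential generator state
    int tlrMask = 1;       // widens after every write, making writes rarer
    Object sink;           // heap location the reference may escape to

    /** Primitive consume: two volatile reads and a compare, no write. */
    void consume(int i) {
        int a = this.i1;
        int b = this.i2;
        if (i == a & i == b) {
            // Never taken because i1 != i2, but the fields are volatile,
            // so the compiler has to keep the check.
            throw new IllegalStateException("i1 and i2 must differ");
        }
    }

    /** Reference consume: plain heap write guarded by a cheap PRNG. */
    void consume(Object obj) {
        int t = (this.tlr = this.tlr * 1664525 + 1013904223);
        if ((t & this.tlrMask) == 0) {
            this.sink = obj;                         // rare heap write
            this.tlrMask = (this.tlrMask << 1) + 1;  // next write is rarer
        }
    }
}
```

In this sketch the reference path pays for the PRNG update on every call, while the actual write happens less and less often as the mask widens.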
Given that code producing primitives is already quite different from code
producing references, it feels odd to trade away performance for
consistency that is already broken. This is how much you will pay for consistency:
$ java -jar jmh-core-benchmarks/target/benchmarks.jar BlackholeBench -wi 5 -i 5 -f 1
x86_64, i7-4790K:
Benchmark             Mode  Samples   Score   Error  Units
baseline              avgt        5   0.252 ± 0.002  ns/op
implicit_testArray    avgt        5   2.271 ± 0.049  ns/op
implicit_testBoolean  avgt        5   0.629 ± 0.009  ns/op
implicit_testByte     avgt        5   0.629 ± 0.005  ns/op
implicit_testChar     avgt        5   0.631 ± 0.005  ns/op
implicit_testDouble   avgt        5   0.747 ± 0.032  ns/op
implicit_testFloat    avgt        5   0.678 ± 0.015  ns/op
implicit_testInt      avgt        5   0.629 ± 0.005  ns/op
implicit_testLong     avgt        5   0.633 ± 0.013  ns/op
implicit_testObject   avgt        5   2.267 ± 0.037  ns/op
implicit_testShort    avgt        5   0.629 ± 0.002  ns/op
That is, reference consumes cost 3x-4x more than primitive ones on x86
(about 2.27 ns/op versus 0.63-0.75 ns/op). Switching to PRNG-predicated
writes in the primitive cases seems odd with data like that.
ARMv7, Cortex-A9:
Benchmark             Mode  Samples   Score   Error  Units
baseline              avgt        5   5.291 ± 0.000  ns/op
implicit_testArray    avgt        5  11.757 ± 0.001  ns/op
implicit_testBoolean  avgt        5  13.524 ± 0.002  ns/op
implicit_testByte     avgt        5  14.109 ± 0.001  ns/op
implicit_testChar     avgt        5  13.521 ± 0.001  ns/op
implicit_testDouble   avgt        5  14.109 ± 0.001  ns/op
implicit_testFloat    avgt        5  14.110 ± 0.011  ns/op
implicit_testInt      avgt        5  13.524 ± 0.002  ns/op
implicit_testLong     avgt        5  18.815 ± 0.007  ns/op
implicit_testObject   avgt        5  11.757 ± 0.001  ns/op
implicit_testShort    avgt        5  14.109 ± 0.001  ns/op
ARM actually ends up more or less consistent because the cost of
volatile reads in the primitive cases compensates for the cost of PRNG
writes in the reference cases.
If you are concerned about the absence of volatile reads in reference
consumes, we may add a volatile "spoiler" there to get the same
effect. That will break the perceived consistency in the ARM case, but
that seems to be the lesser evil.
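One possible reading of the "spoiler" idea, sketched under assumptions: the reference consume does an extra volatile read before the PRNG-predicated write, so it carries the same kind of volatile load the primitive consumes have. The class and the `spoiler` field are hypothetical names for illustration, not an existing JMH API.

```java
/**
 * Hypothetical sketch: reference consume with an added volatile read.
 * Not the JMH implementation; names and initial values are assumptions.
 */
final class SpoiledReferenceConsume {
    volatile int spoiler;  // only ever read on the hot path
    int tlr = 1;           // linear congruential generator state
    int tlrMask = 1;       // widens after every write
    Object sink;           // heap location the reference may escape to

    void consume(Object obj) {
        int s = this.spoiler;  // volatile read; the load is the point, the value is ignored
        int t = (this.tlr = this.tlr * 1664525 + 1013904223);
        if ((t & this.tlrMask) == 0) {
            this.sink = obj;                         // rare heap write
            this.tlrMask = (this.tlrMask << 1) + 1;  // next write is rarer
        }
    }
}
```

On ARM this would add the volatile-read cost on top of the PRNG cost, which is why the reference numbers would move away from the primitive ones there.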
Thanks,
-Aleksey.