Any option to use something other than "time" to measure benchmarks?
Travis Downs
travis.downs at gmail.com
Mon Jul 25 00:07:18 UTC 2016
Currently, as far as I can tell, JMH uses "time" as one component of the metric for every mode (e.g., ops/second, seconds/op, etc.).
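To make that concrete, here is a minimal sketch (the class and method names are just placeholders I made up): whichever built-in mode you pick, the score comes back as a ratio involving wall time:

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @BenchmarkMode({Mode.Throughput, Mode.AverageTime})
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Thread)
    public class TimeBasedModes {
        double x = 42.0;

        @Benchmark
        public double work() {
            // Reported as ops/ns (Throughput) and ns/op (AverageTime) --
            // both are ratios of work done to elapsed wall time.
            return Math.log(x);
        }
    }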
This has the big downside of being wildly affected by all the fancy things
that OSes and CPUs do these days. In particular, on the CPU end, you have:
- downscaling of the CPU frequency due to thermal limits or instruction mix (e.g., AVX downclocking)
- upscaling of the CPU frequency where possible (aka "turbo"), within the thermal envelope
- various scalings (up/down/whatever) of the CPU speed to maximize
performance/watt (especially prevalent on mobile, but making inroads on
bigger chips too)
As a simple example, if I plug my laptop in, my benchmark scores more than double. If I change my power-plan options, the effect grows or shrinks depending on what I select.
When I reduce most of the variability by running at "max frequency" and disabling most OS power management, I still see a turbo-boost related effect where the first iterations of a benchmark run at a higher (turbo) speed than later ones (which throttle down). This also depends on factors like the ambient air temperature and, believe it or not, the type of pants I am wearing: fleece pants run the machine hotter, so turbo throttles down much more quickly.
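(For what it's worth, one cheap way to sanity-check that the frequency really is pinned -- on Linux, assuming the cpufreq sysfs files are exposed on your kernel -- is to read the current frequency directly before and during a run. A rough sketch:

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class FreqCheck {
        public static void main(String[] args) throws Exception {
            // Current frequency of cpu0 in kHz, as reported by the cpufreq subsystem.
            String khz = Files.readAllLines(
                    Paths.get("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"))
                    .get(0).trim();
            System.out.println("cpu0 frequency: " + (Long.parseLong(khz) / 1000) + " MHz");
        }
    }

If the number moves around mid-run, the governor settings are not doing what you think.)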
There is a near-panacea for all this: have an option to use a measurement that directly counts CPU cycles rather than time. For example, something like the "unhalted CPU cycles" event offered by the performance counters on x86 chips. This solves most of the problems above, since that counter "ticks" at the same speed as the CPU. Some variability remains because some parts of the system (notably, the latency to RAM) may not scale in the same way, but in general it is much better.
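To make the idea a bit more concrete, here is a rough sketch of something in this direction that I believe is possible today, piggybacking on the perf-based profiler JMH already ships (LinuxPerfNormProfiler) to collect cycle counts alongside the time-based score. This assumes a Linux host with perf installed, and it only attaches cycles as secondary results rather than making them the primary metric:

    import org.openjdk.jmh.profile.LinuxPerfNormProfiler;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class CyclesRun {
        public static void main(String[] args) throws Exception {
            Options opt = new OptionsBuilder()
                    .include("TimeBasedModes")                 // placeholder regex for the benchmark class
                    .addProfiler(LinuxPerfNormProfiler.class)  // adds cycles, instructions, etc. per op
                    .forks(1)
                    .build();
            new Runner(opt).run();
        }
    }

What I'm proposing would go one step further and let a cycle counter replace wall time in the score itself.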
I don't know how practical this is to do from JMH, so for now I'm just throwing the idea out there.
Travis