Understanding the code-generation strategy
Matt Warren
matt.warren at live.co.uk
Mon Dec 22 11:21:55 UTC 2014
I'm thinking about writing a .NET version of JMH or at the very least using
several of the (applicable) ideas in JMH to write a .NET benchmarking
framework.
Fortunately (or unfortunately, depending on how you look at it), things are a
bit simpler in .NET, i.e. one-time JITting, no HotSpot, etc.
One thing that I've not been able to understand is the technical reasoning
behind the code-generation strategy used in JMH, i.e. emitting Java code
that is then compiled into a single .jar, rather than calling the compiled
benchmark code directly via reflection.
I've read the canonical "JMH vs Caliper: reference thread"[1] and
in particular the section:
-------------------------------------------------------------------------
A. Dynamic selection of benchmarks.
Since you don't know at "harness" compile time what benchmarks it would
run, the obvious choice would be calling the benchmark methods via
Reflection. Back in the days, this pushed us to accept the same
"repetition" counter in the method to amortize the reflective costs. This
already introduces the major pitfall about looping, see below.
But infrastructure-wise, harness then should intelligently choose the
repetition count. This almost always leads to calibrating mechanics, which
is almost always broken when loop optimizations are in effect. If one
benchmark is "slower" and autobalances with lower reps count, and another
benchmark is "faster" and autobalances with higher reps count, then
the optimizer has more opportunity to optimize the "faster" benchmark even
further. Which departs us from seeing how exactly the benchmark performs
and introduces another (hidden! and uncontrollable!) degree of freedom.
In retrospect, the early days decision of JMH to generate synthetic
benchmark code around the method, which contains the loop (carefully chosen
by us to avoid the optimizations in current VMs -- separation of concerns,
basically), is paying off *very* nicely. We can then call that synthetic
stub via Reflection without even bothering about the costs.
...That is not to mention users can actually review the generated benchmark
code looking for the explanations for the weird effects. We do that
frequently as the additional control.
-------------------------------------------------------------------------
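To make the quoted strategy concrete, here is a rough sketch of the shape of
such a synthetic stub. This is purely illustrative plain Java, not the code
JMH actually generates (the real generated classes also deal with state
injection, Blackholes, warmup and timing control); the class and method names
are made up:

public class GeneratedStubSketch {

    // Stand-in for the user's @Benchmark class.
    public static class MyBenchmark {
        public int myBenchmark() { return 42; }
    }

    // Stand-in for the harness's "measurement is over" flag,
    // flipped by a timer thread in the real harness.
    public static volatile boolean done;

    // Crude stand-in for JMH's Blackhole: a volatile write stops the JIT
    // from treating the benchmark result as dead code.
    private static volatile int sinkHole;
    private static void sink(int value) { sinkHole = value; }

    // The generated stub owns the measurement loop. The harness calls this
    // whole method once via Reflection, so the reflective overhead is paid
    // per measurement, not per benchmark-method invocation.
    public static double myBenchmark_avgt_stub(MyBenchmark state) {
        long operations = 0;
        long start = System.nanoTime();
        do {
            sink(state.myBenchmark());
            operations++;
        } while (!done);
        long stop = System.nanoTime();
        return (stop - start) / (double) operations;  // average ns/op
    }
}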
My question is: what are the *technical* reasons for doing code generation
this way? I understand the nice benefits you get (being able to inspect the
generated code), but I see those as side-benefits.
Is it purely related to the overhead of calling a method via reflection? I
wrote a simple benchmark[2] (using JMH, of course!) and, assuming I didn't
mess it up, got the following numbers[3]:
# Run complete. Total time: 00:06:47
Benchmark                                      Mode  Samples   Score    Error  Units
o.s.MyBenchmark.baseline                       avgt       50   0.283  ± 0.004  ns/op
o.s.MyBenchmark.reflectionMethodCall           avgt       50  80.220  ± 1.082  ns/op
o.s.MyBenchmark.reflectionMethodCallNoCasting  avgt       50  79.908  ± 1.266  ns/op
o.s.MyBenchmark.regularMethodCall              avgt       50  77.749  ± 1.342  ns/op
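For reference, the comparison in [2] is shaped roughly like the sketch below.
This is illustrative only, not the exact gist code: the TestClass target is a
placeholder (what its run() method actually does isn't important for the shape
of the comparison), and the method names simply mirror the results above.

import java.lang.reflect.Method;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class MyBenchmark {

    // Placeholder target method.
    public static class TestClass {
        public int run() { return 42; }
    }

    private TestClass instance;
    private Method method;

    @Setup
    public void setup() throws Exception {
        instance = new TestClass();
        // The reflective lookup happens once here, so the benchmarks below
        // measure only the per-invocation cost, not the lookup cost.
        method = TestClass.class.getMethod("run");
    }

    @Benchmark
    public int baseline() {
        return 42;                              // no method call at all
    }

    @Benchmark
    public int regularMethodCall() {
        return instance.run();                  // direct (non-reflective) call
    }

    @Benchmark
    public int reflectionMethodCall() throws Exception {
        return (Integer) method.invoke(instance);  // reflective call, result cast back
    }

    @Benchmark
    public Object reflectionMethodCallNoCasting() throws Exception {
        return method.invoke(instance);         // reflective call, boxed result returned as-is
    }
}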
Is it this difference alone that makes a Reflection call per invocation not
viable?
I'm wondering whether I'd need to use this technique in a .NET benchmark
harness, since .NET gives you the ability to compile a method call (obtained
via reflection) into a regular delegate, so you don't pay the same penalty
when calling it; see [4] and [5].
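As an aside, and assuming I've understood it correctly, the rough JVM-side
analogue of [4] and [5] would be binding the reflective Method into a
MethodHandle. A quick illustrative sketch (TestClass and its run() method are
placeholders):

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;

public class HandleExample {

    public static class TestClass {
        public int run() { return 42; }
    }

    // Holding the handle in a static final field lets the JIT treat it as a
    // constant, so calls through it can be inlined much like a direct call --
    // loosely comparable to binding a MethodInfo into a delegate in .NET.
    private static final MethodHandle RUN;
    static {
        try {
            Method m = TestClass.class.getMethod("run");
            RUN = MethodHandles.lookup().unreflect(m);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        TestClass instance = new TestClass();
        int result = (int) RUN.invokeExact(instance);  // exact-typed invocation, no boxing
        System.out.println(result);
    }
}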
Or have I missed the point entirely?
Cheers
Matt
[1] https://groups.google.com/d/msg/mechanical-sympathy/m4opvy4xq3U/7lY8x8SvHgwJ
[2] https://gist.github.com/mattwarren/75c969ce58e72883a9a0 (Benchmark)
[3] https://gist.github.com/mattwarren/15d88da37c57a222dfd7 (Full Results)
[4] http://msdn.microsoft.com/en-us/library/ms228976(v=vs.110).aspx
[5] http://msdn.microsoft.com/en-us/library/system.delegate.createdelegate(v=vs.110).aspx