Understanding the code-generation strategy
Matt Warren
matt.warren at live.co.uk
Mon Dec 22 11:21:55 UTC 2014
I'm thinking about writing a .NET version of JMH or at the very least using
several of the (applicable) ideas in JMH to write a .NET benchmarking
framework.
Fortunately (or unfortunately, depending on how you look at it), things are a
bit simpler in .NET, i.e. one-time JITting, no HotSpot, etc.
One thing that I've not been able to understand is the technical reasoning
behind the code-generation strategy used in JMH, i.e. emitting Java code
that is then compiled into a single .jar, rather than calling the compiled
benchmark code directly via reflection.
I've read the canonical "JMH vs Caliper: reference thread"[1] and
in particular the section:
-------------------------------------------------------------------------
A. Dynamic selection of benchmarks.
Since you don't know at "harness" compile time what benchmarks it would
run, the obvious choice would be calling the benchmark methods via
Reflection. Back in the days, this pushed us to accept the same
"repetition" counter in the method to amortize the reflective costs. This
already introduces the major pitfall about looping, see below.
But infrastructure-wise, harness then should intelligently choose the
repetition count. This almost always leads to calibrating mechanics, which
is almost always broken when loop optimizations are in effect. If one
benchmark is "slower" and autobalances with lower reps count, and another
benchmark is "faster" and autobalances with higher reps count, then
the optimizer has more opportunity to optimize the "faster" benchmark even
further. Which departs us from seeing how exactly the benchmark performs
and introduces another (hidden! and uncontrollable!) degree of freedom.
In retrospect, the early days decision of JMH to generate synthetic
benchmark code around the method, which contains the loop (carefully chosen
by us to avoid the optimizations in current VMs -- separation of concerns,
basically), is paying off *very* nicely. We can then call that synthetic
stub via Reflection without even bothering about the costs.
...That is not to mention users can actually review the generated benchmark
code looking for the explanations for the weird effects. We do that
frequently as the additional control.
-------------------------------------------------------------------------
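To make the quoted strategy concrete, here is a rough sketch of the shape of
such a synthetic stub. This is purely illustrative plain Java, not the code
JMH actually generates (the real generated classes also deal with state
injection, Blackholes, warmup and timing control); the class and method names
are made up:

public class GeneratedStubSketch {

    // Stand-in for the user's @Benchmark class.
    public static class MyBenchmark {
        public int myBenchmark() { return 42; }
    }

    // Stand-in for the harness's "measurement is over" flag,
    // flipped by a timer thread in the real harness.
    public static volatile boolean done;

    // Crude stand-in for JMH's Blackhole: a volatile write stops the JIT
    // from treating the benchmark result as dead code.
    private static volatile int sinkHole;
    private static void sink(int value) { sinkHole = value; }

    // The generated stub owns the measurement loop. The harness calls this
    // whole method once via Reflection, so the reflective overhead is paid
    // per measurement, not per benchmark-method invocation.
    public static double myBenchmark_avgt_stub(MyBenchmark state) {
        long operations = 0;
        long start = System.nanoTime();
        do {
            sink(state.myBenchmark());
            operations++;
        } while (!done);
        long stop = System.nanoTime();
        return (stop - start) / (double) operations;  // average ns/op
    }
}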
My question is: what are the *technical* reasons for doing code generation
this way? I understand the nice benefits you get (being able to inspect the
generated code), but I see those as side-benefits.
Is it purely related to the overhead of calling a method via reflection? I
wrote a simple benchmark[2] (using JMH, of course!) and, assuming I didn't
mess it up, got the following numbers[3]:
# Run complete. Total time: 00:06:47
Benchmark                                      Mode  Samples   Score    Error  Units
o.s.MyBenchmark.baseline                       avgt       50   0.283  ± 0.004  ns/op
o.s.MyBenchmark.reflectionMethodCall           avgt       50  80.220  ± 1.082  ns/op
o.s.MyBenchmark.reflectionMethodCallNoCasting  avgt       50  79.908  ± 1.266  ns/op
o.s.MyBenchmark.regularMethodCall              avgt       50  77.749  ± 1.342  ns/op
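For reference, the comparison in [2] is shaped roughly like the sketch below.
This is illustrative only, not the exact gist code: the TestClass target is a
placeholder (what its run() method actually does isn't important for the shape
of the comparison), and the method names simply mirror the results above.

import java.lang.reflect.Method;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class MyBenchmark {

    // Placeholder target method.
    public static class TestClass {
        public int run() { return 42; }
    }

    private TestClass instance;
    private Method method;

    @Setup
    public void setup() throws Exception {
        instance = new TestClass();
        // The reflective lookup happens once here, so the benchmarks below
        // measure only the per-invocation cost, not the lookup cost.
        method = TestClass.class.getMethod("run");
    }

    @Benchmark
    public int baseline() {
        return 42;                              // no method call at all
    }

    @Benchmark
    public int regularMethodCall() {
        return instance.run();                  // direct (non-reflective) call
    }

    @Benchmark
    public int reflectionMethodCall() throws Exception {
        return (Integer) method.invoke(instance);  // reflective call, result cast back
    }

    @Benchmark
    public Object reflectionMethodCallNoCasting() throws Exception {
        return method.invoke(instance);         // reflective call, boxed result returned as-is
    }
}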
Is it this difference alone that makes a Reflection call per invocation not
viable?
I'm wondering whether I'd need to use this technique in a .NET benchmark
harness, since .NET gives you the ability to compile a method call (obtained
via reflection) into a regular delegate, so you don't pay the same penalty
when calling it; see [4] and [5].
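As an aside, and assuming I've understood it correctly, the rough JVM-side
analogue of [4] and [5] would be binding the reflective Method into a
MethodHandle. A quick illustrative sketch (TestClass and its run() method are
placeholders):

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;

public class HandleExample {

    public static class TestClass {
        public int run() { return 42; }
    }

    // Holding the handle in a static final field lets the JIT treat it as a
    // constant, so calls through it can be inlined much like a direct call --
    // loosely comparable to binding a MethodInfo into a delegate in .NET.
    private static final MethodHandle RUN;
    static {
        try {
            Method m = TestClass.class.getMethod("run");
            RUN = MethodHandles.lookup().unreflect(m);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        TestClass instance = new TestClass();
        int result = (int) RUN.invokeExact(instance);  // exact-typed invocation, no boxing
        System.out.println(result);
    }
}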
Or have I missed the point entirely?
Cheers
Matt
[1] https://groups.google.com/d/msg/mechanical-sympathy/m4opvy4xq3U/7lY8x8SvHgwJ
[2] https://gist.github.com/mattwarren/75c969ce58e72883a9a0 (Benchmark)
[3] https://gist.github.com/mattwarren/15d88da37c57a222dfd7 (Full Results)
[4] http://msdn.microsoft.com/en-us/library/ms228976(v=vs.110).aspx
[5] http://msdn.microsoft.com/en-us/library/system.delegate.createdelegate(v=vs.110).aspx