Understanding the code-generation strategy

Aleksey Shipilev aleksey.shipilev at oracle.com
Mon Dec 22 15:59:18 UTC 2014


On 12/22/2014 02:21 PM, Matt Warren wrote:
> Is it purely related to the overhead of calling a method via a reflection
> call? I wrote a simple benchmark[2] (using JMH of course!) and assuming I
> didn't mess it up, got the following numbers[3]:
> 
> # Run complete. Total time: 00:06:47
> 
> Benchmark                                        Mode  Samples   Score    Error  Units
> o.s.MyBenchmark.baseline                         avgt       50   0.283  ± 0.004  ns/op
> o.s.MyBenchmark.reflectionMethodCall             avgt       50  80.220  ± 1.082  ns/op
> o.s.MyBenchmark.reflectionMethodCallNoCasting    avgt       50  79.908  ± 1.266  ns/op
> o.s.MyBenchmark.regularMethodCall                avgt       50  77.749  ± 1.342  ns/op
> 
> 
> Is it this difference alone that makes the Reflection call per invocation
> not possible?

For Java/JMH, there are three considerations:

 a) The cost of the reflective call itself. This is why people try to
"optimize" benchmark payload calls by introducing a repetition count,
which breaks horribly in most cases.

 b) The boxing rules. Calling a method that returns "int" autoboxes the
value into an Integer, much to the surprise of benchmark users.

 c) The compiler interaction. Hot reflective calls are optimized, and
the performance characteristics of an optimized reflective call differ
drastically from those of a non-optimized call.

All these considerations introduce unwanted effects into the benchmarks.
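To illustrate (b), here is a minimal sketch (the class and method names are made up for this example) showing that Method.invoke always returns Object, so a primitive int result gets autoboxed into an Integer on every call:

```java
import java.lang.reflect.Method;

public class ReflectBoxing {
    public static int answer() { return 42; }

    public static void main(String[] args) throws Exception {
        Method m = ReflectBoxing.class.getMethod("answer");
        // Method.invoke is declared to return Object, so the primitive
        // int result is autoboxed into an Integer on every invocation.
        Object result = m.invoke(null);
        System.out.println(result.getClass().getName()); // java.lang.Integer
    }
}
```

In a benchmark loop, that allocation per call can dominate the cost of a cheap payload method.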

> I'm wondering if I'd need to use this technique in a .NET benchmark harness
> as .NET gives you the ability to compile a method call (obtained via
> reflection) into a regular delegate, so you don't pay the same penalty when
> calling it, see [4] and [5].

Well, the JVM also routinely inflates hot reflective calls into
bytecode, and then optimizes them as regular methods. But there is
an inherent problem with this argument, which I know under the name of
the "magic compiler card".
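A hedged sketch of that inflation behavior (the class and method names here are invented for illustration): HotSpot initially dispatches reflective calls through a native accessor, and after the invocation count crosses the sun.reflect.inflationThreshold (15 by default) it generates a bytecode accessor that the JIT can then optimize like a regular method:

```java
import java.lang.reflect.Method;

public class InflationDemo {
    public static int payload() { return 1; }

    static long sumViaReflection(int calls) throws Exception {
        Method m = InflationDemo.class.getMethod("payload");
        long sum = 0;
        // The first ~15 invocations (sun.reflect.inflationThreshold)
        // go through a native MethodAccessor; after that, HotSpot
        // "inflates" the accessor into generated bytecode, so the
        // later calls in this loop run at a very different cost than
        // the earlier ones -- exactly the moving target a benchmark
        // harness has to worry about.
        for (int i = 0; i < calls; i++) {
            sum += (Integer) m.invoke(null);
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumViaReflection(100)); // prints 100
    }
}
```

The result is correct either way; it is the per-call cost profile that shifts underneath you.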


> My question is, what are the *technical* reasons for doing code generation
> this way? I understand the nice benefits you get (can inspect the generated
> code), but I see those as side-benefits.

You can only play the magic compiler card sporadically. The best place
to invoke it is when you argue against micro-optimizing all the code in
the project. But when you are dealing with very performance-sensitive
code, you have to assume the compiler is dumb. This puts you in the
role of the compiler yourself.

When you are building a benchmarking harness that, among other things,
is there to quantify compiler effects, you have to minimize the
effects of the changing "compiler environment" on your code. This leads
to actually generating the low-level code that has a better chance of
being translated 1:1 to machine code by different compilers running
under different conditions.

Generating the synthetic code in JMH allows us to pre-optimize the code,
e.g.:

 * omit the @Setup/@TearDown calls where they are not needed (under the
assumption the compiler is dumb, and can't inline empty methods or
eliminate empty loops)

 * put the compiler hints over the methods we know are crucial for a
sane environment and/or *generate* the parts of the infrastructure code
right in the hot methods (under the assumption the compiler is dumb, and
can't reliably figure out what to inline for performance)

 * violate best OOP practices and access the important fields directly
(under the assumption the compiler is dumb, and can't devirtualize
and/or inline the getters)

 * cache every important piece of data in locals (under the assumption
the compiler is dumb, and can't figure out this by itself -- especially
when the memory model issues are involved)

 * do primitive specializations (under the assumption the compiler is
dumb, and cannot optimize them reliably)

 * generate padding for our internal structures to avoid false sharing
(under the assumption the runtime is dumb, and can't avoid it itself)
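As an illustration of the flavor of code this produces (not the actual JMH-generated stubs; all names below are hypothetical), here is a hand-written measurement loop that caches state fields in locals up front, inlines the payload directly, and stays primitive-specialized throughout:

```java
// Hypothetical sketch of a "pre-optimized" measurement stub, written
// under the assumption the compiler is dumb: no getters, no boxing,
// no re-reads of fields inside the hot loop.
public class GeneratedStubSketch {
    static class BenchState { int x = 3, y = 4; }

    static long measurementLoop(BenchState state, long iterations) {
        // Cache the important fields in locals once, instead of
        // reaching through the state object on every iteration.
        int x = state.x;
        int y = state.y;
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += x * y;   // "benchmark body" inlined right here
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(measurementLoop(new BenchState(), 10)); // prints 120
    }
}
```

A smart compiler would do most of this on its own; the point is that the generated stub does not have to trust it to.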

Some of those optimizations are downright non-conservative, and they
only work in our isolated use cases. In other words, they are something
a compiler would almost never optimize for, or would only try to do
speculatively, which brings in the "speculation failed" checks and such.

------------------------------------------------------------------------

Bottom line: if you want the best (= fast and predictable) performance
in most cases, you have to write specialized code. When you have to
write lots of repetitive specialized code, it makes sense to use code
generators. That's what JMH does for its benchmark stubs.

Thanks,
-Aleksey.
