MethodHandle.invoke* performance

Wed Apr 3 08:50:55 PDT 2013

On 04/03/2013 05:03 PM, Cédric Champeau wrote:
> Hi guys,

Hi Cedric,

>
> First of all, sorry if my question looks stupid, but I have 
> difficulties explaining what I see. I made a small benchmark for 
> various MethodHandle.invoke* combinations. Of course, micro-benchmarks 
> are evil, but in that case,

Sorry to be rude, but it's still a micro-benchmark ...

> I find the figures quite interesting. Note that I'm not using 
> invokedynamic instructions anywhere, I'm just "testing" the API from 
> plain Java. As MethodHandles are promoted as "faster than reflection", 
> I expected very different results.
>
> So first, the link: https://gist.github.com/melix/5291792
>
> The results I obtain here with openjdk-8-lambda b83 are not good:
>
> Classic for loop[-1] 166ms
> InvokeExact[-1] 170ms
> InvokeExact (local method handle)[-1] 3410ms
> bindTo+invokeExact[-1] 6905ms
> insertArgument+invokeExact[-1] 6254ms
> invoke[-1] 60118ms
> bindTo+invoke[-1] 80072ms
> insertArgument+invoke[-1] 78337ms
> Reflect[-1] 1219ms
>
> I added the "static final field" version as a suggestion from Jochen 
> Theodorou, and it does make things much faster, but it's the *only* 
> case that comes close to regular Java performance. Any other 
> combination is much slower. I know that invoke() is supposed to 
> perform type conversions, but is it supposed to be that slow as 
> compared to invokeExact? What explains the difference between the 
> local method handle and the static field version? In theory, using 
> invokedynamic instruction, the method handle would come from a 
> bootstrap method so I could expect the same performance as the static 
> field version, but if any other combination than direct invokeExact 
> calls are slower, then it's not good. So, basically, I'd like to know 
> what I'm missing here :)

For invokedynamic, it's not just in theory: if your method handle is a
constant, either because it is stored in a static final field or because
it is nested in a CallSite, the JIT treats it as a constant and the call
is fully optimized.
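
A minimal sketch of the two "constant" cases, assuming a hypothetical
target method addOne (the class and method names below are illustrative
and are not taken from the gist). The CallSite part only shows the shape
of the API from plain Java; it is not the same thing as a real
invokedynamic instruction linked by a bootstrap method.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ConstantHandleDemo {
    // Hypothetical target method, used only for illustration.
    static int addOne(int x) { return x + 1; }

    // Case 1: the handle lives in a static final field.
    static final MethodHandle ADD_ONE;
    // Case 2: the handle is nested in a CallSite, which is what a
    // bootstrap method would hand back for an invokedynamic instruction.
    static final CallSite SITE;

    static {
        try {
            ADD_ONE = MethodHandles.lookup().findStatic(
                    ConstantHandleDemo.class, "addOne",
                    MethodType.methodType(int.class, int.class));
            SITE = new ConstantCallSite(ADD_ONE);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Calls through the static final handle; the call-site type (int)int
    // matches the handle type exactly, as invokeExact requires.
    static int sumWithStaticFinal(int n) {
        try {
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += (int) ADD_ONE.invokeExact(i);
            }
            return sum;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Calls through the target of the constant call site.
    static int sumWithCallSite(int n) {
        try {
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += (int) SITE.dynamicInvoker().invokeExact(i);
            }
            return sum;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Both methods compute the same sum; the sketch shows where the handle is
stored and makes no claim about timings.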

For a method handle on the stack (a local variable), the method handle
is obviously not a constant. Moreover, the JIT is not able to use a
trick to make it constant (like hoisting it out of the loop, or running
the inlining algorithm backward, etc.).
More on that later ...
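
For contrast, a sketch of the "handle on the stack" case, again with a
hypothetical addOne target and illustrative names. The handle is looked
up into a local variable, so it is not a static final constant. The
second method uses invoke() with a boxed argument, so the call-site type
(Integer)int differs from the handle type (int)int and invoke() has to
adapt the handle as if by asType.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LocalHandleDemo {
    // Hypothetical target method, used only for illustration.
    static int addOne(int x) { return x + 1; }

    // Looks the handle up on every call; nothing here is static final.
    static MethodHandle lookupAddOne() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                LocalHandleDemo.class, "addOne",
                MethodType.methodType(int.class, int.class));
    }

    // invokeExact through a local variable: same semantics as the
    // static final version, but the handle is not a constant field.
    static int sumWithLocal(int n) {
        try {
            MethodHandle local = lookupAddOne();
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += (int) local.invokeExact(i);
            }
            return sum;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // invoke() with a boxed argument: the Integer is unboxed to int
    // by the type conversion that invoke() performs.
    static int sumWithInvoke(int n) {
        try {
            MethodHandle local = lookupAddOne();
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += (int) local.invoke(Integer.valueOf(i));
            }
            return sum;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The results are identical to the constant case; only what the JIT can
assume about the handle differs.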

As you said, it's a micro-benchmark, so you end up with unusually good
performance. For example, the call through j.l.r.Method is optimized as
it never will be in a real program (you call the method in the same unit
where it was declared, and you have fewer than 3 instances of Method
that are called more than 60 times).

Now, Krystal is currently working on adding a cache that is used when a
method handle is called, so in a few betas the performance of the method
handles in your micro-benchmark will improve dramatically.
And the cache for MethodHandle is better than the cache used for
j.l.r.Method because it can be local to a call site rather than global
(the j.l.r.Method cache is, in fact, local to the one call site in the
code of j.l.r.Method that is on the invocation path when you call
"invoke").

Anyway, because it's a micro-benchmark, the result will be just as
useless as it is now for predicting the behaviour of a real-world
program.

>
> Thanks for your explanations!
>
> P.S: Note that the order of tests does not matter here, I know I should not 
> mix things, but I tested with different test execution order without 
> more success.
> -- 
> Cédric Champeau
> SpringSource - A Division Of VMware
> http://www.springsource.com/
> http://twitter.com/CedricChampeau

cheers,
Rémi