MethodHandle performance

Sat Jan 14 12:56:58 UTC 2017

As an experiment I have reimplemented MethodHandle::invokeWithArguments
so that it generates a spreader only on the first invocation and reuses
it afterwards. It is now about 10 times faster (397 ns/op down to
44 ns/op in the benchmark below), which brings it level with
reflection. If we don't pass primitive arguments, the performance is
close to MethodHandle::invoke.

https://gist.github.com/hoat4/b459938cf7ae93e64bba3208c69af567

On the first invocation of iWA (invokeWithArguments), the new code
checks whether the MH is a fixed-arity MH or a varargs collector. The
fixed-arity case is simple: it stores the spreadInvoker in a field, and
iWA calls it from there. If the MH is a varargs collector, it creates a
new object that caches the spreaders by argument count, and iWA calls
are forwarded to this object.
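
The two cases above can be sketched with public API only. This is not
the code from the gist: SpreaderCache and makeSpreader are hypothetical
names, and for brevity the fixed-arity spreader is built in the
constructor rather than on the first invocation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of the JDK or of the gist.
final class SpreaderCache {
    private final MethodHandle target;
    // Fixed-arity case: exactly one spreader; null for a varargs collector.
    private final MethodHandle fixed;
    // Varargs case: one spreader per argument count seen so far.
    private final ConcurrentHashMap<Integer, MethodHandle> byArgCount =
            new ConcurrentHashMap<>();

    SpreaderCache(MethodHandle target) {
        this.target = target;
        this.fixed = target.isVarargsCollector()
                ? null
                : makeSpreader(target, target.type().parameterCount());
    }

    // Adapts mh to (Object[])Object: asType boxes/unboxes and, for a
    // varargs collector, collects the trailing arguments into an array.
    private static MethodHandle makeSpreader(MethodHandle mh, int argCount) {
        return mh.asType(MethodType.genericMethodType(argCount))
                 .asSpreader(Object[].class, argCount);
    }

    Object invokeWithArguments(Object... args) throws Throwable {
        MethodHandle spreader = (fixed != null)
                ? fixed
                : byArgCount.computeIfAbsent(args.length,
                                             n -> makeSpreader(target, n));
        return (Object) spreader.invokeExact(args);
    }
}
```

A handle obtained with findStatic for String.format would take the
varargs path here, and one for Math.addExact(int, int) the fixed-arity
path.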

To enable inlining of a constant MH's iWA, the spreader is stored in a
final field. The field's initial value is an MH pointing to a setup
method. When that MH is called, it generates the spreader and
overwrites the final field with it. This is risky, but I couldn't
induce the JVM to inline the wrong spreader method. I haven't
considered concurrency problems.
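
Overwriting a final field cannot be shown in plain Java, so the sketch
below swaps in a different, supported mechanism to illustrate the same
"setup handle that replaces itself" idea: a MutableCallSite whose
initial target is the setup method. SelfInstallingSpreader, setup and
setupCalls are hypothetical names, and like the patch this sketch
ignores concurrency.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical illustration; the patch rewrites a final field instead.
final class SelfInstallingSpreader {
    private static final MethodType SPREAD_TYPE =
            MethodType.methodType(Object.class, Object[].class);

    private final MethodHandle target;
    private final MutableCallSite site = new MutableCallSite(SPREAD_TYPE);
    // Stable handle that always dispatches to the site's current target.
    private final MethodHandle invoker = site.dynamicInvoker();
    int setupCalls; // only here to show that setup runs once

    SelfInstallingSpreader(MethodHandle target)
            throws ReflectiveOperationException {
        this.target = target;
        MethodHandle setup = MethodHandles.lookup()
                .findVirtual(SelfInstallingSpreader.class, "setup", SPREAD_TYPE)
                .bindTo(this);
        site.setTarget(setup); // the first call lands in setup()
    }

    // Runs on the first invocation: builds the spreader, installs it in
    // place of itself, then completes the pending call.
    Object setup(Object[] args) throws Throwable {
        setupCalls++;
        MethodHandle spreader = target
                .asSpreader(Object[].class, target.type().parameterCount())
                .asType(SPREAD_TYPE);
        site.setTarget(spreader);
        return (Object) spreader.invokeExact(args);
    }

    Object invokeWithArguments(Object... args) throws Throwable {
        return (Object) invoker.invokeExact(args);
    }
}
```

After the first call the setup method is no longer reachable through
the call site, so later calls go straight to the cached spreader.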

I ran Michael Rasmussen's benchmark. This is the original JDK 8
MethodHandle:

Benchmark                              Mode  Cnt    Score    Error  Units
MyBenchmark.invoke                     avgt    5   25,611 ± 0,256  ns/op
MyBenchmark.invokeExact                avgt    5   25,658 ± 0,116  ns/op
MyBenchmark.invokeWithArguments        avgt    5  397,023 ± 39,137  ns/op
MyBenchmark.reflective                 avgt    5   42,578 ± 4,206  ns/op
MyBenchmark.staticInvoke               avgt    5   18,863 ± 0,417  ns/op
MyBenchmark.staticInvokeExact          avgt    5   18,918 ± 0,461  ns/op
MyBenchmark.staticInvokeWithArguments  avgt    5  390,777 ± 41,888  ns/op

And this is the new code's performance:

Benchmark                              Mode  Cnt   Score Error  Units
MyBenchmark.invoke                     avgt    5  25,623 ± 0,249 ns/op
MyBenchmark.invokeExact                avgt    5  25,623 ± 0,390 ns/op
MyBenchmark.invokeWithArguments        avgt    5  44,167 ± 0,774 ns/op
MyBenchmark.reflective                 avgt    5  42,549 ± 4,202 ns/op
MyBenchmark.staticInvoke               avgt    5  19,025 ± 0,417 ns/op
MyBenchmark.staticInvokeExact          avgt    5  18,910 ± 0,304 ns/op
MyBenchmark.staticInvokeWithArguments  avgt    5  32,013 ± 2,749 ns/op

  Attila

On 2017-01-13 20:04, John Rose wrote:
> On Jan 12, 2017, at 12:29 PM, Claes Redestad <claes.redestad at oracle.com> wrote:
>> Right, I was just looking at the micro Stephen provided me, and it does
>> seem that the added cost for this case is due to invokeWithArguments
>> creating a new invoker every time.
> This is a good workaround, and Stephen's report is a helpful reminder
> that our performance story has a sharp edge.
>
> We cache spreaders in the case of varargs methods,
> for full performance, but not for the ad hoc spreader used by MH.iWA.
>
> We should cache them, to remove this sharp edge (or performance pothole).
> There are small technical challenges to do so.  Claes and I added
> some notes to the bug report; maybe someone can look into it more.
>
> — John