The Great Startup Problem

Tue Sep 2 17:54:48 UTC 2014

Charlie,

>> Is it acceptable and solves the problem for you?
>
> This is acceptable for JRuby. Our worst-case Ruby method handle chain
> will include at most:
>
> * Two CatchExceptions for pre/post logic (heap frames, etc). Perf of
> CatchException compared to literal Java try/catch is important here.
> * Up to two permute arguments for differing call site/target argument ordering.
> * Varargs negotiation (may be a couple handles)
> * GWT
> * SwitchPoint
> * For Ruby to Java calls, each argument plus the return value must be
> filtered to convert to/from Ruby types or apply an IRubyObject wrapper
>
> This is worst case, mind you. Most calls in the system will be
> arity-matched, eliminating the permutes. Most calls will be three or
> fewer arguments, eliminating varargs. Many calls will be optimized to
> no longer need a heap frame, eliminating the try/finally. The absolute
> minimum for any call would be SwitchPoint plus GWT.
>
> Of course I'm not counting DMHs here, since they're either the call we
> want to make or they're leaf logic.
Thanks for the data! That's good!
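
To make the minimal shape concrete (SwitchPoint plus GWT), here is a
rough sketch using only java.lang.invoke. The class name, the type test
and the fast/slow targets are made up for illustration; this is not
JRuby's actual binding code, only the general pattern.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical example class: a call path guarded by a SwitchPoint and a GWT.
class MinimalChain {
    static boolean isString(Object o) { return o instanceof String; }
    static String fast(Object o) { return "fast:" + o; }
    static String slow(Object o) { return "slow:" + o; }

    // Builds SwitchPoint -> GWT -> fast target, with the slow path as fallback.
    static MethodHandle build(SwitchPoint sp) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType callType = MethodType.methodType(String.class, Object.class);
            MethodHandle test = lookup.findStatic(MinimalChain.class, "isString",
                    MethodType.methodType(boolean.class, Object.class));
            MethodHandle fast = lookup.findStatic(MinimalChain.class, "fast", callType);
            MethodHandle slow = lookup.findStatic(MinimalChain.class, "slow", callType);
            // GWT: take the fast path only when the type test passes.
            MethodHandle gwt = MethodHandles.guardWithTest(test, fast, slow);
            // SwitchPoint: once invalidated, every call goes to the fallback.
            return sp.guardWithTest(gwt, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invalidates a single SwitchPoint (e.g. after a method redefinition).
    static void invalidate(SwitchPoint sp) {
        SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
    }

    // Convenience wrapper so callers need not handle Throwable.
    static String call(MethodHandle mh, Object arg) {
        try {
            return (String) mh.invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        SwitchPoint sp = new SwitchPoint();
        MethodHandle mh = build(sp);
        System.out.println(call(mh, "a")); // type test passes: fast path
        System.out.println(call(mh, 42));  // type test fails: slow path
        invalidate(sp);
        System.out.println(call(mh, "a")); // SwitchPoint invalidated: slow path
    }
}
```

Every other adaptation in the list above (permutes, varargs collection,
argument filters, exception handling) would add further handles around
this core.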

>> We discussed an idea to generate custom bytecodes (single method) for the
>> whole method handle chain (and have only 1 extra stack frame per MH
>> invocation), but it defeats the memory footprint reduction we are trying to
>> achieve with LambdaForm sharing.
>
> Funny thing...because indy slows our startup and increases our warmup
> time, we're using our old binding logic by default. And surprise
> surprise, our old binding logic does exactly this...one small
> generated invoker class per method. I'm sure you're right that this
> approach defeats the sharing and memory reduction we'd like to see
> from LFs, but it works *really* well if you're ok with the extra class
> and metaspace data in memory.
I see one problem with pre-compiling method handle trees.
Each tree has to be compiled as a whole, so both the fast path and the
slow path are always compiled. Without explicit hints, or profiling and
recompilation, it's impossible to distinguish them.

Compared with the MethodHandle/LambdaForm compilation unit, where the
slow path usually stays interpreted at the LF level (because of the
invocation threshold), the memory overhead can be larger for
considerably large method handle trees.

But I'm just guessing here: I don't yet have any statistics on the
average size of method handle trees, or numbers on the memory overhead
induced by individual classes.

> So there's one question: is the cost of a bytecoded adapter shim for
> each method object really that high? Yes, if you're spinning new MHs
> constantly or doing a million different adaptations of a given method.
> But if you're just lazily creating an invoker shim once per method,
> that really doesn't seem like a big deal.
Good question. I have a prototype of LF inlining during bytecode 
translation. I'll conduct some experiments to gather some data.

> My indy binding logic also has a dozen different flags for tweaking. I
> can easily modify it to avoid doing all that pre/post logic and
> argument permutation in the MH chain and just bind directly to the
> generated invoker. Best (or worst) of both worlds? I just really don't
> want to have to do that...I want everything from call site to target
> method body to be in the MH chain.
>
> For JRuby 9000, all try/finally logic will be within the target
> method, so at least that part of the MH chain goes away.
>
> Here's another idea...
>
> We've been using my InvokeBinder library heavily in JRuby. It provides
> a Java API/DSL for creating MH chains lazily from the top down:
>
> MethodHandle mh = Binder.from(String.class, Object.class, Float.class)
>          .tryFinally(finallyLogic)
>          .permute(1, 0)
>          .append("Hello")
>          .drop(1)
>          .invokeStatic(MyClass.class, "someMethod");
>
> The adaptations are gathered within the Binder instance, playing
> forward as you add adaptations and played backward at binding time to
> make the appropriate MethodHandles and MethodHandle calls.
>
> Duncan talked about how he was able to improve MH chain size and
> performance by applying certain transformations in a different order,
> among other things. InvokeBinder *could* be doing a lot more to
> optimize the MH chain. For example, the above case never uses the
> Object value passed in (it is permuted to position 1 and later
> dropped), but that fact is obscured by the intervening append.
>
> InvokeBinder is basically doing with MHs what MHs do with LFs. Perhaps
> what we really need is a more holistic view of MH + LF operations
> *together* so we can boil the whole thing down (even across MH lines)
> before we start interpreting or compiling it?
The idea of rearranging method handles looks interesting. If the JSR 292
framework gave special treatment to certain method handle chains (for
example, a custom LambdaForm shape for nested guards), it would be
beneficial for the binder to favor such shapes.
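
As a rough illustration of what such rearranging could buy, here is the
Binder example from above written directly against java.lang.invoke,
once as a literal step-by-step translation and once boiled down. This is
a sketch under assumptions: someMethod is a hypothetical target taking
(Float, String), the tryFinally step is left out, and the code does not
reflect what InvokeBinder actually emits.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class comparing two equivalent adapter chains.
class ChainSimplification {
    // Hypothetical target; stands in for MyClass.someMethod.
    static String someMethod(Float f, String s) { return s + " " + f; }

    static MethodHandle target() {
        try {
            return MethodHandles.lookup().findStatic(ChainSimplification.class,
                    "someMethod",
                    MethodType.methodType(String.class, Float.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Literal translation of permute(1, 0), append("Hello"), drop(1),
    // built from the target outwards: three adapters.
    static MethodHandle naive() {
        MethodHandle h = target();                            // (Float,String)String
        h = MethodHandles.dropArguments(h, 1, Object.class);  // (Float,Object,String)String
        h = MethodHandles.insertArguments(h, 2, "Hello");     // (Float,Object)String
        return MethodHandles.permuteArguments(h,
                MethodType.methodType(String.class, Object.class, Float.class),
                1, 0);                                        // (Object,Float)String
    }

    // The Object argument is never used, so no permute is needed:
    // bind the constant, then drop the leading Object. Two adapters.
    static MethodHandle simplified() {
        MethodHandle h = MethodHandles.insertArguments(target(), 1, "Hello"); // (Float)String
        return MethodHandles.dropArguments(h, 0, Object.class);               // (Object,Float)String
    }

    // Convenience wrapper so callers need not handle Throwable.
    static String call(MethodHandle mh, Object o, Float f) {
        try {
            return (String) mh.invokeExact(o, f);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Object ignored = new Object();
        System.out.println(call(naive(), ignored, 1.5f));
        System.out.println(call(simplified(), ignored, 1.5f));
    }
}
```

Both handles have the same type and behave identically; the second just
has one adapter fewer between the call site and the target.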

Best regards,
Vladimir Ivanov

>
> - Charlie