The Great Startup Problem

Tue Sep 2 14:38:47 UTC 2014

Jochen,

>> "N frames per chain of N method handles" looks reasonable to me, but it
>> depends on the average number of transformations users apply. If deep
>> method handle chains are common in practice, we need to optimize for
>> them as well, and a linear dependency in stack space may be too much.
>
> Well, currently I have at least one guard per method call argument and
> receiver. If you count dropping arguments, type transformation, the
> guard part itself, you get only for the guard itself 3 frames. Counting
> up to 5 arguments + receiver, that is again 17 frames in the naive
> approach. And we are talking only about the guards.
>
> I assume the problem would be an order of magnitude smaller if the JVM
> could do tail calls. But I wonder whether it would be possible to make
> the execution of the forms less recursive and to have some lambda forms
> cover more than a single handle.
>
> For example... if you have a series of guards, wouldn't it be possible
> to execute them in a manner like this:
>
> def myHandleForm(...) {
>    ...
>    // execute guards
>    while (currentGuardForm!=null) {
>      if (executeCurrentGuardFormFail(...)) {
>        return executeCurrentGuardFormFalsePath(...)
>      }
>      currentGuardForm = getNextCurrentGuardForm(...)
>    }
>    executeNonGuardFormRemainder(....)
> }
>
> where a guard form is the result of a merge of type transformation,
> argument insertion, drop and the actual handle for the guard method.
>
> I am positive that could be written in a very generic way. In general I
> think that a certain series of handles could be merged. But of course I
> don't know how much the JIT likes such things.
>
> [...]
It's possible to optimize some shapes of method handle chains (like
nested GWTs) and tailor a special LambdaForm shape for them, or do some
inlining during bytecode translation. Though such specialization
contradicts the LF sharing goal, the probable benefits may be worth the
effort.
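
To make the "nested GWTs" shape concrete, here is a minimal sketch in
plain java.lang.invoke terms. It is not how Groovy or the JDK does it;
the class GuardChainSketch and its helpers (isInstance, allMatch, fast,
slow, find, call) are hypothetical names invented for illustration. The
first builder nests one guardWithTest per argument, the second merges
all checks into a single guard that loops over them, roughly in the
spirit of the loop proposed above. Whether the merged form really costs
fewer frames or JITs better depends on the JVM and is not shown here.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; every name below is illustrative only.
class GuardChainSketch {
    static final Class<?>[] EXPECTED = { String.class, Integer.class };

    static boolean isInstance(Class<?> type, Object value) {
        return type.isInstance(value);
    }

    // All argument checks in one loop instead of one handle per check.
    static boolean allMatch(Class<?>[] expected, Object[] args) {
        for (int i = 0; i < expected.length; i++) {
            if (!expected[i].isInstance(args[i])) return false;
        }
        return true;
    }

    static String fast(Object a, Object b) { return "fast"; }
    static String slow(Object a, Object b) { return "slow"; }

    static MethodHandle find(String name, Class<?> ret, Class<?>... params) {
        try {
            return MethodHandles.lookup().findStatic(
                    GuardChainSketch.class, name, MethodType.methodType(ret, params));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Naive shape: one guardWithTest per argument, nested around the target.
    static MethodHandle nested() {
        MethodHandle chain = find("fast", String.class, Object.class, Object.class);
        MethodHandle fallback = find("slow", String.class, Object.class, Object.class);
        MethodHandle isInstance = find("isInstance", boolean.class, Class.class, Object.class);
        for (int i = EXPECTED.length - 1; i >= 0; i--) {
            MethodHandle test = isInstance.bindTo(EXPECTED[i]); // (Object)boolean
            for (int k = 0; k < i; k++) {
                // Skip the leading arguments so the test sees argument i.
                test = MethodHandles.dropArguments(test, 0, Object.class);
            }
            chain = MethodHandles.guardWithTest(test, chain, fallback);
        }
        return chain;
    }

    // Merged shape: a single guard whose test checks every argument.
    static MethodHandle flattened() {
        MethodHandle target = find("fast", String.class, Object.class, Object.class);
        MethodHandle fallback = find("slow", String.class, Object.class, Object.class);
        MethodHandle test = find("allMatch", boolean.class, Class[].class, Object[].class)
                .bindTo(EXPECTED)                              // (Object[])boolean
                .asCollector(Object[].class, EXPECTED.length); // (Object,Object)boolean
        return MethodHandles.guardWithTest(test, target, fallback);
    }

    static String call(MethodHandle handle, Object a, Object b) {
        try {
            return (String) handle.invoke(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(nested(), "a", 1));
        System.out.println(call(flattened(), "a", "b"));
    }
}
```

Both handles have the same type and select the same path for the same
arguments; they differ only in how many combinators sit between the
call and the target.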

>>> That makes 5 frames in between. 5 is worlds better than 53.
>> Ok, 5 additional frames for the simple case. Is such overhead tolerable
>> for you? Or do you need a smaller number of intermediate frames?
>
> ah... you know, when it comes to such things language implementors are
> quite greedy ;)
... but I can fulfil only 3 wishes ;-)

>> What is your estimate for the complex case? What's the worst case in Groovy?
>
> I think the worst cases are not so much to worry about. What would be
> good is if the first visit were as small as possible. In my case that
> is the generic handle installed by the bootstrap method to do the
> runtime-type-based method selection. That's currently something around
> 25 frames I think. In a big application you will get a huge number of
> callsites that are visited only once. So having only a small overhead
> here will pay off later on.
>
> For a few days I have been wondering about a special kind of logic to
> help with memory consumption, and maybe you can tell me if it can work
> out. What I am thinking of is using a WeakReference to reference my
> actual method execution path, plus a guard that checks if that handle is
> still available and, if not, executes a fallback. The idea is that if
> memory becomes a concern, all the callsites visited only once that are
> not part of the current trace can be reduced to just doing method
> selection again. Could that work out? Will inlining still be possible?

I don't think it will work. If you load a MethodHandle from a
WeakReference and then use MH.invoke*, inlining will be broken for sure.
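
For reference, this is roughly the shape I understand the proposal to
have, as a minimal sketch only. WeakTargetSketch, selected, reselect and
find are hypothetical names, not Groovy or JDK code. The point is that
the handle invoked in call() comes out of a WeakReference at run time
rather than being a constant visible at the call site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.ref.WeakReference;

// Hypothetical sketch of a call site whose selected target is weakly held.
class WeakTargetSketch {
    // Stands in for the specialized path chosen by method selection.
    static Object selected(Object arg) { return "cached:" + arg; }

    // Stands in for running method selection again.
    static Object reselect(Object arg) { return "reselected:" + arg; }

    static MethodHandle find(String name) {
        try {
            return MethodHandles.lookup().findStatic(WeakTargetSketch.class, name,
                    MethodType.methodType(Object.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    final WeakReference<MethodHandle> cached;
    final MethodHandle fallback = find("reselect");

    WeakTargetSketch(MethodHandle target) {
        this.cached = new WeakReference<>(target);
    }

    Object call(Object arg) {
        // The handle is loaded from the reference on every call, so the
        // invoke below goes through a value only known at run time.
        MethodHandle mh = cached.get();
        try {
            if (mh != null) {
                return (Object) mh.invoke(arg);
            }
            // Reference was cleared: fall back to method selection.
            return (Object) fallback.invoke(arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle strong = find("selected"); // keeps the target alive
        WeakTargetSketch site = new WeakTargetSketch(strong);
        System.out.println(site.call("x"));
        site.cached.clear();                    // simulate the GC clearing it
        System.out.println(site.call("x"));
    }
}
```

The sketch clears the reference by hand to make the fallback path
deterministic; in a real run the GC decides when that happens.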

>>>> We discussed an idea to generate custom bytecodes (single method) for
>>>> the whole method handle chain (and have only 1 extra stack frame per MH
>>>> invocation), but it defeats the memory footprint reduction we are
>>>> trying to achieve with LambdaForm sharing.
>>>
>>> I wonder if that is the case for Groovy as well. Our old callsite
>>> mechanism does have only 1 frame (upon second execution). Because by
>>> then we generated a class for the callsite that does all the argument
>>> transformation, checks and target method execution. So compared to that
>>> I would not expect a memory increase.
>> We are looking for ways to significantly reduce the memory consumption
>> of the JSR292 implementation. Inlining of LFs from the call site means 1
>> anonymous class per indy call site. Compared to fully customized
>> LambdaForms, it should give noticeable savings due to the smaller number
>> of anonymous classes being loaded. But it doesn't comply with the
>> ultimate goal of a fixed set of combinators used to implement all
>> possible behaviors.
>
> Since in the traditional implementation the callsite is always Object[]
> based, we have one such class per executed target method. Of course we
> run into profile pollution if we use the same callsite object for
> multiple callsites, but it would be the same for the target method, so
> in my thinking there is no real problem. Anyway... if there is no need
> to create such a class per target of a direct method handle, then I
> would expect quite a lot less memory usage from your approach.
That's interesting. I'll try to experiment with that. Thanks for sharing 
your experience.

Best regards,
Vladimir Ivanov

>
>
> bye Jochen
>