The Great Startup Problem

Mon Sep 1 16:16:27 UTC 2014

On 01.09.2014 15:24, Vladimir Ivanov wrote:
[...]
> "N frames per chain of N method handles" looks reasonable for me, but it
> depends on the average number of transformations users apply. If deep
> method handle chains are common in practice, we need to optimize for
> them as well, and a linear dependency in stack space may be too much.

Well, currently I have at least one guard per method call argument and
one for the receiver. If you count dropping arguments, the type
transformation, and the guard part itself, you get 3 frames for a single
guard. Counting up to 5 arguments + receiver, that is again 17 frames in
the naive approach. And we are talking only about the guards.
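
To make the counting concrete, here is a minimal Java sketch of how such
a guard chain is typically assembled. It is not the actual Groovy code:
the class GuardChainSketch, the sameClass test, and the target and
fallback methods are made up for illustration, and the asType step is
left out because all types already match. Each guard needs one
insertArguments, one dropArguments and one guardWithTest, which is where
the per-guard cost comes from.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class GuardChainSketch {
    // Hypothetical guard test: true when the value has exactly the expected class.
    static boolean sameClass(Class<?> expected, Object value) {
        return value != null && value.getClass() == expected;
    }

    // Hypothetical selected method and hypothetical re-selection path.
    static String target(Object receiver, Object arg) { return "fast:" + receiver + ":" + arg; }
    static String fallback(Object receiver, Object arg) { return "slow:" + receiver + ":" + arg; }

    static MethodHandle build() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType callType = MethodType.methodType(String.class, Object.class, Object.class);
        MethodHandle target = lookup.findStatic(GuardChainSketch.class, "target", callType);
        MethodHandle fallback = lookup.findStatic(GuardChainSketch.class, "fallback", callType);
        MethodHandle sameClass = lookup.findStatic(GuardChainSketch.class, "sameClass",
                MethodType.methodType(boolean.class, Class.class, Object.class));

        // Receiver guard: bind the expected class, ignore the argument, then guard.
        MethodHandle receiverTest = MethodHandles.insertArguments(sameClass, 0, String.class);
        receiverTest = MethodHandles.dropArguments(receiverTest, 1, Object.class);
        MethodHandle guarded = MethodHandles.guardWithTest(receiverTest, target, fallback);

        // Argument guard: bind the expected class, ignore the receiver, then guard.
        MethodHandle argTest = MethodHandles.insertArguments(sameClass, 0, Integer.class);
        argTest = MethodHandles.dropArguments(argTest, 0, Object.class);
        return MethodHandles.guardWithTest(argTest, guarded, fallback);
    }

    // Convenience wrapper so callers do not have to deal with Throwable.
    static String call(Object receiver, Object arg) {
        try {
            return (String) build().invoke(receiver, arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Calling GuardChainSketch.call("recv", 42) passes both guards and reaches
target, while a String argument or a non-String receiver ends up in
fallback. Every further argument adds another group of three
combinators on top.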

I assume the problem would be an order of magnitude smaller if the JVM
could do tail calls. But I wonder whether it is possible to make the
execution of the forms less recursive and to have some lambda forms
cover more than a single handle.

For example... if you have a series of guards, wouldn't it be possible
to execute them in a manner like this:

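def myHandleForm(...) {
   ...
   // execute the guards one after another instead of nesting them
   while (currentGuardForm != null) {
     if (executeCurrentGuardFormFail(...)) {
       // a guard failed: leave through its false path
       return executeCurrentGuardFormFalsePath(...)
     }
     currentGuardForm = getNextCurrentGuardForm(...)
   }
   // all guards passed: run the rest of the form
   return executeNonGuardFormRemainder(...)
}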

where a guard form is the result of merging the type transformation,
argument insertion, argument dropping and the actual handle for the
guard method.

I am positive that could be written in a very generic way. In general I
think that a certain series of handles could be merged. But of course I
don't know how much the JIT likes such things.
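
As a rough illustration of the loop idea in plain Java: the sketch below
runs a list of already merged guard forms iteratively, so the stack
depth does not grow with the number of guards. Everything in it
(GuardLoopSketch, GuardForm, the two example tests and the
reselect/fastPath methods) is hypothetical and only shows the control
flow. It uses invokeWithArguments for brevity, which is certainly not
how a real LambdaForm would be executed.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.List;

class GuardLoopSketch {
    // Hypothetical merged guard form: the test and its false path take the same arguments.
    static final class GuardForm {
        final MethodHandle test;      // (args...)boolean
        final MethodHandle falsePath; // (args...)Object
        GuardForm(MethodHandle test, MethodHandle falsePath) {
            this.test = test;
            this.falsePath = falsePath;
        }
    }

    // Executes all guards in one loop, then the non-guard remainder.
    static Object run(List<GuardForm> guards, MethodHandle remainder, Object... args)
            throws Throwable {
        for (GuardForm guard : guards) {
            boolean passed = (Boolean) guard.test.invokeWithArguments(args);
            if (!passed) {
                return guard.falsePath.invokeWithArguments(args);
            }
        }
        return remainder.invokeWithArguments(args);
    }

    // Hypothetical example guards and paths.
    static boolean notNull(Object o) { return o != null; }
    static boolean isString(Object o) { return o instanceof String; }
    static Object reselect(Object o) { return "reselect:" + o; }
    static Object fastPath(Object o) { return "fast:" + o; }

    static Object demo(Object arg) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType testType = MethodType.methodType(boolean.class, Object.class);
            MethodType pathType = MethodType.methodType(Object.class, Object.class);
            MethodHandle reselect = lookup.findStatic(GuardLoopSketch.class, "reselect", pathType);
            List<GuardForm> guards = Arrays.asList(
                    new GuardForm(lookup.findStatic(GuardLoopSketch.class, "notNull", testType), reselect),
                    new GuardForm(lookup.findStatic(GuardLoopSketch.class, "isString", testType), reselect));
            MethodHandle fast = lookup.findStatic(GuardLoopSketch.class, "fastPath", pathType);
            return run(guards, fast, arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Here demo("x") passes both guards and reaches fastPath, while demo(42)
fails the second guard and leaves through reselect. Whether the JIT can
see through such a loop as well as through nested forms is exactly the
part I cannot judge.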

[...]
>> That makes 5 frames in between. 5 is worlds better than 53.
> Ok, 5 additional frames for the simple case. Is such overhead tolerable
> for you? Or do you need a smaller number of intermediate frames?

ah... you know, when it comes to such things language implementors are 
quite greedy ;)

> What is your estimate for the complex case? What's the worst case in Groovy?

I think the worst cases are not so much to worry about. What would be
good is if the first visit were as small as possible. In my case that is
the generic handle installed by the bootstrap method to do the method
selection based on runtime types. That's currently something around 25
frames, I think. In a big application you will get a huge number of
call sites that are visited only once. So keeping the overhead small
here will save a lot later on.

For a few days I have been wondering about a special kind of logic to
help with memory consumption, and maybe you can tell me if it can work
out. What I am thinking of is using a WeakReference to reference my
actual method execution path, plus a guard that checks if that handle
is still available and, if not, executes a fallback. The idea is that if
memory becomes a concern, all the call sites that were visited only once
and are not part of the current trace can be reduced to just doing
method selection again. Could that work out? Will inlining still be
possible?
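
A minimal Java sketch of what I have in mind, under the assumption that
the call site keeps only a weak reference to the selected path. The
class WeakPathSketch, its selectMethod slow path and the target method
are invented for the example, and simulateMemoryPressure just clears the
reference by hand instead of waiting for the GC. It only shows the
control flow and says nothing about the inlining question.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.ref.WeakReference;

class WeakPathSketch {
    // Weakly held execution path; the GC may clear it at any time.
    private WeakReference<MethodHandle> cachedPath;

    // Counts how often the (expensive) method selection had to run.
    int selections = 0;

    // Hypothetical target method chosen by the selection.
    static String target(Object arg) { return "target:" + arg; }

    // Hypothetical slow path standing in for runtime method selection.
    private MethodHandle selectMethod() throws ReflectiveOperationException {
        selections++;
        return MethodHandles.lookup().findStatic(WeakPathSketch.class, "target",
                MethodType.methodType(String.class, Object.class));
    }

    Object call(Object arg) {
        try {
            // Guard: is the weakly referenced handle still available?
            MethodHandle path = (cachedPath == null) ? null : cachedPath.get();
            if (path == null) {
                // Fallback: do the method selection again and re-cache the result.
                path = selectMethod();
                cachedPath = new WeakReference<>(path);
            }
            return path.invoke(arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Stand-in for the GC clearing the reference under memory pressure.
    void simulateMemoryPressure() {
        if (cachedPath != null) {
            cachedPath.clear();
        }
    }
}
```

After simulateMemoryPressure() the next call goes through selectMethod
again and the counter goes up by one. A hot call site would need
something that keeps its handle strongly reachable, otherwise it would
be cleared just like the one-time call sites.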

>>> We discussed an idea to generate custom bytecodes (single method) for
>>> the whole method handle chain (and have only 1 extra stack frame per MH
>>> invocation), but it defeats memory footprint reduction we are trying to
>>> achieve with LambdaForm sharing.
>>
>> I wonder if that is the case for Groovy as well. Our old callsite
>> mechanism does have only 1 frame (upon second execution). Because by
>> then we generated a class for the callsite that does all the argument
>> transformation, checks and target method execution. So compared to that
>> I would not expect a memory increase.
> We are looking for ways to significantly reduce memory consumption of
> the JSR292 implementation. Inlining of LFs from call site means 1
> anonymous class per indy call site. Compared to fully customized
> LambdaForms, it should give noticeable savings due to the smaller
> number of anonymous classes being loaded. But it doesn't comply with
> the ultimate goal of a fixed set of combinators used to implement all
> possible behaviors.

Since in the traditional implementation the callsite is always Object[]
based, we have one such class per executed target method. Of course we
run into profile pollution if we use the same callsite object for
multiple callsites, but it would be the same for the target method, so
in my thinking there is no real problem. Anyway... if there is no need
to create such a class per target of a direct method handle, then I
would expect considerably less memory usage from your approach.

bye Jochen

-- 
Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org