thinking about proper implementation of tail calls

Jochen Theodorou blackdrag at gmx.org
Fri Aug 31 04:27:48 PDT 2012


Am 31.08.2012 02:39, schrieb John Rose:
[...]
> Suppose you are compiling your favorite high-level language to the
> JVM, and you start running into the various size limits in class
> files.  (Usually they come from the 16-bit indexes in various
> places.)  What do you do, dear?  (H/T to Maurice Sendak,
> http://www.amazon.com/What-Do-You-Dear/dp/0064431134 .)

you mean like for example the number of methods ;) (We ran into this in 
the past). Of course you mean the size of a method.

[...]
> It would be great if there were an ASM module which could
> automatically render an over-sized "mega-method" into some
> JVM-compliant form.  What would this take?

As Charles already mentioned, there is currently something in work on 
the asm list... maybe you want to contribute?

> A number of things:
>
> 1. The proposed mega-method would have to be broken into a control
> flow graph of basic blocks.
>
> 2. The basic blocks would have to be packed into one or more
> automatically-generated normal "mini-methods" (of JVM-compliant
> sizes).
>
> 3. Control transitions between basic blocks would have to be
> re-coded, if they cross from one mini-method to another. 3a. These
> transitions would not be allowed to blow the stack. 3b. The live data
> in local variables (as of the given transition) would have to be
> transmitted somehow.
>
> 4. Something clever would have to be done to make the runtime stack
> walker see the originally named mega-method.
>
> Step 3b is non-trivial, and probably requires boxing on the heap, in
> general.  (Remember that you can have up to 2^16-1 locals, but only
> 2^8-1 parameters.)  In the case where there are only a few locals,
> they can just be passed in registers as parameters.
>
> Step 3a would seem to require tail calls.  Specifically, the act of
> leaving one mini-method M1 to go to a block in another mini-method M2
> requires a hard tail call from M1 to M2, passing any live data values
> that the M2 block requires.

You can handle part of the problem with normal calls, possibly exceeding 
the stack depth limit. Whatever code goes in there needs to maximally 
return 1 value. As long as that constraint is met, you can split the 
method. of course if your code more or less has the from (assignment, 
method call)*, then it is not helping.

I wonder how such a split with hard tail calls would handle the 
exception table.

> The corresponding act of entering a given target block in M2 also
> requires some thought.  Unless you are lucky or clever, or unless you
> render one basic block to one mini-method, M2 will (logically) have
> multiple entry points, one for each basic block that is a successor
> of a basic block in some other mini-methd M1.  Representing such
> multiple entry points is tricky.  Here are three ways to do it:
>
> A. Add an integer parameter (basic block selector index) to M2, and
> structure M2 as a Big Switch.  (This is a common technique; Rhino
> does this for example.)  Then M1 passes the selector as an extra
> parameter.
>
> B. Add multiple entry points to the class file format.  In
> particular, allow M2 to declare multiple entry points, perhaps with
> an entry point instruction which looks like a Big Switch.  Compile
> the rest of M2 as in A.  Link calls into M2 using compound symbolic
> references such as "M2.42", where 42 is the switch selector.  (Note
> that dot '.' is illegal in method names.)
>
> C. Structure M2 as in A, but run the call from M1 through
> invokedynamic, without the extra parameter.  When linking the call
> from M1, bind the extra parameter into the method handle for M2.  Ask
> your friendly neighborhood JVM implementor to optimize this method
> handle, at link time, into a form which is equivalent to the extra
> entry points in B.

First, this all reminds me of continuations. Second, the motivation for 
not doing basic-block-small-methods is the limit on the number of 
methods in a class imho. Third I see two different ways of splitting, 
the continuation style, where you kind of cut off a method by putting 
the cut part into a new method. Since the cut off part may contain 
several entry points, you need the above, but they are known at compile 
time and there is no need for anything beyond (B). Of course if you want 
to go further to continuations, then A/C become more interesting. But I 
see it currently out of scope for tail calls.

> P.S.  The distinction between "soft" and "hard" tail calls is
> described here:
> https://blogs.oracle.com/jrose/entry/tail_calls_in_the_vm
>
> I don't care much about soft tail calls (as an elective optimization)
> as much as hard tail calls (proper implementation, which makes a
> contract not to blow the application stack under defined
> conditions).

I agree

bye Jochen

-- 
Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org



More information about the mlvm-dev mailing list