thinking about proper implementation of tail calls

Sat Sep 1 06:49:58 PDT 2012

On 09/01/2012 01:26 PM, Florian Weimer wrote:
> * John Rose:
>
>> As I recall, Doug Lea noted at the 2010 JVM Language Summit that
>> tail calls would seem to allow work-stealing algorithms to be
>> implemented somewhat more cleanly or efficiently.  (How's that for
>> tentative?)  A worker thread goes from task to task in a data-driven
>> way, similar to the examples given (e.g., in Guy's article) of
>> linked list traversal.  Currently you need an outer loop to "pump"
>> the task sequence, and each task must return to the loop to allow
>> the next task to run.  If you had tail calls, you would not need to
>> force the task sequence into a form that the outer loop can
>> understand, which would allow more flexibility in organizing the
>> task sequence.  More freedom in shaping the code can translate into
>> better-tuned algorithms.
> I would be surprised if tail calls give more flexibility.  Usually
> it's the opposite, the mini-interpreter wins in terms of flexibility:
>
>    <http://www.enyo.de/fw/notes/tail-calls.html>

This trick can work with a lexer because usually you don't keep states
between each recognized tokens, it's harder to use the same trick with
a parser or an interpreter.

> The mini-interpreter turns direct calls into difficult-to-predict
> indirect calls.  This could be a drawback.  But optimizing that is
> probably not much more difficult than a general implementation of hard
> tail calls.

no, I don't think so because as you said direct calls are not hard to 
predict.

>
>> I have been looking for a "killer application" for tail calls on the
>> JVM.  Doug's example may turn out to one.  Here's another candidate:
>> Overcoming class file size limits.
> I doubt it.
>
>> Step 3b is non-trivial, and probably requires boxing on the heap, in
>> general.  (Remember that you can have up to 2^16-1 locals, but only
>> 2^8-1 parameters.)  In the case where there are only a few locals,
>> they can just be passed in registers as parameters.
> It also needs multiple return values or unboxed tuples, so that
> updated state can be carried around without boxing.
>
>> Step 3a would seem to require tail calls.  Specifically, the act of
>> leaving one mini-method M1 to go to a block in another mini-method
>> M2 requires a hard tail call from M1 to M2, passing any live data
>> values that the M2 block requires.
> Only if you can have closures which share mutable variables (sometimes
> called "upvalues").  Otherwise, you still have marshal state updates
> to the canonical locals, using a mini-interpreter, and the individual
> state machine entries would have different function signatures.
>
> Or you have to go full CPS and have an endless parameter list, but
> that's going to run into the parameter count limit.  Copying this much
> state around isn't going to be free, either.  And each entry in the
> state machine would have to explicitly thread its full state, so it's
> most useful for automatically generated code.

Rémi