Tailcalls in the jvm

Tue Feb 12 01:44:09 PST 2008

On Feb 10, 2008, at 5:17 AM, Arnold Schwaighofer wrote:

> Yes i am still working on tail calls. Sadly i have nothing to show  
> so far, as i have been put off my schedule during winter holidays  
> by former work obligations.

These things happen.  I think we'll make the Da Vinci Machine happen  
also!

> While looking at how to handle the security (stack inspection)  
> problem that tail calls impose indeed a question arose.
>
> For my first implementation i plan to issue a check when performing  
> the tail call whether the 'tailcalling' caller has the permission  
> java.security.allpermisson to ensure that the missing frame  
> (protection domain) of the caller does not change security  
> behavior. If the caller has any other permission no tail call  
> optimization will be performed.
>
> But inspired by the idea of storing the security information in the  
> continuation [1], I have come up with the following idea to  
> preserve the existing security semantics and always perform tail  
> call optimization (except for cases where it's obviously not  
> possible - exceptions, monitors etc). The idea is based on the  
> assumption that the protection domain of a class is a property that  
> does not change (often) e.g only when the class is loaded. This is  
> my question: is this assumption correct?

Yes.  It is a static property of the class.  The only thing dynamic  
about it is that a virtual or interface call can reach a computed and  
variable target method, and hence a computed and variable PD (which  
is derived from the target method's class).

> I have found the native method setProtectionDomain0() on  
> java.lang.class that is documented to be called by  
> ClassLoader.defineClass. My current understanding is that the only  
> way to change the protection domain is via reloading the class?
>
> << ==== only read the following if you have time ...
(long discussion of stack frame formats read but deleted)

> [1]A Tail-Recursive Machine with Stack Inspection, Clements, Felleisen

I have a few points of response:

0. I find that paper is obscure, since I don't often move in circles
where the term "contractum" pops up in normal conversation.
But there are circles within circles, I see, since Clements & Felleisen
fall back on named lambdas instead of the mysterium tremendum
of the Y combinator favored by Fournet & Gordon.  :-)

More comments on their paper below.  (I almost said, "inter alia.")

1. If protection domains are few, then we could, as you suggest,
put a single-word bitmap in every stack frame.

The idea to deoptimize and reformat on the 33rd, 65th, etc.
protection domains is clever.  I suppose if you were to
reserve a word in every stack frame, you could also
make it an indirection to an expandable array.

But I'd rather not go down this road.

In HotSpot we try to keep optimized stack frames as "native" as  
possible.
If there is a need, say, to track GC roots, we try not to compromise  
code quality
or stack frame layout but instead add metadata on the side.
This is a powerful technique.  It doesn't work for PD accumulation,
since that is a (slightly) dynamic property of stack frames.
I'd hate to invent stack frame customization machinery (complex,
and with unknown overheads), just for this one use case.

2. I agree protection domains are usually very few.
I haven't seen use of many.  (Maybe heavy users of signed JARs.)
In practice, they are proxies for ClassLoader identity, and you can
usually interchange ClassLoader with PD (or with the ordered pair)
when reasoning about privileges on stack frames.

I think the rules for PD-violating tail-calls can be simple and
conservatively approximated by similar rules about calls
between methods in different class loaders.

(Privileges are subtractive as stack frame accumulate, of course,
except that they are reset by doPrivileged.  So to be as permissive
as possible, you need a per-thread set of PDs, or you need to
represent them via active stack frames, and avoid deleting
frames which would change the subtractive answer.  Deleting
a subtraction causes an addition, which is the hazard here,
since the addition is of a set of privileges which would normally
be denied to the PD.  I find this confusing, BTW.)

3. Since we're defining our own VM here, let's see what we can do
by refusing to handle the hard cases.  One too-radical idea:  Throw
a linkage error if less-privileged code tail-calls more privileged code.

This probably won't work well in the presence of method handles
(or even virtual/interface calls) since the byte compiler cannot
statically determine the identity of a callee and refrain from
requesting a bad tail call.  A scheme example would be:
   (define (revlist listfn a b) (listfn b a))

Suppose revlist is in an applet, and listfn is bound to
the actual a system routine 'list'.

Then the draconian rule will cause the code to fail.
This rule is well-motivated, since the listfn might also
be bound to for-each, which could cause a closure
b to be invoked in an privilege context elevated
beyond revlist's capabilities.

4. OK, that was a bad start.  Let's try again.

Make a less draconian rule, one that allows
the call but keeps more of the stack trace around.
When revlist tail-calls list, the system can keep
the stack frame around (or keep an adapter frame)
to represent the PD of revlist.

I call this a "defective tail call", because the compiler
requested a tail call, but the runtime pushed a stack
anyway.  The application cannot detect this, unless
it performs an unbounded recursion using only tail calls
some of which are defective; eventually it will get
a stack overflow from the frames pushed by the
defective calls.

Apparently Fournet and Gordon (whom I have not
read) allow their engine to defeat tailcalls, because
C&F say they make the frame take-down operation
explicit.

Basically, I propose the JVM should defeat a tail
call when it detects a cross-PD call.  It should
refrain from taking down the frame, or (better)
replace the frame with a holder for the PD.

I think this is a workable rule, though I haven't thought
hard about what bad consequences it might bring.
The bad cases I can think of seem implausible.

If you allow cross-PD tail-calls to be defective,
you burden only code patterns which perform looping
or "threaded interpreter" (non-pushing) state transitions
between mutually untrusting modules.  This seems
unlikely; in Scheme at least nearly all tail calls are
either within the same compilation unit, or else are
"accidental" to a call to a routine which either
returns immediately (e.g., list or car) or else
performs something which inherently needs
a stack frame (e.g., for-each, dynamic-wind).

C&F claim that these infinitely looping tail calls
pingponging between system domains would
be commonly written, if only Java programmers
could be assured that the stack would not blow
up.

Maybe there is a PyPy JIT example, or some sort
of message passing state machine, for which the
stack frame buildup would be unbounded.
But I don't see one yet, and as an engineer
I don't want to work on problems that nobody
seems to care about.  Most control-passing
patterns involve at least a little branchiness,
where a caller tries first one thing and then
another; that caller needs to push a stack
frame.

I guess supporting pure CPS (i.e., heap-resident
user-built frames) might run into problems,
but (again) when I try to come up with Scheme
examples I find system calls which are either
leaves (car) or inherently pushy (for-each).

Many Scheme implementations allow cross-module
tail calls to be defeated.  They have to apologize for
this, but the defect does not seem to cause problems
for programmers.

So, in the absence of more evidence, I think
Schinz and Odersky (referenced in C&F)
strike the right compromise.

5. Or we could use little adapter frames to represent
PD information.

This is the engineer's equivalent to C&F's  continuation
markers, which summarize annotations derived from
backtraces.

Have a cross-PD tail call push an adapter (not the original
frame) which the JVM creates specially for the purpose
of temporarily defeating (potentially) cross-PD tail calls.

But, keep a counter in them, and when too many pile up,
crush them down into a single adapter, with a composite
PD marker.  The "crush" operation can compress an
unlimited number of frames, since the PD set the JVM
needs can be unordered, and PDs are few.

(Oddly enough, I wouldn't need to produce an asymptotic space
proof over a multi-kinded combinatorial algebra in order to
code this up.  It's a funny world.)

Shall we do the simple "defeatist" thing first, and reserve
the adapter frames for a second effort?

Best wishes,
-- John