Static analysis of non-local transfers: a proposal

Thu Nov 20 09:27:53 PST 2008

Dear Java closures people,

I very much appreciate your interesting and enterprising work on Javac.
As a theoretically minded researcher in programming languages, I have a
suggestion to offer regarding a possible scheme for static analysis of
programs involving return/break/continue, so as to pre-empt
UnmatchedNonlocalTransfer exceptions at runtime.

My colleague Phil Wadler at Edinburgh University recently drew my attention
to your work, suggesting that some of my own research ideas might find
applications there. Having studied your proposal (v0.5), it does indeed seem
plausible to me that this may be the case, so this message is a first attempt
at trying to convey my thoughts in Java terminology. Of course, it's highly
possible that you have thought through this kind of thing already - or that
I've misunderstood or overlooked something important! - but in any case,
I'd be very interested in discussing the issues further.

If I've got it right, my proposal has the following characteristics:

* It consists simply of a compile-time checkable restriction on programs of
   the Javac language exactly as proposed in the spec for v0.5. No additional
   annotations by the programmer are required.
* Programs obeying the restriction will never throw an
   UnmatchedNonlocalTransfer exception at runtime.
* The restriction doesn't rule out any standard Java programs (i.e. programs
   not involving closures).
* The restriction doesn't seem to exclude any of the motivating examples
   of Java programs with closures that I have seen so far.
* More tentatively, if the restrictions were adopted as part of the language
   spec, I believe this might also permit a simplification of the currently
   proposed system for tracking exceptions, yielding the same safety
   properties but requiring fewer exception-related annotations by the
   programmer. However, I won't go into details of this in the present note.

I'll describe the idea here using Java terminology. However, I'm also hoping
(soon!) to produce a draft paper presenting a toy language with formal typing
rules etc. which illustrates the essential issues, along with a proof of the
desired safety property for this language.

So here goes. I'll concentrate here on the restriction insofar as it affects
the use of "return", and then briefly comment on "break" and "continue".

The restriction consists of two aspects:

* A restriction on assignment expressions where the variable being
   assigned to is of function type. (Note however that *initializers* for
   variables of function type are not restricted, except as prescribed by
   the ordinary rules of Java, e.g. we can't have "return" if we're not
   within a method or constructor body.)
* A restriction on "return" statements themselves.

To describe the restrictions, the following notions are useful.
First, we suppose that in some preprocessing phase, the compiler has
annotated each occurrence of a method or constructor declaration with a
unique *target label*, and moreover has annotated each occurrence
of "return" with the appropriate target label as determined by lexical
scoping (i.e. with the target label of the closest enclosing method or
constructor declaration).

By a *dynamic element*, we shall mean either a variable of function type
or a target label. I don't think this is an especially good name (other
suggestions welcome!), but the intuition is that a dynamic element can be
"invoked" - by being applied in the case of a variable of function type,
or by being jumped to in the case of a target label - causing a computation
which might trigger (or itself be) an attempted transfer of control.
By contrast, a variable of primitive type is not considered "dynamic" since
there is nothing to "invoke". (It might seem odd that variables containing
references to objects are *not* deemed to be dynamic - roughly, this  
is because
a method invocation cannot *in itself* be responsible for a non-local  
transfer,
except indirectly via some other dynamic element such as a field in the
target object.)

Every dynamic element has a *scope*: the scope of a variable is
defined as usual, while the scope of a target label is deemed to be
the whole of the corresponding method or constructor declaration.
We define a partial order relation on dynamic elements as follows: y << x if
the scope of x strictly contains that of y. (The important intuition here is
"x may outlive y".) We also write y <<= x if the scope of x contains or is
equal to that of y.

We also have the usual notion of a variable appearing *free* in an
expression e. In addition, we say a target label t appears *free* in e if
e contains a return statement annotated with t and not "bound" by a
method/constructor declaration annotated by t within e itself.

We can now give an easy-to-understand but slightly crude first cut at our
restrictions as follows (I'll improve on this below):

* In an assignment expression x = e where x is of function type,
   no dynamic element y << x appears free in e.
* In a statement return e annotated with target label t,
   no dynamic element y <<= t appears free in e.

Thus, for example, the following four code fragments are forbidden:

1.   {=> int} x ;  // a field declaration, say
      void m () {
          x = {=> return 5 ;}
      }
      // scope of target label narrower than that of x

2.   {=> int} x ;
      void m () {
          {=> int} y = {=> return 5 ;} ;
          x = y ;
      }
      // scope of y narrower than that of x

3.   return {=> return 5 ;}
      // both returns will have the same target label here

4.   {=> int} x = {=> return 5 ;} ;
      return x ;
      // scope of x is narrower than that of target label

Whilst this version of our restriction suffices to guarantee safety of
returns, it imposes some possibly annoying restrictions on the way the
programmer has to write things. For example, under the above rules
one cannot write

      {int => {=> int}} K = {n:int => {=> n}} ;
      {=> int} f ;
      {  {=> int} g = {=> 5} ;
         f = K.invoke (g.invoke()) ;
      }

even though it is OK to write

      {int => {=> int}} K = {n:int => {=> n}} ;
      {=> int} f ;
      {  {=> int} g = {=> 5} ;
         int n = g.invoke() ;
         f = K.invoke(n) ;
      }

which is not essentially different. Thus, the programmer would sometimes
have to rewrite code in an unnatural style in order to conform to the
above rules.

The point about this example is that in the assignment

      f = K.invoke (g.invoke()) ;

the occurrence of g is not "dangerous", because the subexpression
g.invoke() is evaluated to a ground value (5) before the assignment is
performed, so there is no way for the dynamic behaviour of g to find
its way into that of f. One can take account of this via a slightly more
fine-grained version of our restrictions as follows.

Let's say a syntactic subexpression e' of e is *normal* if e' does not
appear inside a closure literal in e (i.e. underneath a lambda).
And say a dynamic element y *critically appears in* an expression e
if there is an occurrence y_0 of y which is not contained in any normal
subexpression e' of non-function type.

Thus, in the above example, g appears in K.invoke (g.invoke()),
though not critically. However, the appearance of g in {=> g.invoke()}
*is* critical, because the invocation of the dynamic behaviour of g
is delayed.

With these notions in place, we may now relax our restrictions to the
following (just by replacing "appears" by "critically appears"):

* In an assignment expression x = e where x is of function type,
   no dynamic element y << x critically appears free in e.
* In a statement return e annotated with target label t,
   no dynamic element y <<= t critically appears free in e.

One further simple relaxation is worth mentioning. For an assignment x = e,
the appearance of some y << x isn't dangerous if y is a variable and there
is no intervening target label between y and x. So we could in general define
y <<< x if for some target label t we have y <<= t << x, and then replace
"y << x" by "y <<< x" in the restriction on assignment expressions.

So that's the idea. It should now be reasonably clear how to extend this
scheme to deal with "break" and "continue", by annotating loop constructs
with target labels at appropriate points.

Of course, a few adjustments might be needed to take account of all aspects
of Java (e.g. I guess some care is also needed over where to put the target
labels in try-finally statements), but I'm hopeful that in essence the scheme
should be able to do everything one wants, without sacrificing much of
importance in terms of programming power or convenience.

Thanks for reading this far! I would be interested to know what you think,
so any reactions/queries/problematic examples gratefully received.

Cheers,
John Longley
(Lecturer, School of Informatics, University of Edinburgh, Scotland)

[PS: Please reply to my email address, as I guess I'm not a recipient of
this mailing list.]

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.