return-from-lambda viewed as dangerous, good alternatives

Thu Jan 7 01:16:40 PST 2010

On Dec 28, 2009, at 8:25 AM, Neal Gafter wrote:

> To: Peter Levart <peter.levart at gmail.com>
> Cc: lambda-dev at openjdk.java.net
> Subject: Re: transparent lambda
> 
> I'm moving this conversation to the closures-dev mailing list.

I note that the closures-dev group is dealing with the details of what fully transparent lambdas would look like, and in particular how to avoid overloading "return".  I've followed up there with some detailed observations:
  http://mail.openjdk.java.net/pipermail/closures-dev/2010-January/000534.html (etc.)

For the purposes of this list (lambda-dev), I want to make a clear argument against overloading the "return" keyword to denote a lambda result value, in addition to its denotations in current Java. I also want to suggest a range of alternatives.

First, if we were going to do "transparent lambdas" (which I understand we're not, but stay with me a second), adding a new meaning to the "return" keyword obviously interferes with the old meaning, which is "find the enclosing method and return a value from it".  You can test this interference by imagining a hapless programmer refactoring code and pasting some stuff containing a "return" statement into a new lambda expression.
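To make the refactoring hazard concrete, here is a minimal sketch.  It is written in the lambda syntax that Java 8 later shipped, not in any syntax proposed on this list, and the class and method names are hypothetical.  The loop body is pasted unchanged into a lambda; it still compiles, but "return" now means something else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names; uses Java 8 lambda syntax, which is an assumption
// relative to the proposals discussed in this thread.
class ReturnHazard {
    // Original code: "return" leaves the enclosing method, so nothing
    // after the first negative value is copied.
    static void copyUntilNegativeLoop(List<Integer> values, List<Integer> out) {
        for (int v : values) {
            if (v < 0) return;
            out.add(v);
        }
    }

    // Same body pasted into a lambda: "return" now ends only the current
    // invocation of the lambda, so iteration continues past the negative.
    static void copyUntilNegativeLambda(List<Integer> values, List<Integer> out) {
        values.forEach(v -> {
            if (v < 0) return;
            out.add(v);
        });
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, -1, 3);

        List<Integer> viaLoop = new ArrayList<>();
        copyUntilNegativeLoop(input, viaLoop);
        System.out.println(viaLoop);   // prints [1, 2]

        List<Integer> viaLambda = new ArrayList<>();
        copyUntilNegativeLambda(input, viaLambda);
        System.out.println(viaLambda); // prints [1, 2, 3]
    }
}
```

No diagnostic is issued for the second method; the difference shows up only in the results.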

It's no good saying "Aha, but every lambda body is really a method in the VM, so the meaning of 'return' is unchanged."  That's a category error.  We're designing language here, and it's a mistake to mix in implementation considerations and/or assume the writer of the code is thinking about such implementation considerations when the lambda is being written.  If you still think lambdas are "just methods in a different dress", you probably need to go read about lambdas and functions somewhere else.  Not every datum is a Java object; otherwise we wouldn't be busy designing functions for Java.

OK, so we're not doing fully transparent lambdas.  We're going to be pragmatic and aim at a useful subset of transparency.  Nobody is aiming at some sort of non-transparency or anti-transparency.  Putting a different sub-language with different rules in lambda bodies would make lambdas too dangerous to use.  Imagine again the hapless programmer editing code in and out of lambda bodies.  So we give ourselves permission to bend but not break the principle of transparency.

At that point we have to be careful.  In some cases, if you start to break a rule, you might as well take both the full consequences and full benefits of breaking it all the way.  But in this case, there are both good ways and bad ways to bend transparency.  The best way is to make the "broken" constructs (like break and continue) illegal.  The worst way is to make the "broken" constructs have a different meaning inside the lambda than outside.  A purist will say (and one did say somewhere recently) that both are violations of transparency, and that a syntax error is just as non-transparent as a different answer.

But I want to say this very clearly:  If you can't give the same answer both inside and outside the lambda body, it is FAR BETTER to refuse to give any answer, than to quietly substitute a different answer.  Isn't it obvious why?  If I start mixing lambdas into my code, and the pre-existing expressions don't "fit" into a lambda, I want the computer to tell me right away with a compile-time error.  For the computer to do something "approximately similar" or "analogous in the mind of the language designer" or "really cute in the published use cases" will deeply disappoint me if I don't find out about the mistake until the code has been delivered to a customer.

And, in these terms, every single "return" in the whole world of Java is a bug waiting to happen, as soon as the code it's in starts to be rewritten to include lambdas.

There's a second reason for not overloading returns, too.  An overloaded return not only fails to implement transparency, but it also PREVENTS any future version of Java (or any sister language) from increasing the transparency to include control transfers.  To overload "return" is effectively to say "we'll never need uplevel branches in Java".  And that is, to put it mildly, short-sighted.  If you've ever designed a multiple-purpose language before, you know that any statement beginning "our users will never need..." is almost always false.

At this point it's easy to recall a whole spectrum of snorts of disgust:  "Too complex", "Not the language I know and love", "Never efficient", "Can't learn it", "Would never be useful".  Some may think it is Just Wrong to want to "break" or "return" through a lambda in Java, ever.  (BTW, these are also the objections that garbage collection met with routinely 15-20 years ago, and inner classes met with early in Java's life.  I've been there, done that.)  In the case of uplevel branches, many people, including me, disagree with the present forms of those objections, on the basis of decades of language design history.  Control abstractions alone amply motivate uplevel branches.  To forbid them from lambdas now simply means that another kind of closure will have to be invented, uselessly, when it's time to add in control abstractions.

Mark started this project by saying he intended to take the best from previous efforts at closures, etc.  I applaud this strategy.  As we simplify and adapt, let's not do it in a way that closes down the language's future, especially in the directions mapped out by those previous efforts, and especially not by redefining existing syntaxes or polluting transparency with approximations.

Here are some examples that will help us wean ourselves off of the assumption that "return" is the must-use keyword for lambdas.  They have all been discussed elsewhere, so I won't go into detail.  But I do want to close on a positive note by pointing out a few of the many alternatives to overloaded "return":

1. Add a new contextual keyword like "yield".  (Still not strictly transparent, but doesn't add a new meaning to "return".)
  #() {int x = 1; int y = x+1; yield y;}

2. Add a "block expression" or "let expression" syntax whose final value is an expression in some favored position.  It is a syntax in its own right, but used with lambda would look like:
  #() (int x = 1; int y = x+1; y)  // http://www.javac.info/closures-v06b.html
or
  #() {int x = 1; int y = x+1; y}  // from a discussion in 2006
or
  #() int y: {int x = 1; y = x+1;} // favored position occupied by a named declaration

3. Add a "breakable block expression" syntax which uses labels to make the linkages more explicit.  Again, a syntax in its own right, but with lambda would look like:
  #() L:{int x = 1; int y = x+1; break L: y;}
or
  #L() {int x = 1; int y = x+1; break L: y;}  // warning: block expression mixed up with lambda
or
  #() L:{int x = 1; int y = x+1; return L: y;}

My personal favorite at the moment is some form of 3.  The "return" flavored version is a syntactically distinct use of the return keyword, rather than an overloading of a pre-existing syntax.  

Neither 2 nor 3 has transparency problems, because the value return is more strongly coupled to the block it is returning from.  Option 1 pushes transparency bugs further into the future, since they only arise when lambda code gets doubly nested.

Both 2 and 3 would add value even apart from lambda expressions, since they amount to the "let expressions" present in all but the most primitive lambda-capable languages.  (Yes, it would be a pity to have lambda without let.  We'd cope the way they do in JavaScript.  The main point here is that it's a safe move, with lots of precedent, to design lambda and let together.)  Example of a let expression without a lambda:
	int x = x:{ for (int i : indexes()) if (indexIsGood(i)) return x:i; return x:-1;};
Corresponding JavaScript-flavored workaround:
	int x = (#() x:{ for (int i : indexes()) if (indexIsGood(i)) return x:i; return x:-1;})();
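For comparison, the same immediately-invoked workaround can be sketched in the lambda syntax that Java 8 later shipped (again, not the "#()" proposal used above).  The helpers indexes() and indexIsGood() are named in the example above but never defined, so the bodies given here are placeholders I made up.  Note that the "return" inside the lambda is exactly the overloaded form argued against in this message.

```java
import java.util.function.IntSupplier;

// Hypothetical class; helper bodies are assumptions for illustration only.
class LetWorkaround {
    static int[] indexes() { return new int[] {4, 7, 10}; }   // placeholder data
    static boolean indexIsGood(int i) { return i % 2 == 1; }  // placeholder test

    static int firstGoodIndex() {
        // Build a lambda and call it on the spot, to get the effect of a
        // "let expression" that computes a value with statements.
        int x = ((IntSupplier) () -> {
            for (int i : indexes()) if (indexIsGood(i)) return i;
            return -1;
        }).getAsInt();
        return x;
    }

    public static void main(String[] args) {
        System.out.println(firstGoodIndex()); // prints 7
    }
}
```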

-- John

P.S.  I'm not just spinning ad hoc theories here to kick at a syntax I don't like.  This is the mental framework I used with inner classes.  When faced with the difficulties of handling uplevel variable references in inner classes, I chose to require the "final" keyword on uplevel references, and made the language refuse to evaluate uplevel references to non-finals.  Transparency is limited, but where it is allowed at all it is EXACT, not approximate.

We could have chosen an "approximation" of uplevel references, say taking a snapshot of the current value of the variable.  This would have made certain use cases (with loops) look easier.  But I am certain it would have caused many thousands of subtle refactoring bugs downstream.  I also implemented shared uplevel variables using 1-element arrays to get correct semantics for mutable values, but threw it away.  (That design for uplevel references had other dangerous interactions with loops, well known in the Common Lisp community.  It also had too many hidden costs for the JVMs of that time.)  The point then and now is that, if for some reason full transparency is impossible, make the impossible constructs fail; don't assign them some other cute useful meaning.  This applies to "return", and may well apply to other questions about lambdas before we are done.
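A small sketch of the two inner-class behaviors described above, with hypothetical names.  The first method reads a final uplevel variable, which is the exact (not approximate) transparency the language allows.  The second shows the 1-element array idiom written out by hand; this is the programmer-visible workaround, and I am only assuming it resembles the discarded compiler design in spirit.

```java
// Hypothetical names throughout.
class UplevelDemo {
    interface IntSource { int get(); }  // helper interface for this sketch

    // A final local can be read from the inner class, and its meaning is
    // the same inside the inner class as outside it.
    static int exactCapture() {
        final int base = 10;
        IntSource s = new IntSource() {
            public int get() { return base + 1; }
        };
        return s.get();
    }

    // A mutable uplevel variable is refused by the compiler, so shared
    // state is routed through a final reference to a 1-element array.
    static int sharedCell() {
        final int[] counter = {0};
        Runnable bump = new Runnable() {
            public void run() { counter[0]++; }
        };
        bump.run();
        bump.run();
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(exactCapture()); // prints 11
        System.out.println(sharedCell());   // prints 2
    }
}
```

Removing "final" from base and then assigning to it anywhere in the method makes the inner-class reference a compile-time error, which is the "refuse to answer" behavior, rather than a silent snapshot.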