transparent lambda

Wed Jan 6 22:27:13 PST 2010

I'm glad this group is figuring out alternatives to overloading the "return" keyword, which (IMO) would be a mistake with a long tail of painful consequences.

A new syntax like "yield" (for some new keyword "yield", or maybe an operator like "^" in ST) avoids overloading return, which makes pre-existing code transparently insertable into the new constructs.   I think that's worth something; unlabeled "yield" is better than overloaded "return".

Adding some labeling mechanism is clearly better, though.

Note that "yield" is two things at once:  A branch and a binding of the return value.  So here's another angle to consider:  Is it worth separating the two actions?  This might make for a syntax more complex than a simple "yield x", but it might pay for itself in other ways, such as clarity or power from using the component actions separately.

The branchless bind of a return value is specified (in some current proposals) by putting the value in a favored position in the lambda syntax:
  lambda () (this is the value).

To add arbitrary code to such a form, the Java equivalent of a let-expression would work.  This is implicit in some designs of statement lambdas:
  lambda () {some let bindings etc.; this is the value}.

Here's a specific example:
  lambda () { int index = -1;  for (int i = ...) if (i is ok) { index = i; break; }; index }
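
For comparison, here is a rough sketch of the same search in Java as it stands, with an anonymous inner class standing in for the lambda. The interface `IntThunk`, the class name, and the `isOk` test are hypothetical names invented for this illustration; none of them comes from a proposal. Note that the value has to leave through an explicit "return", which is exactly the keyword the proposals above try not to overload.

```java
class FirstOkIndex {
    // Hypothetical one-method interface standing in for a lambda type.
    interface IntThunk { int get(); }

    // Hypothetical stand-in for the "(i is ok)" test in the text.
    static boolean isOk(int value) { return value % 7 == 0; }

    static IntThunk finder(final int[] data) {
        return new IntThunk() {
            public int get() {
                int index = -1;                       // the "let binding"
                for (int i = 0; i < data.length; i++) {
                    if (isOk(data[i])) { index = i; break; }
                }
                return index;                         // "this is the value"
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(finder(new int[] {3, 14, 21}).get());
        System.out.println(finder(new int[] {1, 2}).get());
    }
}
```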

Note that there are other options for specifying that "index" carries the return value.  The declaration could be put in a privileged place in the lambda header:
  lambda (return int index = -1) { for (int i = ...) if (i is ok) { index = i; break; } }

(That's inspired by ANTLR, where there's a nice symmetry between incoming and outgoing values in rules; both are named, and the naming also makes multiple value returns easy to specify.)

The special declaration could also be marked with a keyword on a declaration just inside the block which is to produce the value:
  lambda () { int return index = -1; for (int i = ...) if (i is ok) { index = i; break; } }

This does not violate TCP (Tennent's Correspondence Principle) since the "int return" thingy is syntactically immobile:  It must stay with the immediately enclosing lambda body.

Or maybe (the extra colon avoids a grammar problem):
  lambda () int index = -1: { for (int i = ...) if (i is ok) { index = i; break; } }

Either of the last two variations (or whatever prettier one somebody comes up with) seems promising to me because the lambda block can be defined as a special kind of "let-expression", and then we can think about liberating let-expressions as a language feature in their own right.  (Of course, the dirty secret is that even if The Boss tells us we can't do that, we just resign ourselves to writing "lambda(){my let expression}()" when we need one.)
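
The "lambda(){my let expression}()" workaround can already be approximated in current Java by creating an anonymous inner class and calling it on the spot. This is only a sketch of the idiom: `IntThunk`, `LetByLambda`, and `paddedArea` are hypothetical names chosen for the example.

```java
class LetByLambda {
    // Hypothetical one-method interface standing in for a lambda type.
    interface IntThunk { int get(); }

    static int paddedArea(final int width, final int height) {
        // Read as: let w = width + 2, h = height + 2 in w * h
        return new IntThunk() {
            public int get() {
                int w = width + 2;
                int h = height + 2;
                return w * h;
            }
        }.get();   // invoked immediately, so the whole thing acts as one expression
    }

    public static void main(String[] args) {
        System.out.println(paddedArea(3, 4));
    }
}
```

The bindings `w` and `h` are visible only inside the body, which is the scoping a let-expression would give; the cost is the ceremony around it.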

The nice thing about using named variables for return values is that Java's definite assignment (DA) rules help the programmer avoid basic errors:
  lambda (return int index) { for (int i = ...) if (i is ok) { index = i; break; }; /*error: index not DA on exit from block*/ }

If the variable is declared 'final', then the definite unassignment (DU) rules enforce a useful invariant:  the variable is assigned exactly once!

What about the branching part?  Maybe it's possible to amend existing branch constructs to do the job.  After all, we already have labeled branches.
  lambda (return int index) L: { for (int i = ...) if (i is ok) { index = i; break L; }; index = -1; }
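
Both halves of this (a named result variable checked by the DA/DU rules, and a labeled break out of a block) exist in Java today, so the shape can be tried out in an ordinary method body. A minimal sketch, with the hypothetical names `LabeledBlockExit`, `firstOk`, and `isOk` standing in for the lambda and its test:

```java
class LabeledBlockExit {
    // Hypothetical stand-in for the "(i is ok)" test in the text.
    static boolean isOk(int value) { return value < 0; }

    static int firstOk(int[] data) {
        final int index;          // blank final: every path must assign it exactly once
        found: {
            for (int i = 0; i < data.length; i++) {
                if (isOk(data[i])) { index = i; break found; }   // bind, then branch
            }
            index = -1;           // without this line, "return index" would not compile
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(firstOk(new int[] {5, -2, -9}));
        System.out.println(firstOk(new int[] {1}));
    }
}
```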

Compared with using an unlabeled "yield" or "return", this is verbose in three ways:
1. It forces the user to declare and name the return value.
2. It requires a labeled branch instead of an unlabeled one.
3. By separating the binding from the branch, it requires wrapping the two in {;} after the "if".

I think the verbosity from #1 pays for itself:  It makes the code more self-explanatory.

The verbosity in #2 could be reduced by making the return variable name serve double duty:
  lambda () int index = -1: { for (int i = ...) if (i is ok) { index = i; break index; } }
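
Java already keeps statement labels in a namespace separate from variables, so the double-duty spelling can be imitated today by writing the label out by hand with the same identifier. The sketch below is ordinary Java, not the proposed syntax; `SameNameLabel`, `firstOk`, and `isOk` are hypothetical names.

```java
class SameNameLabel {
    // Hypothetical stand-in for the "(i is ok)" test in the text.
    static boolean isOk(int value) { return value > 100; }

    static int firstOk(int[] data) {
        int index = -1;
        index: {                  // a label named "index", distinct from the variable
            for (int i = 0; i < data.length; i++) {
                if (isOk(data[i])) { index = i; break index; }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(firstOk(new int[] {7, 250, 300}));
    }
}
```

The proposal would make the explicit "index:" label unnecessary; everything else reads the same.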

Maybe the verbosity from #3 could be reduced by hacking in a sort of labeled return shorthand, where "break id = expr" is short for "{id=expr;break id;}"
  lambda () int index = -1: { for (int i = ...) if (i is ok) break index = i; }

Maybe that last bit is over-clever.  Surely there's a prettier way to get the same effect.

But, supposing people could get used to such things, and to make this note complete, I'll make a straw-man proposal which splits out the bits I'm talking about into separate language features:

1. A statement lambda always produces a void value.
  lambda () { println("hello"); }  // returns void

Expression lambdas continue to be limited to expressions, but next we add a sort of let-expression.

2. An expression can consist of a single variable declaration with the trailing semicolon changed to a colon, followed by a brace-enclosed block.  The value of the expression is the value of the declared variable after the declaration, and then the block, have been executed.  The declared variable is in scope within the block (and nowhere else).  This allows something which looks like a statement lambda that returns a value.  But it can actually be used anywhere an expression is allowed.
  lambda () int x : { x=42; println("hello"); }

I'd like to call the new expression type a "let expression", because it functions that way, but it looks too backward from "normal" let expressions.  So I'll call it a "backwards let expression" for now.

3. Within a backwards let expression ("BLE"), the variable name also becomes an implicit label on the block.  This makes it easy to make an early exit from a BLE.  Unlabeled breaks do NOT match BLEs.  They continue to match exactly what Java 1.0 defined them to match:  the innermost for, while, or switch.
  lambda () int x : { x=42; if (true) break x; println("not reached"); }

(Other sorts of implicit labels might be worth adding, although there are minor compatibility problems.  If a "for" loop declares a variable, that name could also be an implicit label.  Same deal for try-object statements if they ever exist.)

4. In a bit of ad hoc syntax sugaring, an assignment and a labeled break can be fused, if the LHS of the assignment and break label are the same identifier.
  lambda () int x : { if (true) break x = 42; println("not reached"); }

Fused breaks can be used anywhere the required twin name bindings are visible, not just in lambdas or BLEs.

(Fused continues may also be reasonable, if implicit labels are added to for statements.)
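
Point 3 leans on the existing rule that an unlabeled break never targets a plain labeled block. Here is a small sketch of that rule in current Java, which is why adding implicit labels to BLEs should not change the meaning of pre-existing unlabeled breaks. The class and method names are hypothetical.

```java
class BreakTargets {
    // Unlabeled break inside a labeled block: it exits the enclosing for loop.
    static String unlabeled() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            block: {
                if (i == 1) break;          // matches the for, not "block"
                sb.append(i);
            }
            sb.append(';');
        }
        return sb.toString();
    }

    // Labeled break: it exits only the block, and the loop carries on.
    static String labeled() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            block: {
                if (i == 1) break block;    // skips the rest of the block only
                sb.append(i);
            }
            sb.append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unlabeled());
        System.out.println(labeled());
    }
}
```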

HTH

-- John