Project Lambda: Java Language Specification draft

Fri Jan 22 17:36:06 PST 2010

A couple of thoughts on this draft, inline.

On Fri, Jan 22, 2010 at 2:55 PM, Alex Buckley <Alex.Buckley at sun.com> wrote:
> This document does not consider implementation. The mapping of lambda
> expressions to objects, and of function types to class or interface
> types, is neither designed nor specified. Even if the mapping was
> designed here, it is unlikely ever to be specified in the JLS. Binary
> compatibility for lambda expressions will eventually be specified in
> terms of changes to function types only. It is a goal of this document
> to allow the implementer freedom as to how and when lambda expressions
> are evaluated.

I think these are unavoidable in the JLS.  The specification must be
precise enough that code generated by distinct Java compilers can
interoperate.  Within the specification, we need to know if a class
can "implement" a function type.  If so, it is probably an interface.
If not, it probably isn't.

> - Lambda expressions as closures: There are effectively-final
>  variables, but I am holding off shared variables for now. As
>  background reading to why loop variables should not be shared, see
>  http://blogs.msdn.com/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx.

Actually, that is largely an error in the specification of C#'s
foreach loop, which we're aiming to fix.
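For contrast, Java's enhanced for statement declares a fresh loop variable on each iteration, so capturing it does not reproduce the C# problem. The sketch below assumes Java 8 lambda syntax (`->`) and `java.util.function.Supplier`, neither of which is part of this draft; `LoopCapture` and `captureEach` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class LoopCapture {
    // Captures the enhanced-for variable in one lambda per iteration, then
    // evaluates all the lambdas after the loop has finished.
    static List<Integer> captureEach(int[] values) {
        List<Supplier<Integer>> suppliers = new ArrayList<>();
        for (int v : values) {
            // v is a new, effectively-final variable on every iteration
            suppliers.add(() -> v);
        }
        List<Integer> results = new ArrayList<>();
        for (Supplier<Integer> s : suppliers) {
            results.add(s.get());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(captureEach(new int[] {1, 2, 3})); // [1, 2, 3]
    }
}
```

A classic `for (int i = 0; ...; i++)` variable could not be captured this way in Java 8, because `i++` makes it not effectively final.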

> * Expressions
>
> [15.8 Primary Expressions]
>
> Expression:
>  LambdaExpression
>
> LambdaExpression:
>  '#' '(' FormalParameterList_opt ')' '(' Expression_opt ')'
>  '#' '(' FormalParameterList_opt ')' Block
>
>  #()()
>  #()(5)
>  #()(x.m())
>  #()((foo++))
>  #()("a"+"b")
>  #()( {1,2,3} )   // Proposed collection literal expression from Coin
>
>  #(){}
>  #(){return 5;}
>  #(){x.m();}
>  #(){foo++;}

Did you mean to make these primary expressions?  I hope so.  Otherwise
many uses would require yet another set of parens.
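For comparison, most of the draft forms above have direct counterparts in the lambda syntax Java 8 later adopted. This is a sketch under that assumption, not the draft's syntax: `#()()` has no exact counterpart (an empty block is the closest), `#()(x.m())` is omitted because `x` is undefined here, and `List.of` (Java 9+) stands in for the proposed collection literal. `LambdaForms` and `summary` are hypothetical names.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration class (Java 8+ syntax, not the draft's).
class LambdaForms {
    static int foo = 0;

    static String summary() {
        foo = 0;
        // Expression bodies: #()(5), #()("a"+"b"), #()((foo++)), #()( {1,2,3} )
        Supplier<Integer> five = () -> 5;
        Supplier<String> ab = () -> "a" + "b";
        Supplier<Integer> postInc = () -> foo++;
        Supplier<List<Integer>> list = () -> List.of(1, 2, 3);

        // Block bodies: #(){}, #(){return 5;}, #(){foo++;}
        Runnable empty = () -> {};
        Supplier<Integer> fiveBlock = () -> { return 5; };
        Runnable incBlock = () -> { foo++; };

        int before = postInc.get(); // returns 0; foo becomes 1
        incBlock.run();             // foo becomes 2
        empty.run();
        return five.get() + " " + ab.get() + " " + list.get() + " "
                + fiveBlock.get() + " " + before + " " + foo;
    }

    public static void main(String[] args) {
        System.out.println(summary()); // 5 ab [1, 2, 3] 5 0 2
    }
}
```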

> [15.8.3 this]
>
> The keyword this may be used only in the body of an instance method,
> instance initializer or constructor, or in the initializer of an
> instance variable of a class, *or in a lambda expression*.
>
> The type of this in a lambda expression is the function type of the
> lambda expression.
>
> /*
> Treatment of 'this' inside a lambda expression is essentially the same
> as 'this' inside the body of an anonymous inner class.
> */

That's the worst possible treatment of "this".  It has all the
disadvantages of the inner class approach (not being transparent) with
none of the advantages (you can't use any inherited members of the
type to which the lambda expression is converted).  This is the first
time I've heard anyone advocate such treatment for "this".
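The transparent treatment is the one Java 8 eventually adopted: inside a lambda body, `this` means what it means in the enclosing code, while an anonymous class rebinds it to the new object. A minimal sketch assuming Java 8 syntax; `ThisDemo` and its methods are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class ThisDemo {
    @Override
    public String toString() { return "enclosing ThisDemo"; }

    // Lambda: 'this' is lexically scoped and refers to the enclosing ThisDemo.
    String fromLambda() {
        Supplier<String> s = () -> this.toString();
        return s.get();
    }

    // Anonymous class: 'this' refers to the anonymous Supplier instance itself.
    String fromAnonymousClass() {
        Supplier<String> s = new Supplier<String>() {
            @Override public String get() { return this.toString(); }
            @Override public String toString() { return "anonymous Supplier"; }
        };
        return s.get();
    }

    public static void main(String[] args) {
        ThisDemo d = new ThisDemo();
        System.out.println(d.fromLambda());         // enclosing ThisDemo
        System.out.println(d.fromAnonymousClass()); // anonymous Supplier
    }
}
```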

> [15.8.6 Lambda Expressions]
>
> A lambda expression is used to create a new object that is a lambda
> instance (15.8.7).

I don't think you mean to say "new" object.  I think you mean to say
that it results in an object that is a lambda instance.  Whether or
not it is new is something you explicitly said you didn't want to
specify.

> A lambda expression specifies an expression or block of code,

That's not quite right, because it could be neither (the expression is
optional).

> followed
> by a (possibly empty) list of formal parameters to the expression or
> block.

Actually, in your syntax the parameters *precede* the body, not follow it.

> The body of a lambda expression is an expression or a block of code.

Again, sometimes it is neither.

> If the body of a lambda expression is an expression, then the type of
> the body is the type of the expression.

And if it is neither?

> If the body of a lambda expression is a block, then either all or none
> of the return statements in the block must have an Expression. If no
> return statement has an Expression, then the body of the lambda
> expression is void, i.e. has no type.

That's inconsistent with the way methods work, and would prevent the
following useful code:

Callable<String> lazyResult = #(){ throw new UnsupportedOperationException(); };
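For comparison with methods: a method body that always throws needs no return statement, and Java 8's lambda rules (a later design, not this draft) accept a block body that cannot complete normally for a value-returning target such as `Callable<String>`. A sketch; `ThrowingBody` and its members are hypothetical names.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class ThrowingBody {
    // A method whose body cannot complete normally needs no return statement.
    static String notYetImplemented() {
        throw new UnsupportedOperationException();
    }

    // Java 8 accepts the same shape for a lambda targeting Callable<String>.
    static final Callable<String> lazyResult = () -> { throw new UnsupportedOperationException(); };

    static boolean callThrowsUnsupported() {
        try {
            lazyResult.call();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        } catch (Exception e) { // Callable.call() declares Exception
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(callThrowsUnsupported()); // true
    }
}
```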

> If all return statements have an
> Expression, then the types of the Expressions must be
> assignment-compatible with each other, or a compile-time error
> occurs.

This disallows "if (e) return "foo"; else return new Object();" because
Object is not assignment-compatible with String.
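A method checks each return expression against its declared result type, not against the other returns, so this body is legal in a method returning `Object`; Java 8 lambdas (not this draft) likewise check against the target type. A sketch; `MixedReturns` and its members are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class MixedReturns {
    // Each return is checked against Object, not against the other return.
    static Object method(boolean e) {
        if (e) return "foo"; else return new Object();
    }

    // The same body as a lambda whose target type is Supplier<Object>.
    static Supplier<Object> lambda(boolean e) {
        return () -> { if (e) return "foo"; else return new Object(); };
    }

    public static void main(String[] args) {
        System.out.println(method(true));       // foo
        System.out.println(lambda(true).get()); // foo
    }
}
```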

> The type of the body is lub(T1..Tn) where T1..Tn are the types
> of the Expressions after boxing conversion.

So the result type is always a reference type (or void) when the body
is a block?

> The type of a lambda expression is a function type #T(S1..Sm)(E1..En)

That syntax is not defined anywhere in your specification.  What is a
function type?

> where:
>
> - If the body of the lambda expression is void, then T is void,
>  indicating no return type; otherwise, T is the type of the body of
>  the lambda expression after capture conversion.
>
> - S1..Sm is the list, possibly empty, of types of the formal
>  parameters of the lambda expression.
>
>  #()() has type #void()

According to the specification above, this is missing some parens.

>  #() { if (..) return "1"; else return 2; } has type #Integer()

This looks like an error because of the constraint "If all return
statements have an Expression, then the types of the Expressions must
be assignment-compatible with each other, or a compile-time error
occurs."  Neither String nor Integer is assignment-compatible with the
other.
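Even setting that constraint aside, lub(String, Integer) is not Integer. Java's generic method inference computes the same lub, which the sketch below uses to show that the result includes Serializable and Comparable (and, depending on the JDK version, possibly other shared interfaces) but cannot be assigned to Integer. `LubDemo` and `pick` are hypothetical names.

```java
import java.io.Serializable;

// Hypothetical illustration class.
class LubDemo {
    // T is inferred from both arguments, so mixed arguments give T = lub(String, Integer).
    static <T> T pick(boolean first, T a, T b) {
        return first ? a : b;
    }

    public static void main(String[] args) {
        // lub(String, Integer) includes Serializable and Comparable<...>, so both compile.
        Serializable s = pick(true, "1", 2);
        Comparable<?> c = pick(false, "1", 2);
        System.out.println(s + " " + c); // 1 2
        // Integer i = pick(true, "1", 2); // does not compile: String is not an Integer
    }
}
```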

> Any local variable, formal method parameter, or exception handler
> parameter used but not declared in a lambda expression must be
> effectively-final.
>
> A local variable, formal method parameter, or exception handler
> parameter is effectively-final if it is never the target of an
> initialization or assignment expression except where definitely
> unassigned.

In other words, can't capture much more than an anonymous inner class
could.  Disappointing.

> It is a compile-time error to modify the value of an effectively-final
> variable in the body of a lambda expression.

Given the definition of effectively-final, above, this constraint is vacuous.
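Java 8 later adopted essentially this effectively-final rule. The sketch below (Java 8 syntax; `EffectivelyFinal` and its methods are hypothetical) shows a legal capture, why the separate no-modification constraint adds nothing (an assignment anywhere already makes the variable not effectively final), and the usual workaround of capturing an effectively-final reference to a mutable cell such as `AtomicInteger`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class EffectivelyFinal {
    static int addBase(int x) {
        int base = 10; // never reassigned, so effectively final
        IntUnaryOperator add = y -> y + base;
        // base = 11;  // would make 'base' not effectively final, so the
        //             // capture above would be rejected at compile time
        return add.applyAsInt(x);
    }

    // Workaround for mutation: the captured reference never changes,
    // only the state of the object it points to.
    static int countCalls(int n) {
        AtomicInteger counter = new AtomicInteger();
        Runnable tick = () -> counter.incrementAndGet();
        for (int i = 0; i < n; i++) {
            tick.run();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(addBase(5));    // 15
        System.out.println(countCalls(3)); // 3
    }
}
```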

> FunctionType:
>  '#' ResultType '(' [Type] ')' FunctionThrows_opt
>
> FunctionThrows:
>  '(' 'throws' ExceptionTypeList ')'
>
> ExceptionTypeList:
>  Identifier
>  ExceptionTypeList '|' Identifier
>
> The notation #T(S1..Sm)(E1..En) indicates a function type with return
> type T, formal parameter types S1, S2, ..., Sm, and checked exception
> types E1, E2, ..., En.

I think you're missing "throws".  Either that, or you haven't told us
the relationship between this syntax and the syntax for function
types.
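One way to make the `#T(S1..Sm)(throws E1|...|En)` notation concrete, and to map it to something a class can implement (the interface question raised at the top of this message), is a generic interface with a type parameter for the thrown exception. This is an assumption, not part of the draft: `Fn1` and `FunctionTypeSketch` are hypothetical, the sketch covers only one parameter, and a single type parameter `E` stands for one exception type, not a union.

```java
import java.io.IOException;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class FunctionTypeSketch {
    // Hypothetical encoding of the function type #T(S)(throws E).
    @FunctionalInterface
    interface Fn1<T, S, E extends Exception> {
        T apply(S arg) throws E;
    }

    // A value of "type" #Integer(String)(throws IOException).
    static final Fn1<Integer, String, IOException> LENGTH = s -> {
        if (s.isEmpty()) throw new IOException("empty input");
        return s.length();
    };

    public static void main(String[] args) throws IOException {
        System.out.println(LENGTH.apply("hello")); // 5
    }
}
```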

> 'void' may be used in a function type to indicate that the body of a
> lambda expression has no return value. This occurs if the body of the
> lambda expression is a block that can either a) complete normally or
> b) complete abruptly by reason other than being a return with value V.

What about the case where there's no expression or block, e.g. #()()?

> [4.10.4 Subtyping among Function Types]
>
> #T(S1..Sm)(E1..En) is a direct supertype of #V(U1..Um)(F1..Fo) iff all
>  of the following hold:
> - T is a supertype of V.
> - for i in 1..m: Ui is a supertype of Si.
> - for j in 1..o: There exists a k in 1..n such that Ek is a supertype of Fj.
>
>  #Object(String,Integer) is a supertype of #Package(Object,Number).
>  #Object(Object,Object) is also a supertype of #Package(Object,Number).
>  #Object(Object[]) is a supertype of #Object[](Object).

So "#float()" can be assigned from "#int()"?  It will be interesting
to see how one can generate verifiable code for this without resorting
to some reflection-like APIs.  Similarly, I'm surprised you allow
assigning "#void(float)" to "#void(int)".  How will the generated code
for the lambda know how to interpret the bits of the incoming value?
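Java generics later expressed the reference-type part of this rule through use-site variance rather than a new subtype relation. In the sketch below (hypothetical names, Java 8 library types), a wildcard-typed `BiFunction` stands in for `#Object(String,Integer)` and accepts a stand-in for `#Integer(Object,Number)` (Integer is used instead of the draft's Package so the output is predictable). Primitive widening, as in the `#float()`/`#int()` case, is not covered by this: an explicit adapter is needed (shown with `double`, since `java.util.function` has no `float` supplier), which is the kind of code a compiler would have to generate.

```java
import java.util.function.BiFunction;
import java.util.function.DoubleSupplier;
import java.util.function.IntSupplier;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class VarianceSketch {
    // Stand-in for #Object(String,Integer): accepts functions with wider
    // parameter types and a narrower result type.
    static Object callWith(BiFunction<? super String, ? super Integer, ? extends Object> f) {
        return f.apply("x", 41);
    }

    // Stand-in for #Integer(Object,Number).
    static final BiFunction<Object, Number, Integer> INC = (o, n) -> n.intValue() + 1;

    // No generic analogue for int-to-double: the widening is written out explicitly.
    static DoubleSupplier widen(IntSupplier s) {
        return () -> s.getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(callWith(INC));                // 42
        System.out.println(widen(() -> 7).getAsDouble()); // 7.0
    }
}
```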

> Object is a direct supertype of any function type.
>
> A function type that is void (i.e. has no return type) and has formal
> parameter types P1..Pn is a supertype of a function type #T(S1..Sn)
> iff Si is a supertype of Pi (i in 1..n).

So "#void()" can be assigned from "#int()"?  It will be interesting to
see how to generate verifiable JVM code for calling these (how much is
left on the stack by invoking a lambda?).
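In Java 8 (not this draft), discarding a result is done by conversion rather than subtyping: an `IntSupplier` is not a `Runnable`, but a method reference to it can target `Runnable`, and the `int` result is simply discarded. A sketch; `DiscardResult` and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class DiscardResult {
    // A method reference may target a void method even though getAsInt
    // returns an int; the int result is discarded.
    static Runnable asRunnable(IntSupplier s) {
        return s::getAsInt;
    }

    static int runTwice() {
        AtomicInteger calls = new AtomicInteger();
        IntSupplier next = calls::incrementAndGet;
        Runnable r = asRunnable(next);
        r.run();
        r.run();
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwice()); // 2
    }
}
```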

> The above [SAM] definition deliberately does not treat multiple non-Object
> abstract methods with compatible signatures as if they represented a
> single abstract method. This reflects existing practice whereby if an
> interface or abstract class has multiple such members, it is not
> possible for a non-abstract class to implement the interface/extend
> the abstract class simply by providing a single concrete method.

That's not existing practice:

interface A { void f(); }
interface B { void f(); }
interface C extends A, B {} // not SAM by definition
public class D implements C { public void f() {} }
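The same example as a single runnable file (types nested, `main` added). As a later data point, Java 8's functional-interface rule, which is not this draft's, also accepts `C` as a lambda target because `A.f` and `B.f` have the same signature. `MultiInheritSam` is a hypothetical wrapper name.

```java
// Hypothetical wrapper class (Java 8 syntax for the lambda).
class MultiInheritSam {
    interface A { void f(); }
    interface B { void f(); }
    interface C extends A, B {} // not SAM by the draft's definition

    // One concrete method implements both A.f and B.f.
    static class D implements C {
        @Override public void f() { System.out.println("D.f"); }
    }

    public static void main(String[] args) {
        C viaClass = new D();
        viaClass.f();                                      // D.f
        C viaLambda = () -> System.out.println("lambda f");
        viaLambda.f();                                     // lambda f
    }
}
```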

> A lambda conversion exists from a function type #T(S1..Sm)(E1..En) to
> the descriptor of the target abstract method M of a SAM type, provided
> that all of the following hold:
>
> - If T is not void, then T can be converted to the return type of M by
>  assignment conversion.
> - If T is void, then M is void or has return type java.lang.Void.
> - M is not generic and has m formal parameters.
> - For i in 1..m, the i'th formal parameter of M has type Si.
> - For j in 1..n, the checked exception type Ej is a subtype of some
>  exception type in the throws clause of M.

+ The constructor is accessible?

If the SAM's constructor declares checked exceptions in its throws
clause, where in the caller are they checked?

You need to specify the runtime behavior of this conversion (when the
constructor is invoked is observable).
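Anonymous classes already make both questions concrete. In the sketch below (hypothetical names), the constructor's side effect shows exactly when an instance is created, and its `throws` clause forces the creating expression to handle or declare `Exception`; a lambda converted to such a type would need rules for both.

```java
// Hypothetical illustration class.
class SamConstructor {
    static int constructed = 0;

    // Hypothetical abstract-class SAM whose constructor has a visible side
    // effect and declares a checked exception.
    static abstract class Task {
        Task() throws Exception {
            constructed++;
        }
        abstract void run();
    }

    // The anonymous subclass's constructor inherits the checked exception,
    // so the creating expression must handle or declare it.
    static Task make() throws Exception {
        return new Task() {
            @Override void run() { }
        };
    }

    public static void main(String[] args) throws Exception {
        make();
        make();
        System.out.println(constructed); // 2
    }
}
```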

> The type of this in a lambda expression is the function type of the
> lambda expression.

Oh, really?  So the result type of a lambda expression can depend on
the type of "this" (if "this" appears within a return statement).  And
the type of "this" depends on the result type of the lambda
expression.  Your specification gives no hint how this infinite
regress is to be resolved.

> Therefore, it is convenient for the body of a lambda expression to
> have access to members of the SAM type. To achieve this, I am thinking
> that 'this' in the body of the lambda expression may be cast to the
> SAM type:

It gets worse and worse!  I thought you were trying to avoid
specifying implementation details, but this would force the objects to
be the same.  I cannot imagine how you could make that happen without
introducing some significant inefficiencies.  Consider:

abstract class R { public abstract void run(); }
#void() lambda = #() { R self = (R)this; ... }
R r = lambda; // magic!

Note that the object referenced by the variable r is of type "R", but
it is also of the reference type "#void()" (it is visible with that
static type as "this" inside the body of the lambda).  Therefore, one
could convert lambdas from one "SAM" to a different compatible "SAM"
using function types as intermediaries:

abstract class R1 { public abstract void run(); }
abstract class R2 { public abstract void invoke(); }
#void() lambda = #() {}
R1 r1 = lambda;
R2 r2 = (#void()) r1; // magic!

Note that the above cast must succeed, because the object r1 must
dynamically be a subtype of "#void()" - otherwise, it would not be
capable of being viewed as that type when seen as "this" inside the
lambda.

Also, this will encourage people to write casts (that might be incorrect).
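For contrast, Java 8 (not this draft) has no such magic: a lambda converted to one interface is not an instance of another, and switching interfaces takes an explicit conversion that creates a new object. A sketch; `Invoker` and `SamToSam` are hypothetical names.

```java
// Hypothetical illustration class (Java 8 syntax, not the draft's).
class SamToSam {
    // Hypothetical SAM with the same shape as Runnable.
    interface Invoker { void invoke(); }

    public static void main(String[] args) {
        Runnable r1 = () -> System.out.println("ran");
        Invoker r2 = r1::run;                      // explicit conversion: a new object
        System.out.println(r1 instanceof Invoker); // false
        System.out.println((Object) r2 == r1);     // false
        r2.invoke();                               // ran
    }
}
```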

> [5.3] Method Invocation Conversion
>
> Method invocation contexts allow the use of one of the following:
> - a lambda conversion (5.1.14).

Interesting.  How does type inference and overload resolution work
when calling an overloaded method with an argument that is a lambda
expression?  Type inference currently does not have any rules to
handle these cases.

For example, if I have a generic method

<T> T doit(#T() lambda) { return lambda!(); }

And an invocation

String s = doit(#()("foo"));

There doesn't appear to be any way to infer that the type argument to
the invocation is String.  More subtly, with

<T extends Runnable> void doit2(T t) {}

How does one infer the type parameter T in the invocation

doit2(#(){});
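Java 8 later added target-typing rules to inference for this. Rewriting the first example with `Supplier<T>` in place of `#T()` (an assumption; the draft has no such mapping), `T` is inferred as `String`. `InferFromLambda` is a hypothetical name.

```java
import java.util.function.Supplier;

// Hypothetical illustration class (Java 8 syntax, not the draft's).
class InferFromLambda {
    // Stand-in for <T> T doit(#T() lambda), with Supplier<T> in place of #T().
    static <T> T doit(Supplier<T> lambda) {
        return lambda.get();
    }

    public static void main(String[] args) {
        String s = doit(() -> "foo");   // T inferred as String
        System.out.println(s.length()); // 3
    }
}
```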

> A lambda invocation expression on a lambda expression of type
> #T(S1..Sm)(X1..Xn) can throw an exception type E iff either:
>
> - some expression of the argument list can throw E, or
> - there exists an i in 1..n such that Xi is E.

+ or the receiver expression can throw E?
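Ordinary method invocation already counts the receiver. In the sketch below (hypothetical names), `call` must declare `LookupException` only because computing the receiver can throw it; neither the argument nor `IntFn.apply` can. A rule for lambda invocation would presumably need the same clause.

```java
// Hypothetical illustration class (Java 8 syntax, not the draft's).
class ReceiverThrows {
    // Hypothetical checked exception raised while computing the receiver.
    static class LookupException extends Exception {
        LookupException(String message) { super(message); }
    }

    interface IntFn { int apply(int x); } // hypothetical SAM that throws nothing

    static IntFn lookup(String name) throws LookupException {
        if (name.equals("inc")) {
            return x -> x + 1;
        }
        throw new LookupException("no function named " + name);
    }

    // The receiver expression lookup(name) is the only source of LookupException.
    static int call(String name, int arg) throws LookupException {
        return lookup(name).apply(arg);
    }

    public static void main(String[] args) throws LookupException {
        System.out.println(call("inc", 41)); // 42
    }
}
```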