Delurking comments on the 0.1.5 specification

Thu Apr 1 07:04:43 PDT 2010

I finished looking through the lambda-dev archives and the 0.1.5 draft
specification last night and thought I'd throw a few comments in while
everything is still fresh.

First, just to set context, I have a computer science background, have
no fear of closures, and, for the last five or six years, have been
working with Java in a "rich" embedded systems context.  Our
systems are not resource constrained by the standards of most embedded
applications, but we don't have a HotSpot VM available either.  We
have many instances of code that looks like this:

    addSomething(various, parameters, ...,
        new Something<SomeType>() { public SomeType doSomething() { .... } }
        );

The "doSomething" methods are never very large: the tiny ones we
handle with anonymous inner classes; the ones that require significant
logic are generally implemented as named inner classes.  Our
experience has been that the anonymous inner class syntax is verbose
and hard to read, but that the advantages of combining the definition
and point of use outweigh the awkwardness for simple method
definitions.

We wouldn't replace all of our "Something" definitions by lambda
expressions, but the vast majority of the anonymous inner classes
could be replaced by a class that took a lambda expression as a
constructor parameter.

That said, here are my comments on the spec, starting with the "Issues" section:

ISSUE: Lambda expressions as closures

Effectively-final variables: I have reservations about
"effectively-final" variables, but suspect they will be more
convenient than the current state.  I find going back to add 'final'
declarations to variables and parameters to be a distraction.  Eclipse
offers a "Quick fix", but even that interrupts developer flow and
reduces productivity.  When reading the resulting code, I find the mix
of "final" and "non final" declarations to be unattractive, and
sometimes adopt the convention of using "final" everywhere I can, but
that convention is difficult to enforce (unless we let the IDE do it
as a source transform, but even that has problems).  I could imagine
making "final" the default and adding a new keyword (e.g., "var") to
denote things that can change, but that's not practical and I think
"effectively-final" is a good improvement on current state.  My main
concern with "effectively-final" is that the addition of a closure
referencing an "effectively-final" variable will affect the legality
of apparently disconnected code.
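
To make that concern concrete, here is a minimal sketch.  It assumes a
compiler that applies the "effectively-final" rule to anonymous inner
classes as well as to lambda expressions (that is my assumption, not
something the draft promises), and the class and method names are made
up for illustration:

```java
class EffectivelyFinalDemo {
    static int describe() {
        int limit = 10;   // never reassigned, so it is "effectively final"

        // Adding this closure is what pins 'limit' down.
        Runnable printer = new Runnable() {
            public void run() { System.out.println("limit = " + limit); }
        };
        printer.run();

        // limit = 20;  // legal before the closure above was added; a
        //              // compile-time error afterwards, even though this
        //              // line itself never changed
        return limit;
    }

    public static void main(String[] args) {
        describe();
    }
}
```

The commented-out assignment is the "apparently disconnected code": it
sits nowhere near the closure, yet the closure decides whether it
compiles.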

Shared variables: I would like to see them.  There has been a lot of
discussion about the use of a keyword or annotation to indicate that a
variable declared in a local context should be allocated in a more
persistent context and be made available for use and modification by
lambda expressions.  The (large) part of me that wishes I was
programming in Scheme says "just do it" (and forget the keywords).
Given this effort's explicit focus on parallelism, and my own
experience suggesting that subtle order-of-evaluation issues are both
common and hard to diagnose, I think such usage should generally be
made explicit.

I'm going to include some examples below using the "@Shared"
annotation syntax, but only because people have used it before.

In my opinion, having shared variables makes certain code very clean
and simple.  My own experience suggests that clean, simple, code
translates to increased productivity.  Sadly, a (very) quick search on
the internet turned up a lot more opinion than actual data, so we're
somewhat stuck in the aesthetics wasteland.

For example:

    @Shared int count = 0;
    @Shared int sum = 0;
    @Shared long sumOfSquares = 0;
    complexDataStructure.visitMembers(#(int x) {
        count += 1; sum += x; sumOfSquares += x * x; });

One could, of course, do this with an iterator, but iterators have to
maintain explicit traversal state and are sufficiently painful to
write that I'm willing to bet that most Java programmers have never
written one.  One can use java.util.concurrent.atomic (which works
well for this example, though it is arguably less efficient since each
"AtomicXXX" instance needs to be separately instantiated).  One can
also write a separate "Accumulator" class with 'count', 'sum', and
'sumOfSquares' fields, but that code is much longer and visually
separated on the page.
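
For comparison, here is roughly what the last two alternatives look
like in today's Java.  This is only a sketch: "IntVisitor" and
"Numbers" are hypothetical stand-ins for whatever callback type and
data structure "visitMembers" really belongs to.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical callback type standing in for whatever visitMembers accepts.
interface IntVisitor {
    void visit(int x);
}

// Hypothetical stand-in for "complexDataStructure".
class Numbers {
    private final int[] values;
    Numbers(int... values) { this.values = values; }
    void visitMembers(IntVisitor visitor) {
        for (int v : values) visitor.visit(v);
    }
}

// The "separate class" alternative: one object with three plain fields.
class Accumulator implements IntVisitor {
    int count;
    int sum;
    long sumOfSquares;
    public void visit(int x) {
        count += 1;
        sum += x;
        sumOfSquares += (long) x * x;
    }
}

class SharedStateAlternatives {
    // The java.util.concurrent.atomic alternative: three separate objects.
    static long[] withAtomics(Numbers data) {
        final AtomicInteger count = new AtomicInteger();
        final AtomicInteger sum = new AtomicInteger();
        final AtomicLong sumOfSquares = new AtomicLong();
        data.visitMembers(new IntVisitor() {
            public void visit(int x) {
                count.incrementAndGet();
                sum.addAndGet(x);
                sumOfSquares.addAndGet((long) x * x);
            }
        });
        return new long[] { count.get(), sum.get(), sumOfSquares.get() };
    }

    static long[] withAccumulator(Numbers data) {
        Accumulator acc = new Accumulator();
        data.visitMembers(acc);
        return new long[] { acc.count, acc.sum, acc.sumOfSquares };
    }

    public static void main(String[] args) {
        Numbers data = new Numbers(1, 2, 3);
        System.out.println(Arrays.toString(withAtomics(data)));
        System.out.println(Arrays.toString(withAccumulator(data)));
    }
}
```

Both versions compute the same three numbers as the "@Shared" example;
the difference is how far the bookkeeping ends up from the point of
use.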

Concerns about interaction with parallelism are, I think, pretty much
addressed by making the use of shared variables explicit.  If you
don't want the feature, then don't use it.  There's no way to prevent
people from doing this sort of thing in Java (think the array hack),
so one might as well make it explicit.
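
For anyone who hasn't run into it, the array hack looks something like
the following.  It is a sketch only; "IntVisitor" and "visitMembers"
are hypothetical names, not part of any real API.

```java
class ArrayHackDemo {
    // Hypothetical callback type.
    interface IntVisitor { void visit(int x); }

    // Hypothetical traversal: calls the visitor once per member.
    static void visitMembers(int[] members, IntVisitor visitor) {
        for (int m : members) visitor.visit(m);
    }

    static int[] countAndSum(int[] members) {
        // The array references are final, but their single elements are
        // not, so the inner class can mutate "local" state after all.
        final int[] count = { 0 };
        final int[] sum = { 0 };
        visitMembers(members, new IntVisitor() {
            public void visit(int x) { count[0] += 1; sum[0] += x; }
        });
        return new int[] { count[0], sum[0] };
    }

    public static void main(String[] args) {
        int[] result = countAndSum(new int[] { 1, 2, 3 });
        System.out.println(result[0] + " members, sum " + result[1]);
    }
}
```

Nothing in the declarations tells a reader that 'count' and 'sum' are
shared mutable state, which is the point of wanting an explicit marker.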

Similarly, concerns about the overhead of "boxing" values are both
valid (IMO) and well addressed by making shared variable usage
explicit.

As for the syntax, I find myself not caring very much.  Aesthetically
I would prefer a keyword to an annotation, but I understand the
backward compatibility problems of introducing new keywords into a
widely used language.  I hate "re-purposing" keywords and, thankfully,
this is one place where nobody has proposed using "#" yet (a joke).
All in all, given the semantics of annotations in Java, I think that
defining an annotation and making the modification of an unannotated
"shared" variable generate a warning message is probably the most
practical solution as it addresses the issues as well as any of the
others, isn't verbose, only makes my eyes hurt a little, and won't
break backward compatibility.

By the way, I hope and expect that anything we do for lambda
expressions in this area will also apply to anonymous inner classes.

ISSUE: Lambda instance invocation

I dislike Java's separate method and variable scopes.  In fact, I
programmed in Java for years before I even knew it HAD separate
scopes... it never occurred to me.  I never missed them.

Given that they're there, I guess we need more syntax (aesthetic
arguments again).  With that caveat, I think the draft standard's
"fn.(arg...)" syntax does a good job at being invisible, but I'm not
sure that's a good thing.  Part of me thinks that if we can't make the
syntax BE a normal function call syntax, then we should make it
something different.  The "fn.invoke(arg...)" syntax would do that, or
we could define a "Function" class with a static "apply" method and
use "Function.apply(fn, arg...)".  With a static import, this would
shorten to "apply(fn, arg...)" which to my eye isn't half bad.  The
problem with using "invoke" or "apply" is that it puts the emphasis on
the application rather than on what is being applied.
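
Here is a small sketch of the two named alternatives, written with an
anonymous inner class standing in for the lambda.  "IntFn" is a made-up
function type, and the nested "Function" class is only my guess at what
such a helper might look like:

```java
class ApplyDemo {
    // Hypothetical function type; the draft's function types are not modeled.
    interface IntFn { int invoke(int arg); }

    // The "Function" helper from the text, reduced to one static method.
    static final class Function {
        private Function() {}
        static int apply(IntFn fn, int arg) { return fn.invoke(arg); }
    }

    static int demo() {
        IntFn addOne = new IntFn() {
            public int invoke(int x) { return x + 1; }
        };
        int viaInvoke = addOne.invoke(41);           // "fn.invoke(arg...)"
        int viaApply = Function.apply(addOne, 41);   // "Function.apply(fn, arg...)"
        // With a static import of Function.apply (which requires the class
        // to live in a named package) the call shortens to apply(addOne, 41).
        return viaInvoke == viaApply ? viaApply : -1;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In both spellings the first thing the eye lands on is "invoke" or
"apply" rather than "addOne".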

I suspect, in the end, that I won't like whatever is adopted very
much, as what I really want isn't achievable without abusing the
language spec.

SECTION 15.8.3 this

I would prefer that lambda expressions NOT introduce a binding for
"this" and that access to "this" be treated in exactly the same way as
access to a final variable in the enclosing scope.

* I agree that there should be a mechanism for making recursive
function definitions.
* I'm not a fan of special casing the definite assignment and field
initialization rules.
* I refuse to do the "initialize-to-null-then-assign-the-lambda" dance.

As at least one person pointed out, it would be easy to introduce a
new way to reference the current function definition other than by
using the keyword "this" as an object reference.  Possibilities that
come to mind are:

    #this -- the object itself, by analogy to the lambda expression syntax.
    #this(arg...) -- a recursive call to the current definition, by
                     analogy to the use of "this" in constructors.
    .(arg...) -- horrible, but doesn't use the "this" keyword.

The possibilities are, sadly, endless.  What I like about the last two
is that they don't expose the resulting object inside the lambda
expression, while still allowing recursive function evaluation.  As
Neal Gafter pointed out, exposing the function object could force the
compiler to create two object instances in some cases where only one
should be necessary.  While it's possible that an advanced compiler
could optimize some of that away, or that HotSpot could transport the
ugliness to another dimension somewhere, the default javac isn't an
advanced compiler and my poor little embedded systems don't have
HotSpot.

Philosophically, it seems to me that all of the uses of "this" that I
know of in the Java spec so far refer to class instances.  I just
don't see the advantage of overloading that meaning in order to
support recursive function evaluation when other mechanisms are
clearly available.  If lambda expressions were a shorthand for
anonymous inner classes, which the charter for this group explicitly
states is not the case, then having lambda expressions define 'this'
would clearly be the right thing to do.  Lambda expressions are not
class definitions, however, so I would prefer to just leave "this"
alone.

A side benefit of removing the special treatment of "this" is that it
cleans up an ugliness in the draft specification.  In Section [15.8.6
Lambda Expressions] the draft specification states "It is a
compile-time error if any Expression [in a return statement] mentions
'this'".  The spec then goes on to explain that banning 'this' in a
return is necessary for type inference to work.  If lambda expressions
don't define 'this' then this restriction can be removed.

The restriction is not trivial, by the way... consider our "favorite"
factorial example:

    #(int n) { return n == 0 ? 1 : n * this.(n-1); }

Wouldn't this be illegal under the draft specification?  Assuming that
the same reasoning that led to the banning of 'this' from 'return'
statements also applies to the expression form of lambda expressions,
wouldn't ALL recursive definitions of lambda expressions be forbidden
when using the expression form?
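
For contrast, this is how the same recursion reads today with an
anonymous inner class, where "this" really is a class instance and
there is no inference problem because the return type is spelled out.
"IntFn" is a made-up interface, so treat this as a sketch:

```java
class FactorialDemo {
    // Hypothetical single-method function type.
    interface IntFn { int invoke(int n); }

    static final IntFn FACTORIAL = new IntFn() {
        public int invoke(int n) {
            // 'this' is the anonymous IntFn instance itself.
            return n == 0 ? 1 : n * this.invoke(n - 1);
        }
    };

    public static void main(String[] args) {
        System.out.println(FACTORIAL.invoke(5));
    }
}
```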

Finally, in my own experience, I've found that "this" causes a number
of annoyances with anonymous inner classes.  I use
EventQueue.invokeLater a fair amount.  Consider the following method
fragment:

    class Foo {
        ...
        doSomething(this);
        ...
    ...

The transformation to an invokeLater looks like:

    class Foo {
        ...
        EventQueue.invokeLater(new Runnable() { public void run() {
            doSomething(/*wrong*/ this); }});
        ...
    ...

The correct transformation is, of course:

    class Foo {
        ...
        EventQueue.invokeLater(new Runnable() { public void run() {
            doSomething(Foo.this); }});
        ...
    ...

For inner classes the transformation is worse.  It's not the end of
the world, but it is a real world pain in the neck, and I'd like to
avoid it for lambda expressions if at all possible.
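
Here is a self-contained version of the fragments above that can
actually be run.  It makes two assumptions of mine: "doSomething"
accepts an Object (so the compiler cannot catch the mistake), and the
Runnable is run directly as a stand-in for EventQueue.invokeLater.

```java
class Foo {
    Object seen;   // records whatever doSomething was handed

    void doSomething(Object target) { seen = target; }

    // The original, un-deferred call.
    void direct() { doSomething(this); }

    // The naive transformation: 'this' now means the Runnable.
    void deferredWrong() {
        Runnable r = new Runnable() {
            public void run() { doSomething(/*wrong*/ this); }
        };
        r.run();   // stands in for EventQueue.invokeLater(r)
    }

    // The correct transformation: name the enclosing instance explicitly.
    void deferredRight() {
        Runnable r = new Runnable() {
            public void run() { doSomething(Foo.this); }
        };
        r.run();
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.deferredWrong();
        System.out.println("wrong passes the Foo? " + (foo.seen == foo));
        foo.deferredRight();
        System.out.println("right passes the Foo? " + (foo.seen == foo));
    }
}
```

The wrong version compiles without complaint and quietly hands over
the Runnable instead of the Foo.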

SYNTAX IN GENERAL

There's so much discussion of syntax on this list that I found myself
wishing for a moratorium.  Define a "working" syntax that nobody
considers to be a real proposal, iron out the semantics, and handle
the real syntax in a second phase.  Syntax matters, but it's
beautifully separable from the semantics and, by its very nature,
breeds arguments.

If anyone likes that idea, I'd suggest borrowing heavily from another
language, such as a prefix version of Scheme or JavaScript.  Something
like:

    lambda(int x, int y) throws E1, E2 { body }

No need for expression lambdas as they're pure syntax.

In general, though, I found most of the proposals difficult to read.
The proposal in the draft specification is nicely compact for
expressions but doesn't compose well.  Neal Gafter's proposal composes
better.  The version of "curry" I like the best is actually:

    (lambda (function) (lambda (t) (lambda (u) (function t u))))

I like it because the Scheme syntax composes really, really, well
and my brain doesn't have to deal with operator precedence.  Of
course, I like RPN calculators for the same reason :-)
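
Just to show the other end of the spectrum, here is my attempt at the
same "curry" using nothing but today's anonymous inner classes.  "Fn1"
and "Fn2" are made-up interfaces, so this is a sketch rather than
anything from a proposal:

```java
class CurryDemo {
    // Hypothetical one- and two-argument function types.
    interface Fn1<A, R> { R invoke(A a); }
    interface Fn2<A, B, R> { R invoke(A a, B b); }

    // (lambda (function) (lambda (t) (lambda (u) (function t u))))
    static <T, U, R> Fn1<T, Fn1<U, R>> curry(final Fn2<T, U, R> function) {
        return new Fn1<T, Fn1<U, R>>() {
            public Fn1<U, R> invoke(final T t) {
                return new Fn1<U, R>() {
                    public R invoke(U u) { return function.invoke(t, u); }
                };
            }
        };
    }

    static int demo() {
        Fn2<Integer, Integer, Integer> add =
            new Fn2<Integer, Integer, Integer>() {
                public Integer invoke(Integer a, Integer b) { return a + b; }
            };
        return curry(add).invoke(3).invoke(4);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The Scheme one-liner survives only as a comment; everything else is
ceremony.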

I'm not suggesting it, because it just isn't Java.  As most folks on
this list clearly know, lambda expressions as a tool in programming
languages have been around for a long, long time.  They are only new
to Java.  There's a lot of good stuff out there.

WISH LIST

* I'd like to see us adopt a goal of having everything that can be
done with a lambda expression also be doable with an anonymous inner
class.  I am not suggesting that lambda expressions be implemented as
classes.  I am suggesting that developers will benefit if there is a
clear migration path from lambda expressions to anonymous inner
classes.  For example, if the language supports SAM conversion for
lambda expressions, then I'd like to see a similar capability for
method references (e.g., "new Foo()#method").  It's pure syntactic
sugar but I think it will make it easier to transition from one form
to the other and will help avoid situations where developers use
awkward workarounds in order to avoid the overhead of switching to a
more appropriate mechanism.

* I'd love to see a shorthand for anonymous inner classes that follows
whatever is done for lambda expressions.  Something like:

    new Fubar(param...) #methodName(int x)(x + 1)

where "methodName" would be overridden.  A single, overridden, method
would be fine, as the standard anonymous inner class syntax would, of
course, be available if more was needed.  This has been a common
special case for us.

Thanks!

-- Jim Mayer