Are function types a requirement?
Reinier Zwitserloot
reinier at zwitserloot.com
Wed Feb 17 20:06:38 PST 2010
I've been cheerleading this idea (no function types) at Devoxx '09, but so
far it's fallen on deaf ears. I've been playing with syntax and
implementations based on this for quite a while, so I'll share some of my
conclusions:
Syntax basics are almost the same as the current lambda-dev proposals, with
two addendums: You may optionally put a type name before the #, and a method
name after it. Thus, you can actually write lambda-style closures for
non-SAM types, which is good for working with existing java code:
WindowListener listener = WindowAdapter#windowClosing(WindowEvent e) { ... };
It's not _that_ good, though, as only insane levels of inference could
possibly guess that just "#windowClosing", without also putting
"WindowAdapter" in the closure literal, is to be interpreted as being for
WindowAdapter when trying to assign it to a variable of type WindowListener.
Still, this part of it is certainly more convenient than other proposals.
(In this example, omitting either the class name or the method name would
result in an error; without a class name, inference would end up assuming
you meant to write WindowListener#windowClosing, which won't work for
obvious reasons, and without a method name the compiler would complain
because WindowAdapter isn't a SAM).
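Under today's Java, the same thing requires an anonymous subclass; the proposed literal is essentially sugar for this. Here is a sketch using stand-in types rather than the real java.awt classes (so it runs headless; the names merely mimic AWT):

```java
// Stand-in types modeled loosely on java.awt.event; NOT the real AWT classes.
interface WindowListener {
    void windowOpened(String e);
    void windowClosing(String e);
}

abstract class WindowAdapter implements WindowListener {
    public void windowOpened(String e) {}   // empty defaults, as in AWT's adapter
    public void windowClosing(String e) {}
}

public class AdapterDemo {
    static String lastEvent = null;

    public static void main(String[] args) {
        // What "WindowAdapter#windowClosing(WindowEvent e) { ... }" would desugar to:
        WindowListener listener = new WindowAdapter() {
            @Override public void windowClosing(String e) { lastEvent = e; }
        };
        listener.windowClosing("closing");
        System.out.println(lastEvent);  // prints "closing"
    }
}
```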
Inference is, of course, where it gets _really_ complicated, but on the plus
side, it allows us to pour all the limited resources we have left before
Java7 ships into getting inference right. Once you've got inference
perfected, the rest is trivial; a closure literal is EITHER immediately
interpreted as simply an expression that produces a value of a type, or the
compiler immediately generates an error. There's no intermediate function
type phase.
So, how would inference have to work? Well, if the Type name is mandatory,
inference is again trivial; if the Type is a SAM, then the method name is
inferred to be the one abstract method in that type. If not, then you must
have a method name. However, things are only this trivial if the type also
lists all generics information, and then we not only have the considerable
burden of writing probably repetitive type information any time we want to
write a closure, it also won't solve the explosion of types in
ParallelArray. Thus, clearly, inference needs to be a lot more complicated
than this for it to be a passable closures implementation for Java.
With some work on the JVM, it should be relatively simple to detect the
following situation:
1. Some method runs Integer.valueOf() whilst constructing a stack frame to
be passed off to another method via one of the INVOKE opcodes (if this is
difficult to detect, the compiler can perhaps emit a special call or opcode
to 'box' an integer this way). This is an immutable property for any given
parameter in any given INVOKEx opcode.
2. The invoked method's very first act when it first touches the object at
this stack position is to call intValue() on it. This is an immutable
property for any given parameter in any given method.
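In source terms, the pattern those two points describe is just a box/unbox round-trip across a call boundary. A minimal sketch (the actual detection would happen at the bytecode level, not in source):

```java
public class BoxElision {
    // Point 2: the callee's very first act on the parameter is intValue().
    static int addOne(Integer boxed) {
        return boxed.intValue() + 1;
    }

    public static void main(String[] args) {
        // Point 1: the caller boxes via Integer.valueOf() purely to build the call.
        // A JVM that pairs these two facts can pass the raw int and skip both steps.
        int result = addOne(Integer.valueOf(41));
        System.out.println(result);  // prints 42
    }
}
```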
That's, in other words, the situation where one method calls another with a
primitive. Once this feature is in (it isn't useful just for closures, but a
boon to the JVM in general), it becomes possible to eliminate a lot of
types in ParallelArray's Ops; instead of having:
Ops.DoubleMaxReducer
Ops.IntMaxReducer
Ops.LongMaxReducer
Ops.MaxReducer<T>
You'd just have:
Ops.MaxReducer<T>.
All methods in the ParallelArray library that currently take an
Ops.DoubleMaxReducer will instead take an Ops.MaxReducer<Double>, and the
JVM will make sure that performance is not adversely affected as long as the
caller also works with a direct double. If the caller doesn't, this can only
be described as a win, as in such a situation the caller either has to do a
runtime if/elseif block or use MaxReducer<Double> without the benefit of JVM
optimized autobox/unbox elimination.
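A hypothetical consolidated interface might look like the following. The names are modeled on jsr166y's Ops but this is a sketch, not the real API, and the reduce helper is invented for the example:

```java
public class ReducerDemo {
    // One generic reducer instead of Double/Int/Long specializations.
    interface MaxReducer<T extends Comparable<T>> {
        T combine(T a, T b);
    }

    // Illustrative helper, not part of ParallelArray.
    static <T extends Comparable<T>> T reduce(T[] values, MaxReducer<T> r) {
        T acc = values[0];
        for (int i = 1; i < values.length; i++) acc = r.combine(acc, values[i]);
        return acc;
    }

    public static void main(String[] args) {
        MaxReducer<Double> max = new MaxReducer<Double>() {
            public Double combine(Double a, Double b) {
                return a.compareTo(b) >= 0 ? a : b;
            }
        };
        Double[] data = { 3.0, 9.5, 1.2 };
        System.out.println(reduce(data, max));  // prints 9.5
    }
}
```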
If that's done, then "all" that's left is inferring that something like
this:
Ops.MaxReducer<double> x = #(double a, double b) { ... }
or even harder, this:
ParallelDoubleArray a = ...;
a.cumulate(#(double a, double b) { ... }, 0);
is legal, and means that the closure is in fact implementing
Ops.MaxReducer<double>#invoke(double a, double b); that (in the second
case) the intended target method signature is
ParallelDoubleArray.cumulate(Ops.MaxReducer<double>, double); and of course
that using primitive types in generics is legal in the first place.
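To make the inference problem concrete, here is roughly what the second example has to mean under today's Java, using toy stand-ins for the jsr166y classes (the real ParallelDoubleArray works on primitive doubles inside a fork-join framework; this sketch only models the shape the compiler must infer):

```java
public class CumulateDemo {
    interface Reducer<T> { T op(T a, T b); }

    // Toy stand-in for ParallelDoubleArray: cumulate replaces each element
    // with the reduction of the base and all elements up to and including it.
    static class DoubleArrayLike {
        final double[] data;
        DoubleArrayLike(double[] data) { this.data = data; }

        void cumulate(Reducer<Double> r, double base) {
            double acc = base;
            for (int i = 0; i < data.length; i++) {
                acc = r.op(acc, data[i]);  // boxing here is what the JVM would elide
                data[i] = acc;
            }
        }
    }

    public static void main(String[] args) {
        DoubleArrayLike a = new DoubleArrayLike(new double[] { 1, 2, 3 });
        // What "#(double a, double b) { return a + b; }" must be inferred to mean:
        a.cumulate(new Reducer<Double>() {
            public Double op(Double x, Double y) { return x + y; }
        }, 0);
        System.out.println(java.util.Arrays.toString(a.data));  // prints [1.0, 3.0, 6.0]
    }
}
```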
This again has the property that it's certainly not trivial to do, but the
benefit of doing it extends well beyond merely closures. For example, even
with full blown BGGA closures, I don't see how you could ever write
something like:
new TreeSet<Integer>({int a, int b => ...});
whereas with a scheme to allow primitive types as type arguments, you could
not only write such a thing, it would be simpler:
new TreeSet<int>(#(int a, int b) { ... });
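For comparison, the only way to write this in current Java is an anonymous Comparator<Integer>, with boxing on every element and every comparison:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        // Today's equivalent of new TreeSet<int>(#(int a, int b) { ... }):
        TreeSet<Integer> set = new TreeSet<Integer>(new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return b.compareTo(a);  // descending order; a and b are boxed
            }
        });
        set.add(3);
        set.add(1);
        set.add(2);
        System.out.println(set);  // prints [3, 2, 1]
    }
}
```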
How to allow primitives in generics declarations has been researched, and
abandoned, before, but if we can solve this problem today then function
types can be dropped. The cost/value analysis of trying to make it work has
perhaps changed now - simplifying closures considerably is now added to the
value side.
A very rough sketch on how to go about allowing primitive types in generics:
For Type _Parameters_ nothing needs to change. Writing class X<T extends
int> makes no sense.
For Type _Arguments_, 'int' is legal anywhere Integer would be. As far as
type bounds go, this means 'int' will take the place of either something
like <? extends Number> (including just ?/T, which is of course shorthand
for ? extends Object), or something like <? super Integer>. As an
implementation, the compiler turns every location where the type "int" is
used to fit a parameterized type into Integer, and inserts the appropriate
box/unbox code at the appropriate place (per access for fields, at the top
of the method for parameters, and right before the call or assignment for
outgoing method invocations and field writes). The generated conversion
code for outgoing calls is simple (just wrap with Integer.valueOf), but
incoming conversions (from Integer to int) need both an .intValue() call
and a nullcheck. Sequential Integer.valueOf/.intValue calls, even across
invokes, would be eliminated by the JVM, so the performance hit should be
negligible in most cases.
Apply the same reasoning for all the other primitives.
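A sketch of the conversion code the compiler would insert at those boundaries (the helper names here are illustrative, not part of any proposal):

```java
public class BoundaryConversions {
    // Incoming conversion (Integer -> int): needs both a nullcheck and intValue().
    static int unboxIncoming(Integer boxed) {
        if (boxed == null)
            throw new NullPointerException("null where an int was expected");
        return boxed.intValue();
    }

    // Outgoing conversion (int -> Integer): just a valueOf wrap.
    static Integer boxOutgoing(int value) {
        return Integer.valueOf(value);
    }

    public static void main(String[] args) {
        // A round-trip like this is what the JVM would be able to eliminate.
        System.out.println(unboxIncoming(boxOutgoing(7)));  // prints 7
        try {
            unboxIncoming(null);
        } catch (NullPointerException e) {
            System.out.println("NPE as expected");
        }
    }
}
```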
If anyone remembers or has a link to the problems inherent in allowing
primitives in generics, that would be useful for this discussion.
Staggered release of these features (closures themselves in JDK7, allowing
primitives in generics in JDK8) is possible but not a great solution;
changing an API to go from e.g. cumulate(DoubleMaxReducer r, double base)
to cumulate(MaxReducer<double> r, double base) is backwards compatible, but
only if both methods continue to exist simultaneously, and the
DoubleMaxReducer type can never be deleted. That's clearly not an optimal
situation.
Of course, if function types are a base requirement, then forget I mentioned
it.
--Reinier Zwitserloot
On Thu, Feb 18, 2010 at 2:06 AM, Stephen Colebourne <scolebourne at joda.org> wrote:
> Recent threads have indicated the difficulties of adding function types
> to the language.
>
> - difficult/impossible to integrate with arrays (an omission that would
> certainly be a surprise to many I suspect)
>
> - difficult to integrate with varargs
>
> - difficult to implement (various strategies suggested)
>
> - difficult syntax choices (difficult to find something that is easy to
> read, given Java's painful checked exceptions)
>
> - debated invocation syntax - the func.() syntax
>
>
> One alternative is to omit function types from JDK 7, and only include
> conversion to SAM interface types. It would be a compile time error to
> declare a lambda expression/block that did not convert to a SAM.
>
> Function types could then be included in JDK 8 if a valid approach was
> available (ie. it gives the appearance of needing much more in depth
> study than the timescale allows)
>
> Pros:
> - reduces risk of making the wrong decision in the short JDK 7 time-frame
>
> - avoids all the issues above (AFAIK)
>
> - doesn't restrict adding function types later (we'd need some proof of
> this)
>
> - reduces the initial learning curve
>
> Cons:
> - doesn't tackle the problem of fork-joins excess of interfaces, which
> is declared as a major project goal
>
> - results in multiple, incompatible SAM interfaces for the same concept
>
> - no automatic handling of co/contra variance
>
>
> I therefore ask (in a positive way!) as to whether function types are a
> fixed requirement of project lambda? Or whether the SAM-conversion-only
> strategy would be considered for JDK 7? (My concern is to avoid the
> experience of generic wildcards where it was rather rushed in at the
> last minute by all accounts)
>
> I'm really looking for a simple answer from Alex rather than a debate on
> this...
>
> Stephen
>