Overload resolution simplification

Dan Smith daniel.smith at oracle.com
Mon Aug 26 14:54:26 PDT 2013


Recall that I outlined a simplification to overload resolution a few weeks ago.  There has been some useful discussion about the implications; I'd now like to finalize the decision.  I do not sense general pushback from EG members, but please speak up if at this point you're uncomfortable with the approach I presented.

There's been a long discussion related to this on the lambda-spec-observers list, in which Maurizio has been kind enough to provide an expert voice.  Those who dislike the proposed overloading behavior generally express discomfort with some of the compiler's limitations:
1) can't disambiguate based on parameter usage in the lambda body
2) can't see "obvious" information from return context and check lambdas early
3) can't see "obvious" matching parameter types and check lambdas early

Of course, there are trade-offs between simplicity and expressiveness, and the proposal sides with simplicity in each case.  We can examine them more closely:

1) I've gotten pretty definitive feedback from the EG that we do not want overload resolution to depend on how a lambda parameter is (or is not) used in a lambda body.  Too many Bad Things happen: simple refactorings can break resolution, users struggle to understand multiple typings of the same expression, and there are plenty of technical problems besides.  (A small sketch of the kind of call at stake follows point 3 below.)

2) The canonical example here is Comparators.comparing.  Java has always made overload resolution choices independent of context, and we do not propose to change that basic design principle now.  Thus, there is simply no way to choose between overloads when the only clue for disambiguating is buried in an implicit lambda body that can't yet be type-checked.  (To be clear: _this is not new_.  The overloading limitation has always been there, and no concrete proposal explored by this JSR has done away with it.)

3) This one is more subtle.  The idea is that, in principle, we could carve out exceptions for (1) and (2), but still rely on implicit lambdas for overload disambiguation if neither one applies.  I proposed an analysis at the August 1 meeting that would do this: carving out (1) by raising an error if an attempt was made to type the same lambda with multiple sets of parameter types, and carving out (2) by distinguishing between invocation-context-dependent lambdas and invocation-context-independent lambdas.  The feedback I got is that we needed something simpler.  Hence, the simplified proposal that defers all implicit lambda typing until after overload resolution.
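To make (1) concrete, here is a minimal sketch -- the class and the 'm' overloads are hypothetical, not from any real API -- of a call that only the lambda body could disambiguate, together with the use-site fix the adopted rules require:

import java.util.List;
import java.util.function.Function;
import java.util.function.ToIntFunction;

class BodyDemo {
    // Two hypothetical overloads whose lambda parameter types differ.
    static void m(Function<String, Integer> f) { }
    static void m(ToIntFunction<List<String>> f) { }

    static void demo() {
        // A body-sensitive scheme would have to type-check 'x.size()' against
        // both String and List<String> to discover that only the second
        // overload works -- and a harmless edit of the body (say, to
        // 'x.toString().length()') could suddenly make the call ambiguous.
        // Under the adopted rules the call below is just an arity match for
        // both overloads, hence ambiguous:
        // m(x -> x.size());                   // error: ambiguous

        // Use-site fix: give the parameter an explicit type.
        m((List<String> x) -> x.size());
    }
}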

Is the sacrifice of expressiveness important?  Zhong Yu described three reasonable overloading patterns, to give a sense of what is being lost [1]:

> 1. primitive vs boxed
> 
>    map( T -> Integer )
>    map( T -> int )
> 
> 2. flat map
> 
>    Future<T>
>        Future<S> then( T -> S );
>        Future<S> then( T -> Future<S> );
> 
> 3. void vs Void
> 
>        Future<S> then( T -> S ); // where S=Void
>        Future<Void> then( T->void );
>    // in the 2nd version, the lambda body does not need to return (Void)null
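
As a concrete (and hedged) sketch of pattern 2, here is what the "flat map" shape looks like in real signatures.  The Future type and its FlatMapper interface are hypothetical; note that the two overloads need distinct functional interface types anyway, since two Function-accepting overloads would clash after erasure:

import java.util.function.Function;

interface Future<T> {

    @FunctionalInterface
    interface FlatMapper<T, S> {
        Future<S> apply(T t);
    }

    <S> Future<S> then(Function<T, S> f);     // "map"
    <S> Future<S> then(FlatMapper<T, S> f);   // "flat map"
}

class FlatMapDemo {
    Future<Integer> demo(Future<String> f) {
        // Only arity is checked for an implicit lambda, so this call is
        // ambiguous even though the body "obviously" returns an Integer
        // rather than a Future:
        // return f.then(s -> s.length());     // error: ambiguous

        // Use-site workaround: explicit parameter types let the lambda
        // participate in overload resolution.
        return f.then((String s) -> s.length());
    }
}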

I argued in my initial mail that our experience with the Stream APIs suggests that, even without the language restriction, it's often helpful to provide disambiguating names anyway.  Otherwise, the distinctions might be too subtle.  (For example, we renamed 'map' to 'mapToInt', etc., _without_ being required to do so by the compiler, in part because it provided useful information about otherwise-subtle behavior.)
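For instance, with the real java.util.stream API (the NamingDemo class below is just for illustration), the primitive-returning variant is visible at the call site, and no overload of a single 'map' needs to be disambiguated:

import java.util.List;

class NamingDemo {
    static int totalLength(List<String> words) {
        // mapToInt takes a ToIntFunction<? super String>; the name alone says
        // that a primitive-valued mapping is intended.
        return words.stream()
                    .mapToInt(String::length)
                    .sum();
    }
}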

We also found in the Stream API that deceptively-similar patterns can arise in cases in which (1) or (2) apply, and so i) it's not obvious when overloaded declarations like this are "okay", and ii) it's hard to avoid sometimes having to use a set of distinct names for a (conceptually) single overloaded operation, leading to naming inconsistencies.  (For example, 'flatMap' and 'comparing' had to be renamed.  And then it seemed silly to have a 'map' that didn't match 'flatMap'.)

One suggestion in lambda-spec-observers was that we could support examples like these three, while avoiding some of the complexity of a more general approach to (3), by checking implicit lambdas "early" if all remaining overloads agree on the parameter types for the lambdas (and never when inference variables appear in parameter types).  While reducing some complexity, this still has the two problems I described in the previous paragraph.

Maurizio (Aug 19) had some additional critiques [2]:

> the restriction that I've seen popping up more frequently is: only do 
> type-checking if _all_ overloads agree on the implicit lambda parameter 
> types, correct? There would be no combinatorial explosion then.
> 
> This is generally a good strategy (in fact the one I've been proposing 
> since the beginning as a possible incremental update). But it's not 
> issue free. It requires global reasoning for the user. To answer the 
> question: will this lambda be checked during overload (so that most 
> specific will get better treatment)? You really have to work hard, look 
> at all overloads, and see if they agree on the lambda parameter type. As 
> I was trying to explain last week, this is way harder than it looks on 
> the surface, as those target types can be nested within other generic 
> types, type-variable declarations can be subtly swapped, you can have 
> wildcards which require their own handling. All this _will_ make things 
> more complex for the user (in the current model a lambda is type-checked 
> during overload only if it has explicit parameters - rather easy to see, 
> no?).
> 
> Another problem is that the approach will create an asymmetry between 
> generic and non-generic method overloads
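
To illustrate the nested-generics part of this point, here is a hedged, hypothetical pair of overloads: both take a Consumer<T>, yet for an argument of type List<List<String>> they infer different lambda parameter types (T is List<String> in the first overload and String in the second), and seeing that requires tracing T through the enclosing generic types:

import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

interface Callbacks {
    <T> void handle(List<T> items, Consumer<T> action);
    <T> void handle(Collection<List<T>> items, Consumer<T> action);
}

class CallbacksDemo {
    void demo(Callbacks c, List<List<String>> nested) {
        // Both overloads match by arity, but under a "check early if all
        // overloads agree" rule a user must run the inference in their head
        // to discover that the overloads do NOT agree on the implicit
        // lambda's parameter type.  (Under the adopted rules the call is
        // simply ambiguous.)
        // c.handle(nested, x -> System.out.println(x));   // error: ambiguous
    }
}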

---

Last, there's been some confusion about how method references fit into this story, so I thought I'd try to clarify.

We put method references in two categories: "exact" and "inexact" (trying out this terminology; feel free to suggest alternatives).

Exact method references always refer to the same method, with the same arity and parameter/return types, independent of context.  To meet this standard, the referenced method must not be:
- Overloaded (some other method exists in the same type with the same name)
- Generic (declares type parameters)
- Varargs

(This design is informed by the fact that a high percentage of method declarations meet this standard.  The remaining cases might eventually be handled with some combination of i) stronger inference, and/or ii) explicit parameter type syntax.)

(A method reference like Foo::m is a special case -- it could be an unbound instance method reference or a static method reference; but since there's only one method, we can determine the arity of this reference in a context-independent way by seeing whether the method is declared static or not.)

Exact method references are analogous to explicit lambdas: the "meaning" of the expression is context-independent, even though its type (is it a Function? a Predicate?) is not.

During overload resolution, exact method references can be used to disambiguate, like explicit lambdas.  Inference constraints can be produced from the referenced method's parameter and return types.  Most-specific logic that prefers ToIntFunction<Foo> over Function<Foo,Object> can be employed.

Inexact method references behave like implicit lambdas.  The only thing checked during overload resolution is that they have the right "shape" (an arity check -- but note that, unlike implicit lambdas, an inexact method reference can support multiple arities, so we test whether any possible referenced declaration has an appropriate arity).  If an inexact method reference is passed as an argument to an overloaded method where multiple targeted functional interfaces have a compatible arity, an ambiguity generally occurs, unless some other invocation argument disambiguates.  The "meaning" of the invocation, like the meaning of a lambda body, remains unknown until a set of concrete parameter types can be provided by a target type.
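To make the exact/inexact distinction concrete, here is a small sketch; the 'm' overloads are hypothetical, while String::length and String::valueOf are real:

import java.util.function.Function;
import java.util.function.ToIntFunction;

class RefDemo {
    static void m(ToIntFunction<String> f) { }
    static void m(Function<String, Object> f) { }

    static void demo() {
        // String.length() is not overloaded, not generic, and not varargs, so
        // String::length is exact.  Both overloads are applicable, and the
        // most-specific logic described above prefers the primitive-returning
        // ToIntFunction<String> over Function<String, Object>.
        m(String::length);

        // String.valueOf is heavily overloaded, so String::valueOf is inexact:
        // only the arity check happens during overload resolution, both
        // overloads accept a one-argument function, and nothing else breaks
        // the tie.
        // m(String::valueOf);                 // error: ambiguous
    }
}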

While we don't have the benefit of two different syntaxes to visually distinguish the two forms, the goal is for the same rules that apply to implicit/explicit lambdas to also apply to method references.  Hopefully this makes reasoning about overloading behavior in the presence of method references tractable for programmers.

---

Conclusion: the discussion (and time to mull over and experiment with things) has not dissuaded us Oracle folks from thinking the proposed path is a good idea.  Nor do I sense substantial pushback from the EG, and at the same time the proposal solves some of the hard complexity problems that the EG was uncomfortable with.  The prototype (in the OpenJDK Lambda repository) seems to be working.  We have a much cleaner story to tell users.  And, finally, this conservative path leaves us room to change our minds and add extra power in a later version, if needed.  So all signs point to adopting this plan.

Please chime in if you feel like there's anything we've overlooked...

—Dan

[1] http://mail.openjdk.java.net/pipermail/lambda-spec-observers/2013-August/000422.html
[2] http://mail.openjdk.java.net/pipermail/lambda-spec-observers/2013-August/000474.html

On Aug 8, 2013, at 6:19 PM, Dan Smith <daniel.smith at oracle.com> wrote:

> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference).  There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity.
> 
> The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity.  In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction.
> 
> A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery.  And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns.
> 
> So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach:
> 
> Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references.  If the body of a lambda is important for disambiguation, it must have explicit parameter types.
> 
> Benefits of this approach:
> - Very easy to understand -- it's mostly a syntactic distinction
> - Consistent between all different patterns of overloading that were previously treated differently
> - Facilitates a simple declaration-site warning check when method signatures conflict
> - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas
> - Avoids re-checking lambdas with different parameter types which means:
> -- Typing of lambda bodies is easier for users to process
> -- Implementations don't have to do speculative checking of arbitrary blocks of code
> -- Bad theoretical complexity goes away
> 
> We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last).
> 
> Any questions/concerns?
> 
> ---
> 
> Here's an example of something we would stop disambiguating:
> 
> interface I<T> {
>     <R> R map(Function<T,R> f);
>     int map(ToIntFunction<T> f);
>     long map(ToLongFunction<T> f);
>     double map(ToDoubleFunction<T> f);
> }
> 
> someIofString.map(s -> s.length());
> 
> Declaration-site workaround: rename the methods.
> 
> Use-site workaround: explicit parameter type:
> someIofString.map((String s) -> s.length());
> 
> ---
> 
> Here's an example of something else we would stop disambiguating:
> 
> static void m(Function<String,Integer> f);
> static void m(ToIntFunction<Object> f);
> 
> m(x -> x.length() > 10 ? 5 : 10);
> 
> ---
> 
> And here's something that we never could disambiguate in the first place (due to fundamental design constraints):
> 
> interface Comparators {
>     <T, R extends Comparable<? super R>> Comparator<T> comparing(Function<T, R> f);
>     <T> Comparator<T> comparing(ToIntFunction<T> f);
> }
> 
> Comparator<String> cs = Comparators.comparing(s -> -s.length());
> 
> ---
> 
> —Dan


