Overload resolution simplification

Zhong Yu zhong.j.yu at gmail.com
Mon Aug 26 19:25:35 PDT 2013


On Mon, Aug 26, 2013 at 4:54 PM, Dan Smith <daniel.smith at oracle.com> wrote:
> Recall that I outlined a simplification to overload resolution a few weeks ago.  There has been some useful discussion about the implications; I'd like now to finalize the decision.  I do not sense a general pushback from EG members, but please speak up if at this point you're uncomfortable with following the approach I presented.
>
> There's been a long discussion on the lambda-spec-observers list related to this, which Maurizio has been kind enough to provide an expert voice in.  Those who dislike the proposed overloading behavior generally express discomfort with some of the compiler's limitations:
> 1) can't disambiguate based on parameter usage in the lambda body
> 2) can't see "obvious" information from return context and check lambdas early
> 3) can't see "obvious" matching parameter types and check lambdas early
>
> Of course, there are trade-offs between simplicity and expressiveness, and the proposal sides with simplicity in each case.  We can examine them more closely:
>
> 1) I've gotten pretty definitive feedback from the EG that we do not want overload resolution to depend on how a lambda parameter is (or is not) used in a lambda body.  Too many Bad Things happen: simple refactorings can break things, users struggle to understand multiple typings of the same expression, lots of technical problems.
>
> 2) The canonical example here is Comparators.comparing.  Java has always made overload resolution choices independent of context, and we do not propose to change that basic design principle now.  Thus, there is simply no way to choose between overloads when the only clue for disambiguating is buried in an implicit lambda body that can't yet be type-checked.  (To be clear: _this is not new_.  The overloading limitation has always been there, and no concrete proposal explored by this JSR has done away with it.)
>
> 3) This one is more subtle.  The idea is that, in principle, we could carve out exceptions for (1) and (2), but still rely on implicit lambdas for overload disambiguation if neither one applies.  I proposed an analysis at the August 1 meeting that would do this: carving out (1) by raising an error if an attempt was made to type the same lambda with multiple sets of parameter types, and carving out (2) by distinguishing between invocation-context-dependent lambdas and invocation-context-independent lambdas.  The feedback I got is that we needed something simpler.  Hence, the simplified proposal that defers all implicit lambda typing until after overload resolution.
>
> Is the sacrifice of expressiveness important?  Zhong Yu described three reasonable overloading patterns, to give a sense of what is being lost [1]:
>
>> 1. primitive vs boxed
>>
>>    map( T -> Integer )
>>    map( T -> int )
>>
>> 2. flat map
>>
>>    Future<T>
>>        Future<S> then( T -> S );
>>        Future<S> then( T -> Future<S> );
>>
>> 3. void vs Void
>>
>>        Future<S> then( T -> S ); // where S=Void
>>        Future<Void> then( T->void );
>>    // in the 2nd version, the lambda body does not need to return (Void)null
>
> I argued in my initial mail that our experience with the Stream APIs suggests to us that, even without the language restriction, it's often helpful to provide disambiguating names anyway.  Otherwise, the distinctions might be too subtle.  (For example, we renamed 'map' to 'mapToInt', etc., _without_ being required to do so by the compiler, in part because it provided useful information about otherwise-subtle behavior.)

This is a single example with its own particular circumstances. It is
not enough to support the generalization that "most overload cases
are bad design anyway, and are better off with distinct method names".

>
> We also found in the Stream API that deceptively-similar patterns can arise in cases in which (1) or (2) apply, and so i) it's not obvious when overloaded declarations like this are "okay",

Programmers do not need to understand the intricate language details.
Most never read the JLS sections on method overloading or type
inference, yet they get by; their intuition leads them to a sensible
design most of the time.

The overload use cases we've seen so far all share a simple property:
all overloads agree on the lambda parameter types. That cannot be mere
coincidence; there is some subconscious model in our heads that gauges
what is and is not suitable for overloading.

If an API author creates overloaded methods that do not work well with
implicit lambdas, he will discover the mistake soon enough during
testing. He does not need a formal understanding of the rules to reach
a working design; he will get there by trial and error.
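To illustrate, here is a hypothetical sketch of Zhong Yu's pattern 3
(void vs Void), with made-up class and method names, and concrete
types standing in for `T -> S`:

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of the "void vs Void" overload pattern: both
// overloads agree on the lambda parameter type (String), differing
// only in whether the lambda produces a value.
class ThenDemo {
    static String then(Function<String, Integer> f) { return "Function"; }
    static String then(Consumer<String> c) { return "Consumer"; }

    public static void main(String[] args) {
        // A void body is only void-compatible, so javac picks Consumer:
        System.out.println(then((String s) -> System.out.println(s)));
        // A value-producing body makes the non-void overload more
        // specific, so javac picks Function:
        System.out.println(then((String s) -> s.length()));
    }
}
```

With explicit parameter types, both calls resolve without ambiguity;
an author who tried the implicit forms first would hit a compile error
and find the workaround quickly, just as argued above.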


> and ii) it's hard to avoid sometimes having to use a set of distinct names for a (conceptually) single overloaded operation, leading to naming inconsistencies.  (For example, 'flatMap' and 'comparing' had to be renamed.  And then it seemed silly to have a 'map' that didn't match 'flatMap'.)
>
> One suggestion in lambda-spec-observers was that we could support examples like these three, while avoiding some of the complexity of a more general approach to (3), by checking implicit lambdas "early" if all remaining overloads agree on the parameter types for the lambdas (and never when inference variables appear in parameter types).  While reducing some complexity, this still has the two problems I described in the previous paragraph.
>
> Maurizio (Aug 19) had some additional critiques [2]:
>
>> the restriction that I've seen popping up more frequently is: only do
>> type-checking if _all_ overloads agree on the implicit lambda parameter
>> types, correct? There would be no combinatorial explosion then.
>>
>> This is generally a good strategy (in fact the one I've been proposing
>> since the beginning as a possible incremental update). But it's not
>> issue free. It requires global reasoning for the user. To answer the
>> question: will this lambda be checked during overload (so that most
>> specific will get better treatment)? You really have to work hard, look
>> at all overloads, and see if they agree on the lambda parameter type. As

Javac has to work hard to prove that, but the programmer does not.
When he reads a piece of code that compiles, he knows that the
implicit lambda's parameter type has a unique solution; to work out
the type himself, he reads *the* method declaration if the method is
not overloaded, or *any* of its declarations if it is. Whether or not
the method is overloaded therefore does not affect how he figures out
the type.

Suppose this compiles

    void m(X->Integer)  // not overloaded
    m(x->x.value);

or this compiles

    void m(X->int)  // not overloaded
    m(x->x.value);

In either case, a code reader needs to figure out the type of `x` by
checking the method signature.

Now suppose we support the restricted overloading under discussion,

    void m(X->Integer)
    void m(X->int)

    m(x->x.value);

does it increase the cognitive workload for a reader figuring out the
type of `x`? Absolutely not. Since the solution is unique, he can pick
an arbitrary method, say `m(X->Integer)`, to figure out the type of
`x`, exactly as he would in the non-overloaded case. Overloading does
not make the meaning of the implicit lambda any harder to understand.
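As a concrete sketch of the pair above (the class `X` and its `value`
field are made up for illustration):

```java
import java.util.function.Function;
import java.util.function.ToIntFunction;

// Hypothetical boxed-vs-primitive overload pair, mirroring
// m(X->Integer) and m(X->int) above.
class BoxedVsPrimitive {
    static class X { int value = 42; }

    static String m(Function<X, Integer> f) { return "boxed"; }
    static String m(ToIntFunction<X> f) { return "primitive"; }

    public static void main(String[] args) {
        // The implicit form m(x -> x.value) is ambiguous under the
        // adopted rules; an explicit parameter type lets the lambda
        // be checked during overload resolution, and most-specific
        // logic prefers the primitive-returning ToIntFunction:
        System.out.println(m((X x) -> x.value));
    }
}
```

Under the restricted proposal the implicit form would also compile,
and the reader could consult either declaration to conclude that `x`
is an `X`.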


>> I was trying to explain last week, this is way harder than it looks on
>> the surface, as those target types can be nested within other generic
>> types, type-variable declarations can be subtly swapped, you can have
>> wildcards which require their own handling. All this _will_ make things
>> more complex for the user (in the current model a lambda is type-checked
>> during overload only if it has explicit parameters - rather easy to see,
>> no?).
>>
>> Another problem is that the approach will create an asymmetry between
>> generic and non-generic method overloads
>
> ---
>
> Last, there's been some confusion about how method references fit into this story, so I thought I'd try to clarify.
>
> We put method references in two categories: "exact" and "inexact" (trying out this terminology; feel free to suggest alternatives).
>
> Exact method references always refer to the same method, with the same arity and parameter/return types, independent of context.  To meet this standard, they must not be:
> - Overloaded (some other method exists in the same type with the same name)
> - Generic (declares type parameters)
> - Varargs
>
> (This design is informed by the fact that a high percentage of method declarations meet this standard.  The remaining cases might eventually be handled with some combination of i) stronger inference, and/or ii) explicit parameter type syntax.)
>
> (A method reference like Foo::m is a special case -- it could be an unbound instance method reference or a static method reference; but since there's only one method, we can determine the arity of this reference in a context-independent way by seeing whether the method is declared static or not.)
>
> Exact method references are analogous to explicit lambdas: the "meaning" of the expression is context-independent, even though its type (is it a Function? a Predicate?) is not.
>
> During overload resolution, exact method references can be used to disambiguate, like explicit lambdas.  Inference constraints can be produced from the referenced method's parameter and return types.  Most-specific logic that prefers ToIntFunction<Foo> over Function<Foo,Object> can be employed.
>
> Inexact method references behave like implicit lambdas.  The only thing checked during overload resolution is that they have the right "shape" (an arity check -- but note that, unlike implicit lambdas, an inexact method reference can support multiple arities, so we test whether any possible referenced declaration has an appropriate arity).  If an inexact method reference is passed as an argument to an overloaded method where multiple targeted functional interfaces have a compatible arity, an ambiguity generally occurs, unless some other invocation argument disambiguates.  The "meaning" of the invocation, like the meaning of a lambda body, remains unknown until a set of concrete parameter types can be provided by a target type.
>
> While we don't have the benefit of two different syntaxes to visually distinguish the two forms, the goal is for the same rules that apply to implicit/explicit lambdas to also apply to method references.  Hopefully this makes reasoning about overloading behavior in the presence of method references tractable for programmers.
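For illustration, a sketch of the exact/inexact distinction described
above (the overloaded `m` declarations are hypothetical):
`String::length` is exact, since `length` is not overloaded, not
generic, and not varargs, so it can disambiguate like an explicitly
typed lambda.

```java
import java.util.function.Function;
import java.util.function.ToIntFunction;

// String::length names exactly one method, so it is an "exact"
// method reference and participates in disambiguation.
class MethodRefDemo {
    static String m(Function<String, Object> f) { return "Function"; }
    static String m(ToIntFunction<String> f) { return "ToIntFunction"; }

    public static void main(String[] args) {
        // length() returns int, so most-specific logic prefers the
        // ToIntFunction overload:
        System.out.println(m(String::length));
        // By contrast, Integer::parseInt is overloaded, and therefore
        // "inexact"; passing it here would be an ambiguity error.
    }
}
```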
>
> ---
>
> Conclusion: the discussion (and time to mull over and experiment with things) has not dissuaded us Oracle folks from thinking the proposed path is a good idea.  Nor do I sense substantial pushback from the EG, while at the same time this solves some of the hard complexity problems that the EG was uncomfortable with.  The prototype (in the OpenJDK Lambda repository) seems to be working.  We have a much cleaner story to tell users.  And, finally, this conservative path leaves us room to change our mind and add extra power in a later version, if needed.  So

Except that in the meantime, some ugly method names have been
introduced into the core library, and the community is forced to
follow suit. These ugly method names will live in Java forever.

> it looks like all signs point to adopting this plan.
>
> Please chime in if you feel like there's anything we've overlooked...
>
> —Dan
>
> [1] http://mail.openjdk.java.net/pipermail/lambda-spec-observers/2013-August/000422.html
> [2] http://mail.openjdk.java.net/pipermail/lambda-spec-observers/2013-August/000474.html
>
> On Aug 8, 2013, at 6:19 PM, Dan Smith <daniel.smith at oracle.com> wrote:
>
>> We spent some time at the EG meeting last week talking about the overload resolution story in the presence of lambdas/method references (and, why not, type argument inference).  There are a lot of tricky dependencies here, and the goal is to find a balance between expressivity and simplicity.
>>
>> The sense I got from the meeting is that, despite our efforts to refine the story (there have been a few iterations), we're still not there yet in terms of simplicity.  In particular, I think what's crucial about the model I presented is that users can identify the difference between implicit lambdas that get type checked pre-overload-resolution and post-overload-resolution; the sanity check I got is that nobody will be able to make that distinction.
>>
>> A couple of days later, Maurizio pointed out that, as we've iterated on our libraries, we've largely abandoned the space of programs that requires some of the more complex overload disambiguation machinery.  And looking more closely at those use cases, we agreed that we've probably been focusing too much on some atypical patterns.
>>
>> So, let me propose a greatly simplified but probably not-very-noticeably less expressive approach:
>>
>> Overload resolution will only check the arity of all implicit lambdas and will ignore overloaded method references.  If the body of a lambda is important for disambiguation, it must have explicit parameter types.
>>
>> Benefits of this approach:
>> - Very easy to understand -- it's mostly a syntactic distinction
>> - Consistent between all different patterns of overloading that were previously treated differently
>> - Facilitates a simple declaration-site warning check when method signatures conflict
>> - Encourages use of explicit lambdas -- clearly acknowledges that we can't solve all inference problems with implicit lambdas
>> - Avoids re-checking lambdas with different parameter types which means:
>> -- Typing of lambda bodies is easier for users to process
>> -- Implementations don't have to do speculative checking of arbitrary blocks of code
>> -- Bad theoretical complexity goes away
>>
>> We've thought about it for a few days and think this is a much better scenario for users and more in line with the EG's expectations (based on feedback both this year and last).
>>
>> Any questions/concerns?
>>
>> ---
>>
>> Here's an example of something we would stop disambiguating:
>>
>> interface I<T> {
>> <R> R map(Function<T,R> f);
>> int map(ToIntFunction<T> f);
>> long map(ToLongFunction<T> f);
>> double map(ToDoubleFunction<T> f);
>> }
>>
>> someIofString.map(s -> s.length());
>>
>> Declaration-site workaround: rename the methods.
>>
>> Use-site workaround: explicit parameter type:
>> someIofString.map((String s) -> s.length());
>>
>> ---
>>
>> Here's an example of something else we would stop disambiguating:
>>
>> static void m(Function<String,Integer> f);
>> static void m(ToIntFunction<Object> f);
>>
>> m(x -> x.length() > 10 ? 5 : 10);
>>
>> ---
>>
>> And here's something that we never could disambiguate in the first place (due to fundamental design constraints):
>>
>> interface Comparators {
>> <T, R extends Comparable<? super R>> Comparator<T> comparing(Function<T, R> f);
>> <T> Comparator<T> comparing(ToIntFunction<T> f);
>> }
>>
>> Comparator<String> cs = Comparators.comparing(s -> -s.length());
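For reference, the final JDK sidesteps this case by renaming: the
int-keyed variant is Comparator.comparingInt, so the implicit lambda
has a single target and needs no disambiguation.

```java
import java.util.Comparator;

// Comparator.comparingInt takes a ToIntFunction, so the implicit
// lambda type-checks directly against one descriptor.
class ComparingDemo {
    public static void main(String[] args) {
        Comparator<String> byNegLength =
            Comparator.comparingInt(s -> -s.length());
        // "ab" has key -2, "c" has key -1, so "ab" sorts first:
        System.out.println(byNegLength.compare("ab", "c"));
    }
}
```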
>>
>> ---
>>
>> —Dan
>

