Type inference across method chains

Brian Goetz brian.goetz at oracle.com
Fri Mar 17 20:19:52 UTC 2017


It's a pretty big overstatement to say "type inference doesn't work 
across method chaining."  As is inevitable with type inference, there 
will be limits, and people will always complain about the marginal 
example, and say "but this example could be accommodated too, with a 
more complex inference algorithm."  And that's usually correct, but then 
there will be new cases at the margin, about which people will 
complain.  So let's not fool ourselves that this is "the last case" -- 
it's just "the next case."

There's nothing magic about type inference; it's just constraint 
solving.  The choices one has in designing a type inference algorithm 
amount to: what constraints to use, and over what scope to solve.
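
For instance, in assignments like the ones below, the constraints come 
from two directions -- argument types and the target type -- and either 
can be enough on its own to solve the type variable.  (This is a 
plain-JDK sketch for illustration, not your Flink example.)

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    class ConstraintSources {
        static void example() {
            // T constrained by the arguments: both are String, so T = String.
            List<String> fromArgs = Arrays.asList("a", "b");

            // T constrained only by the target type: emptyList() has no
            // arguments, so the assignment target List<Integer> is the only
            // constraint, and T = Integer.
            List<Integer> fromTarget = Collections.emptyList();
        }
    }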

There are also a number of hidden trade-offs besides complexity. For 
example, the larger the scope over which constraints are solved, the 
more likely it is that error messages will be something like "there 
was an error somewhere in Foo.java" -- which users find unhelpful.

So, with that, let's look at your example.  The cases that make people 
complain "type inference doesn't work under chaining" invariably reduce 
to the case where a generic method call is used as the receiver of a 
chained invocation, and its type parameters cannot be solved directly 
from its arguments, but would instead need type information 
back-propagated from later links in the chain (which in turn generally 
requires information from the target type.)
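
To make that shape concrete with a plain-JDK sketch (my illustration, 
not the Flink API): the same emptyList() that is trivially solvable 
from an assignment target has nothing to constrain it when it is the 
receiver of a chained call, so its type variable falls back to Object.

    import java.util.Collections;

    class ReceiverInference {
        static void chained() {
            // emptyList() is the receiver, so its type variable must be
            // solved before the dot, with no constraints available; it falls
            // back to Object, and get(0) therefore returns Object, not String.
            Object o = Collections.emptyList().get(0);   // compiles (but throws at runtime)

            // String s = Collections.emptyList().get(0);   // does not compile
        }
    }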

As you correctly point out, there are simple workarounds, such as using 
explicit lambdas or exact method refs (test2), breaking up the chain 
(test3), or providing explicit type witnesses (not shown.)
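
For completeness, here is what a type witness looks like on the same 
JDK sketch (again just an illustration, not your Flink code); it pins 
down the type variable at the call site, so the rest of the chain sees 
the type you intended:

    import java.util.Collections;

    class WithWitness {
        static void witnessed() {
            // The explicit witness fixes T = String before the dot, so the
            // chained calls operate on a List<String> rather than List<Object>.
            String s = Collections.<String>emptyList()
                    .stream()
                    .findFirst()
                    .orElse("none");
        }
    }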

> Type inference across chained calls in Flink seems to work sometimes, 
> but not always, which is frustrating.

"sometimes but not always" is a fact of life with type inference; let's 
not pretend that "always" is a place we could get to.  It's just a 
question of where to draw the line, and accept that there will always be 
something just over the line.

> Method chaining really is getting much more common in Java, with the 
> advent of APIs like Java 8 streaming, Apache Spark and Apache Flink.
>

You say this like you think we're not aware of this, or indeed that 
enabling broader use of chaining in APIs wasn't, like, one of the major 
goals of the work we did in Java 8 on type inference!  The reality is: 
there will always be limitations, and they will always be annoying when 
you encounter them.

>
> Based on the discussion in the thread above, it sounds like the reason 
> this hasn't been done yet is due to the complexity of the current 
> compiler design, not because it's not actually possible to do this?
>

It is indeed not impossible.  But, as you say, it adds additional 
complexity to a type inference algorithm that has already grown quite 
complex (and which now interacts heavily with another complex mechanism, 
overload selection).  Could we take one more step, and move the corner 
cases further out to the margins, at the cost of greater complexity (and 
perhaps decreased usability)?  Yes, we could.  In Java 8, we chose to 
leave it here, which I think was a sensible choice.

We might someday come back to look at this -- but right now, it seems 
like this is not the highest-return use of limited and highly 
specialized resources.


On 3/16/2017 10:55 PM, Luke Hutchison wrote:
> I have been trying to figure out why type inference doesn't work 
> across method chaining (see the code example below). I finally came 
> across these old messages on lambda-dev:
>
> On Tue Jul 16 15:08:05 PDT 2013, Dan Smith wrote:
> > Without ruling out the possibility of enhancements that address 
> > situations like this, [...] the status quo is that when you type a 
> > dot, the compiler has to completely type-check the stuff to the left 
> > before it can proceed.  Inference can do a lot of cool tricks with 
> > context, but will not be influenced by "context" that occurs after 
> > the dot.
> [ Archive link: 
> http://mail.openjdk.java.net/pipermail/lambda-dev/2013-July/010544.html ]
>
> On Wed Jul 24 12:41:03 PDT 2013, John Rose wrote:
> > The reason I'm wildly waving this flag is that (I think) I have seen 
> > this phenomenon active during the design of streams APIs, ongoing now.
> [ Archive link: 
> http://mail.openjdk.java.net/pipermail/lambda-dev/2013-July/010629.html ]
>
> Method chaining really is getting much more common in Java, with the 
> advent of APIs like Java 8 streaming, Apache Spark and Apache Flink.
>
> Type inference across chained calls in Flink seems to work sometimes, 
> but not always, which is frustrating. You end up having to add type 
> declarations in lots of places that shouldn't need them, or breaking 
> method chains into separate declarations so that the compiler can 
> understand the types.
>
> Based on Dan Smith's comments in the above thread, it seems it was 
> decided that the numerous complex steps involved in receiver method 
> resolution all had to be completed before parameter type inference. 
> However, it seems that if the entire graph of all possible method 
> signatures and all type constraints were built before any inference 
> were performed, then all of these steps could be performed 
> simultaneously and iteratively until convergence: in other words, all 
> the types and methods in a method call chain could be solved as a 
> single constraint satisfaction problem. If, in the process of solving 
> this constraint satisfaction problem, it were discovered that there 
> was no single satisfying assignment of types to type variables, and/or 
> that no single method definition was selected per method name, then a 
> type error would be reported.
>
> Based on the discussion in the thread above, it sounds like the reason 
> this hasn't been done yet is due to the complexity of the current 
> compiler design, not because it's not actually possible to do this?
>
> Some example code: the method test1() below really should typecheck 
> OK, but gives an error on (l + 1) in Java 8. Providing types for the 
> lambda params in test2() allows the types to propagate one method call 
> further along the chain, but it is frustrating to always have to do 
> that. Breaking the chain into two declarations in test3() allows 
> the typechecking to work for both operations, but this defeats the 
> purpose.
>
>
>     // Compile with these Maven dependencies:
>     // https://flink.apache.org/downloads.html#maven-dependencies
>
>     package com.rentlogic.buildingscores.flink;
>
>     import org.apache.flink.api.java.DataSet;
>     import org.apache.flink.api.java.ExecutionEnvironment;
>     import org.apache.flink.api.java.tuple.Tuple2;
>     import org.apache.flink.util.Collector;
>
>     public class TestMethodChainInference {
>         private static ExecutionEnvironment env =
>                 ExecutionEnvironment.getExecutionEnvironment();
>
>         private static DataSet<String> strs =
>                 env.fromElements("a", "xy", "pqr");
>
>         // Type of l resolved as Object, so (l + 1) gives a type error
>         public static void test1(String[] args) {
>             DataSet<Integer> strLenPlusOne = strs
>                     .flatMap((s, out) -> out.collect(s.length()))
>                     .flatMap((l, out) -> out.collect(l + 1));    // ERROR
>         }
>
>         // No error
>         public static void test2(String[] args) {
>             DataSet<Integer> strLenPlusOne = strs
>                     .flatMap((String s, Collector<Integer> out) ->
>                             out.collect(s.length()))
>                     .flatMap((l, out) -> out.collect(l + 1));
>         }
>
>         // No error
>         public static void test3(String[] args) {
>             DataSet<Integer> strLen = strs
>                     .flatMap((s, out) -> out.collect(s.length()));
>             DataSet<Integer> strLenPlusOne = strLen
>                     .flatMap((l, out) -> out.collect(l + 1));
>         }
>     }
>
