Inference Bug related to parametric polymorphism unification
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Jun 12 12:59:43 UTC 2019
I took a look at the complete example, and simplified it to this:
import java.util.*;
import java.util.function.*;
class Example {
public static <FOLD_A, FOLD_B> FOLD_B foldLeft(BiFunction<FOLD_B,
FOLD_A, FOLD_B> fn, FOLD_B b, Iterable<FOLD_A> as) {
throw new UnsupportedOperationException();
}
public static <FLAT_A> Iterable<FLAT_A>
flatten(Iterable<Iterable<FLAT_A>> aas) {
throw new UnsupportedOperationException();
}
public static <MAP_A, MAP_B> Iterable<MAP_B> map(Function<MAP_A,
MAP_B> fn, Iterable<MAP_A> as) {
throw new UnsupportedOperationException();
}
public static <DEST_A, DEST_B, DEST_C> Function<Map.Entry<DEST_A,
DEST_B>, DEST_C> destructure(BiFunction<DEST_A, DEST_B, DEST_C> fn) {
throw new UnsupportedOperationException();
}
public static void main(String[] args) {
Map<String, List<String>> headers = Collections.emptyMap();
// type-checks in 8, fails in 11/12
foldLeft(
(acc, h) -> acc,
42,
flatten(
map(
destructure((name, values) ->
(Iterable<Map.Entry<String, String>>)null),
headers.entrySet()
)
)
);
}
}
(I've renamed all type variables, for clarity).
I believe the issue is the same as last time - there is a "race" between
two implicit lambdas whose parameter type-variables have to be resolved
eagerly:
1) (acc, h) -> acc (input variables are FOLD_A and FOLD_B, respectively
- output var is FOLD_B)
2) (name, values) -> ... (input variables are DEST_A and DEST_B
respectively - output var is DEST_C)
Now, of course we'd like for input vars in (1) to be solved _after_
those in (2). But to do this we have to prove there's a dependency
between the output var of (2) [DEST_C] and one of the input variables of
(1) [FOLD_A, FOLD_B].
We can clearly see that there's no dependency between DEST_C and FOLD_B.
They are completely unrelated. But what about FOLD_A? Things get tricky
here - we have the following constraints:
FLAT_A = FOLD_A
MAP_B = Iterable<FLAT_A>
DEST_C = MAP_B
DEST_A = String
DEST_B = List<Integer>
So, DEST_C can influence MAP_B, and MAP_B. But, unfortunately, the chain
stops here. That's because it's MAP_B that depends on FLAT_A, not the
other way around. The inference engine sees that MAP_B has a bound of
Iterable<FLAT_A>, so it knows it must instantiate FLAT_A first to get at
MAP_B.
In other words, like last time, we have a case where there's no possible
ordering (according to the spec) in which type variables must be solved
eagerly. In other words, we're in this case (from JLS 18.5.2):
"A subset of constraints is selected in C, satisfying the property that,
for each constraint, no input variable can influence an output variable
of another constraint in C. The terms /input variable/ and /output
variable/ are defined below. An inference variable α /can influence/ an
inference variable β if α depends on the resolution of β (§18.4
<https://docs.oracle.com/javase/specs/jls/se12/html/jls-18.html#jls-18.4>),
or vice versa; or if there exists a third inference variable γ such that
α can influence γ and γ can influence β."
Since the constraints for (1) and (2) are independent, see above, we
must solve all of them at the same time:
"The selected constraint(s) are removed from C. [...] The input
variables α1, ..., αm of all the selected constraint(s) are resolved."
And this means that the eager instantiation you get will not take the
body of the lambda in (1) into account (since that's not looked at
during this process). Which means:
FLAT_A = FOLD_A = Object
MAP_B = Iterable<Object>
DEST_C = Iterable<Object>
DEST_A = String
DEST_B = List<Integer>
Which doesn't give the desired answer.
So, I believe the compiler is correct in rejecting the example.
I also notice that the compiler doesn't really implement the spec to the
letter here - that is, the spec mandates that _all_ the variables of all
constraints in C must be solved at once; whereas javac choose an
implementation specific order between the constraints in C, and then
solves each of the constraint input variables independently. This can
lead to issues like the one you experienced where, if this unspecified
order changes from one release to the next (and we have been making
changes in this area that could have affected this), you might get
'spurious' successes, because, by accident, the compiler managed to
order constraints in the right way.
But before we fix the compiler to be even stricter - I'd like to
understand if we can't instead, in case of ties like these, improve the
spec so that we force resolution of constraints of the kind <Expression
-> T> where 'Expression' is the 'most nested' one between the other
competing constraints (and then left to right, in case of depth-ties). I
think this would give the behavior we want, and it would be intuitive,
as developers won't be surprised, I think, to find out that variables
are flowing from inner contexts outwards.
I note for example that, if cycles are found in the set of constraint C,
the spec goes to some length in trying to establish an order in which
constraints should be picked inside C. But there's no similar treatment
for when multiple constraints in C can go ahead at the same time - and
that is unfortunate, because, like in this case, the fact that we cannot
establish a direct chain of dependency between DEST_C and FLAT_A doesn't
mean the two variables are completely unrelated.
So, my question to Dan stands :-)
Maurizio
On 10/06/2019 16:19, John Napier wrote:
> Hey Maurizio,
>
> My apologies; you’re right that the example doesn’t compile against 8
> - in my attempts to shrink the code that I have failing locally to the
> minimal test case, I apparently lost something essential in the
> translation.
>
> In the interest of avoiding further confusion, here is the full,
> complicated example that compiles against 1.8 but fails to compile
> without explicit type ascriptions in both 11 and 12:
>
> public static class Example {
>
> public static <A, B> Iterable<B> map(Function<? super A, ?
> extends B> fn, Iterable<A> as) {
> throw new UnsupportedOperationException();
> }
>
> public static <A> Iterable<A> flatten(Iterable<Iterable<A>> aas) {
> throw new UnsupportedOperationException();
> }
>
> public static <A, B> B foldLeft(BiFunction<? super B, ? super
> A, ? extends B> fn, B b, Iterable<A> as) {
> throw new UnsupportedOperationException();
> }
>
> public static <A, B, C> C destructure(BiFunction<? super A, ?
> super B, ? extends C> fn,
> Map.Entry<A, B> entry) {
> throw new UnsupportedOperationException();
> }
>
> public static <A, B, C> Function<Map.Entry<A, B>, C> destructure(
> BiFunction<? super A, ? super B, ? extends C> fn) {
> throw new UnsupportedOperationException();
> }
>
> public static <K, V> Function<V, Map.Entry<K, V>> pair(K fn) {
> throw new UnsupportedOperationException();
> }
>
> public static final class Sink {
>
> public Sink add(String k, String v) {
> throw new UnsupportedOperationException();
> }
> }
>
> public static void main(String[] args) {
> Map<String, List<String>> headers = Collections.emptyMap();
>
> // type-checks in 8, fails in 11/12
> foldLeft((acc, h) -> destructure(acc::add, h),
> new Sink(),
> flatten(map(destructure((name, values) ->
> map(pair(name), values)), headers.entrySet())));
> }
> }
>
> Again, so sorry for the confusion.
>
> Same question as before: should *that* code continue to compile, or is
> it correct that it fails to compile in 11 / 12?
>
> Cheers,
> John
>> On Jun 10, 2019, at 6:33 AM, Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com
>> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>>
>> Hi John
>> I've been trying to reproduce, but for me the example fails even in 8...
>>
>> That said, I agree that there could be something fishy here; I did
>> some experiments with variations of your example, and it looks like
>> javac can undrestand what the typing of `destructure(Pair::new,
>> entry)` is, in isolation - you can see that by adding this statement:
>>
>> Object o = (String)destructure(Pair::new, entry);
>>
>> This will trigger this error:
>>
>> error: incompatible types: Pair<String,String> cannot be converted to
>> String
>> Object o = (String)destructure(Pair::new, entry);
>>
>> Which seems to confirm that javac is indeed capable of inferring A
>> and B to String and take it from there.
>>
>> My suspicion is that when you nest these expressions inside each
>> other, you create a 'bigger' inference context, where two expressions
>> cannot be evaluated (we call them 'stuck'):
>>
>> * Pair::new (indirect method reference)
>> * (x, y) -> x + y (implicit lambda)
>>
>> To break this tie, the compiler has to choose which inference
>> variable to instantiate first: the outer { A, B }, or the innermost
>> ones? This choice is, at its core, driven by the spec 18.5.2 (see
>> [1]). In your example we have these inference variables:
>>
>> * A#inner, B#inner and C#inner (from innermost invocation of
>> 'destructure')
>> * A#outer, B#outer and C#outer (from outermost invocation of
>> 'destructure')
>>
>> This analysis is based on the concept of input/output inference
>> variable - in this case:
>>
>> * A#inner,B#inner influence C#inner
>> * A#outer,B#outer influence C#outer
>>
>> So far, if we had to choose between these two constraints, the choice
>> would be non-deterministic (as the two input/output constraints are
>> mirroring each other). But there's another dependency here:
>>
>> * C#inner <: Entry<A#outer, B#outer>
>>
>> Now, from a programmer perspective, looking at the way the code is
>> written, it's clear we'd like solution of C#inner to influence
>> A#outer, B#outer. But I don't think that's the way the spec works. In
>> this case the spec says that C#inner 'depends' on A#outer, C#outer.
>>
>> So we really have no way
>>
>> That's because the result of the innermost call to 'destructure' is
>> passed to the outermost call, so the inferred type for C#inner can
>> affect the inferred Entry<A#outer, B#outer>.
>>
>> So, we are basically left with what I think is an non-deterministic
>> choice between eagerly resolving { A#inner, B#inner } or { A#outer,
>> B#outer }. Unfortunately, picking the former makes the code pass,
>> whereas picking the latter makes compilation fail (because at that
>> time we don't know any better than just instantiating A#outer and
>> B#outer to Object).
>>
>> If this is the issue, inference could to a better job, by either
>> looking at presence of other 'more meaningful' bounds on {A#inner,
>> B#inner} (although defining 'more meaningful might be problematic).
>> Or it could just give precedence to innermost variables in case of a
>> tie, which is a rule that I think programmers will find natural.
>> Currently though, these 'extensions' are not part of the JLS - so the
>> behavior here is, essentially, unspecified.
>>
>> At the same time, there's also an argument to be had as to whether
>> the code here isn't inherently buggy - essentially, { A, B, C } in
>> the 'destructure' method are all independent variables so, in a way,
>> the problem is *underconstrained* (e.g. there's no real objective
>> choice for which variable to solve first, other than resorting to
>> heuristics, whose mileage can vary).
>>
>> Note that your second call works simply because, by adding the
>> explicit type param on the method reference receiver, you make the
>> method reference 'direct' meaning its type is evaluated *ahead* of
>> the lambda. Basically you told the compiler which of the two 'stuck'
>> expression should be solved first.
>>
>> @Dan - is this something that has been discussed before in spec-land?
>>
>> Cheers
>> Maurizio
>>
>> [1] -
>> https://docs.oracle.com/javase/specs/jls/se12/html/jls-18.html#jls-18.5.2.2
>>
>> On 06/06/2019 20:31, John Napier wrote:
>>> Hi all,
>>>
>>> The following code compiles against oracle64-1.8.0.162 but fails on
>>> both openjdk64-11.0.2 and oracle64-12.0.1:
>>>
>>> public static class Example {
>>>
>>> public static class Pair<A, B> implements Map.Entry<A, B> {
>>> private final A a;
>>> private final B b;
>>>
>>> public Pair(A a, B b) {
>>> this.a = a;
>>> this.b = b;
>>> }
>>>
>>> @Override
>>> public A getKey() {
>>> return a;
>>> }
>>>
>>> @Override
>>> public B getValue() {
>>> return b;
>>> }
>>>
>>> @Override
>>> public B setValue(B value) {
>>> throw new UnsupportedOperationException();
>>> }
>>> }
>>>
>>> public static <A, B, C> C destructure(BiFunction<A, B, C>
>>> fn, Map.Entry<A, B> entry) {
>>> return fn.apply(entry.getKey(), entry.getValue());
>>> }
>>>
>>> public static void main(String[] args) {
>>> Map.Entry<String, String> entry = new Pair<>("foo", "bar");
>>>
>>> // ill-typed
>>> String inferred = destructure((x, y) -> x + y,
>>> destructure(Pair::new, entry));
>>>
>>> // well-typed
>>> String ascribed = destructure((x, y) -> x + y,
>>> destructure(Pair<String, String>::new, entry));
>>> }
>>> }
>>>
>>> Before submitting more bugs of this same ilk, albeit with slight
>>> variations in occurrence, I must ask: this is not the intentional
>>> modern behavior, correct? Inference should still work as illustrated
>>> above, right?
>>>
>>> Cheers,
>>>
>>> -John
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/compiler-dev/attachments/20190612/f6079b0d/attachment-0001.html>
More information about the compiler-dev
mailing list