Collected comments / questions re type inference in 0.6.3

Wed Oct 16 09:05:39 PDT 2013

Thanks for all the feedback.  It is very helpful.

On Oct 13, 2013, at 9:19 AM, Stephan Herrmann <stephan.herrmann at berlin.de> wrote:

> I finally found a contiguous time slot for updating my inference
> implementation according to spec version 0.6.3.
> These are my comments and questions:
> 
> 
> I couldn't find a statement on how type inference supports
> "pass-through" varargs semantics, where a provided array
> is directly bound to the varargs array variable.
> This way Java 8 would change the semantics in cases like this:
>       String[][] x = {{"X"}, {"Y"}};
>       List l = Arrays.asList(x);
> In Java 7 'l' has two elements of type String[] but the new
> inference seems to produce one element of type String[][].
> Is this intended?

In fact, I tried to add text to support the intended behavior (a list of length 2, as you say), where it was unclear before.

See the discussion in Part F, 15.12.2.4: "The previous 'applicable variable arity method' terminology incorrectly hinted that, if a variable-arity method is applicable in any phase, it is applicable in and only in Phase 3. This overlooks the fact that variable arity methods can act as fixed-arity methods in Phases 1 and 2. What is relevant is the kinds of adaptations actually used to determine applicability, not the kinds of adaptations allowed by the method declaration."

In JLS 7, 15.12.2.2 and 15.12.2.3 assume (incorrectly) that any potentially-applicable method, m, will have exactly n formal parameters, where n is the number of argument expressions.  If we take this to be a restriction ("let m be a potentially-applicable method with n formal parameters"), then 'asList' is applicable by subtyping.  15.12.2.4, had we gotten there, would have tested that String[][] <: Object ("can be converted by method invocation conversion to the component type of Sn"), which it is, and we would have ended up with a 1-element array.  The actual runtime behavior comes from 15.12.4.2, which only allocates a new array if "the type of the k'th argument expression is not assignment compatible with T[]".

In the Lambda Spec, none of this is substantially changed.  We should probably clarify that 15.12.2.2 and 15.12.2.3 apply to all methods for which there are n formal parameters.

> Incorporation of captures bounds (18.3) is unclear in the use
> of indices of α and B. In the former case I believe this is
> just a matter of lax notation (replacing all α by αi seems to
> trivially mend the issue).

Yes, someone pointed this out to me already and it's fixed.

> However, for the case of B I'm not
> sure about the intention: B1,...,Bn are introduced as the
> bounds of type parameters P1,...,Pn, i.e., each Bi is a list of
> bounds. The first bullet splits these into Bij, which is fine.
> But what am I to make of a constraint ⟨θ Bi <: R⟩?
> A list of types cannot be a subtype of a type, can it?
> Is s.t. like "foreach j : Bij" implied here?

Yes, it's a slight abuse, but I didn't invent it. :-)  The intent is that Bi is an intersection type as implied by the bound.  See, for example, 4.5, which also treats Bi as a type.

Note that an intersection types do _not_ have "for all" behavior on the left side of subtyping: the constraint is that "there exists" an element of the list that is a subtype of R.

If R were allowed to be a type variable, I would be concerned about ⟨θ Bi <: R⟩ and want to break it into Bijs.  But, since subtyping reduction (18.2.3) will go ahead and do the right thing with an intersection anyway, I think I'm okay with it.  (Generally, we do need to tighten up or eliminate the distinction between intersections and lists of types, but not today.)

> BTW, the notation for substitution is not consistent, e.g.:
> - 18.3: θ Bi
> - 18.4: L1θ
> Is θ prefix or postfix? :)

Thanks for noting that.  I'm trying to follow the precedent from sections like 4.10.2; it should be postfix.

> Sect. 18.4. requires some type variables to "have well-formed bounds".
> I could find no reference to well-formedness rules for type variables
> in the given sense.

I'll make a note to clean this up as best I can.  The idea is that if capture would allow it (5.1.10), it's okay.  One thing capture prohibits is two different unrelated classes as upper bounds.  One thing it doesn't prohibit, but should, is a lower bound that is not a subtype of any upper bound.  (Another to-do for another day: clarify the well-formedness rules for types.)

> I tend to say that this is a consequence of loose terminology in
> the classification of types: On the one hand great care is taken
> to distinguish "proper types" from inference variables (good!),
> but detailed classification of proper types is still blurred.
> First it might help to link usage of the term "type argument"
> (notably in 18.2.3 second half, 18.2.4. second half) to 4.5.1
> in order to remind the reader, that
>  "Type arguments may be either reference types or wildcards."
> Otherwise the condition "If T is a type" (should be "reference type")
> looks confusing.

Good suggestion.  Done.

> Secondly, usage of the term "type variable" conflicts with 4.4.
> I found this conflict to be in the tradition of 5.1.10, but it
> seems that 0.6.3 aggravates the situation, because in places like
> 18.4 ("Let Z1, ..., Zn be fresh type variables") we now speak of
> type variables with a lower bound and *arbitrary* upper bounds.
> Can a type variable have both upper and lower bounds?
> Can a type variable have multiple bounds that are classes?
> Can a type variable have multiple bounds that are arrays?
> To me it is not clear what rules would apply to these type
> variables in terms of well-formedness and in terms of compatibilty/
> equivalence/subtyping/erasure, but it seems that these type
> variables may survive inference (in contrast to inference variables)
> and thus must be handled by all downstream phases of compilation,
> right?

Yes, this is messy.

4.4 does a surprisingly good job of referring to a "type variable declared as a type parameter".  Capture variables (including the variables produced by 18.4) are type variables that are _not_ declared as type parameters.

5.1.10 makes a weak attempt to provide some well-formedness rules, but, as I said, we need to do a better job here.

Note that pretty much any type can already be the upper or lower bound of a capture variable, due to inference:
<T> List<? extends T> m(...); // any type that can be inferred for T is an upper bound of the capture of the result

Also note that the idea of using "capture" variables for inference is not new -- see JLS 7 15.12.2.8.

Despite these qualifiers, yes, there are some new possibilities for variable bounds that were not possible before.

One thing that needs tweaking (a known issue) is the use of lub in 18.4.  The inputs to lub might not be fully defined types -- they might involve type variables whose bounds we are in the middle of instantiating.  So I need to back off a bit on what we can do there, probably only taking the lub of the proper lower bounds.  (glb is more of a syntactic operation, so we can get away with it in 5.1.10 and here.)

> I don't see how a type variable in this sense falls into the
> existing classification, it seems to be neither of type,
> inference variable, type argument, type variable à la 4.5.1.
> That would mean none of the existing rules are applicable to
> type variables?
> OTOH, from some uses of "type variable" I could _infer_ that these
> are indeed considered as types. But why a type variable should be
> a type and a wildcard should not is obscure to me.
> Given a suitable interpretation this may all be very clean,
> but currently it seems in order to understand type inference
> you first have to apply type meta inference :)

Type variables are definitely types.  A type variable produced by capture may, for example, be the type of an expression.  Chapter 4 needs to do a better job of acknowledging that some types (capture variables, intersection types, the 'null' type) are not Types -- they can't be written down in the syntax -- but are still types.

Wildcards are not types because they make no sense except in the context of a parameterized type.  It is impossible for a wildcard to be the type of an expression.

> 18.2.1 calls for applying 18.5.2 to method invocations *and*
> class instance creations. Inside 18.5.2, however, I don't see
> proper handling of constructors. Specifically:
>  "let R be the return type of m"
> Given that constructors have no return type
> (8.8: "... the constructor declaration looks just like a method
>       declaration that has no result (§8.4.5)."),
> I believe that inference only works as desired, if we extend
> this to include the type of the class instance creation.
> Else the rule "If R is void, then a compile-time error occurs"
> would cause trouble for constructors.

See JLS 7 15.9.3.  Overload resolution and inference have always been defined for class instance creations somewhat informally in terms of the behavior for methods.  In the diamond case, we explicitly define a "method" that gets plugged in to the analysis.

I've added a sentence to clarify that the "method" is as defined in 15.9.3.

> Repeating from a previous mail: integration of unchecked
> conversions into 18.5.1 and 18.5.2 is unclear.

Yes, on my to-do list.

> From Dan's last statement it seems that Java 8 will be significantly
> more restrictive, rejecting some programs legal in Java 7.
> Before implementing such semantics I need confirmation by the EG,
> given that javac (b109) does *not* enforce the stricter semantics.
> I might add that I'm actually in favor of gradually fading out some uses
> of unchecked conversions, but this must happen in a coordinated way.

What we're seeing is an old inconsistency that just hasn't been fixed yet.  There are many other bugs, some reported and some not yet identified, of a similar scope.  And Eclipse often prefers consistency with javac over consistency with the spec.

There may be subtle differences between 7 and 8 due to 8's commitment to soundness; this is by design.  I haven't encountered any cases that seem to be of serious concern to users.  (In other words, 7 defaults to "true" during reduction, while 8 prefers "false".  So in some corners, 7 is more powerful by pretending something is true even though it can't prove it, and then happening to get lucky.  These corners are obscure.)

—Dan