Record pattern inference & capture

Fri Mar 24 22:00:21 UTC 2023

The record pattern inference strategy introduced in Java 20 attempts to map a pattern's match type to a parameterization of a given generic record class (typically a subclass of the match type).

One of its capabilities is to interpret a "duplicated" type argument: as illustrated in the proposed JLS 18.5.5 text, the type Function<Foo,Foo> can be mapped to a type like UnaryOperator<Foo>.
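A minimal sketch of the duplicated-argument case, assuming a Java version with finalized record patterns (21 or later). The record `Replace` and the class `DuplicatedArgument` are hypothetical names invented for this illustration; the expectation is that both type arguments of the match type constrain the single type parameter of the record.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical record: one type parameter, seen through Function<T,T>.
record Replace<T>(T from, T to) implements UnaryOperator<T> {
    @Override
    public T apply(T arg) { return from.equals(arg) ? to : null; }
}

class DuplicatedArgument {
    static String describe(Function<String, String> f) {
        // Both arguments of Function<String,String> map to T, so the
        // pattern type should be inferred as Replace<String>.
        if (f instanceof Replace(var from, var to)) {
            // String methods are available on both components.
            return from.toUpperCase() + "->" + to.toUpperCase();
        }
        return "not a Replace";
    }
}
```

If the inference works as described, `from` and `to` are both typed as String without any explicit type arguments in the pattern.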

Another of its capabilities is an inference-based approach to wildcards, so that a match type like Function<? extends Foo, ? extends Bar> can map to something like UnaryOperator<? extends Foo & Bar>:

record Mapper<T>(T in, T out) implements UnaryOperator<T> {
    // must be public: it implements the interface method Function.apply
    public T apply(T arg) { return in.equals(arg) ? out : null; }
}

void test(Function<? super String, ? extends CharSequence> f) {
    if (f instanceof Mapper(var in, var out)) {
        boolean shorter = out.length() < in.length();
    }
}

Unfortunately, this strategy doesn't account for the fact that the match type should probably be captured before doing anything with it. (JLS is a little fuzzy on this when it comes to variable references, but in general the type of every use/read of a variable or method needs to be captured before further typing occurs. Setting aside some poorly-specified cases, it's easy to come up with examples, like a method invocation (see JLS 15.12.3), where the wildcards definitively get captured.)

Nor does it account for the fact that capture variables may appear in the match type because they've flowed out of upstream expressions like method calls. (Think 'Map<?,?>.entrySet().iterator().next()'.)
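To make the upstream case concrete, here is a small sketch in which capture variables flow out of a chain of method calls. The class `CaptureFlow` and its methods are hypothetical; the generic helper is only there to show that the argument expression already carries two distinct capture variables, which get inferred as K and V.

```java
import java.util.Map;

class CaptureFlow {
    // K and V are instantiated with whatever types the argument carries;
    // in the call below those are capture variables, not wildcards.
    static <K, V> String describe(Map.Entry<K, V> entry) {
        K key = entry.getKey();
        V value = entry.getValue();
        return key + "=" + value;
    }

    static String firstEntry(Map<?, ?> map) {
        // 'map' is captured as Map<CAP1,CAP2>, so this expression has
        // type Map.Entry<CAP1,CAP2> by the time it reaches describe().
        return describe(map.entrySet().iterator().next());
    }
}
```

An expression like the argument to `describe` could equally be the operand of a record pattern, in which case the match type would mention CAP1 and CAP2 directly.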

If I try to match the type Function<CAP1,CAP2> with a type like UnaryOperator, should that be an inference failure? The current rules say "yes": inference variable alpha=CAP1, and alpha=CAP2, a contradiction. But I don't think that's right. If the dynamic check for UnaryOperator succeeds, that means that this must actually be a Function in which the actual types represented by CAP1 and CAP2 are the same.
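The runtime argument can be illustrated with an ordinary type pattern instead of a record pattern (so this is an analogy, not the inference procedure itself). The class `DynamicEvidence` and its methods are hypothetical. Self-composition only type-checks when input and output types coincide, and that becomes available once the dynamic check for UnaryOperator has succeeded.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

class DynamicEvidence {
    // Composing an operator with itself requires one type T for both
    // the input and the output.
    static <T> Function<T, T> twice(UnaryOperator<T> op) {
        return op.andThen(op);
    }

    static Function<?, ?> twiceIfPossible(Function<?, ?> f) {
        // Statically f is Function<CAP1,CAP2>, and f.andThen(f) is rejected
        // because CAP1 and CAP2 are unrelated. After the dynamic check, the
        // same object is known to be a UnaryOperator over a single type.
        if (f instanceof UnaryOperator<?> op) {
            return twice(op);
        }
        return f;
    }
}
```

Inside the `if` branch, the compiler works with a single capture variable for `op`, which is the static counterpart of the observation that the two captured types must actually be the same.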

My conclusion is that the inference treatment of wildcards ought to apply to capture variables, too. This is justified by the fact that while usually reasoning about capture variables works by assuming "there exists some type with these properties, don't assume anything else about it", in this case we also want to incorporate the fact that the dynamic pattern-matching check did, in fact, succeed. (However, I want to validate this thinking, because my confidence isn't 100%.) 

Here's how I think step #3 of the proposed JLS 18.5.5 should read:

-----

A type T' is derived from T, as follows:

-   If T is a parameterized type, let T_cap be the result of capture conversion (5.1.10) applied to T, and let Z1, ..., Zk (k ≥ 0) be the type variables produced by capture that are type arguments in T_cap. (This includes type variables produced by the capture conversion in this step, and type variables produced by capture conversion elsewhere.) Let β1, ..., βk (k ≥ 0) be inference variables, and let θ be the substitution [Z1:=β1, ..., Zk:=βk]. T' is T_cap θ.

    Additional bounds for β1, ..., βk are incorporated into B0 to form a bound set B1, as follows:

    - If βi (1 ≤ i ≤ k) replaced a type variable with an upper bound U, then the bound βi <: U θ appears in the bound set.

    - If βi (1 ≤ i ≤ k) replaced a type variable with a lower bound L, then the bound L θ <: βi appears in the bound set.

    - If no proper upper bounds otherwise exist for βi (1 ≤ i ≤ k), the bound βi <: Object appears in the bound set.

- If T is any other class or interface type, then T' is the same as T, and B1 is the same as B0.

- If T is a type variable or an intersection type, then for each upper bound of the type variable or element of the intersection type, this step and step 4 are repeated recursively. All bounds produced in steps 3 and 4 are incorporated into a single bound set.
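
As a sketch of how the first case would play out, take the match type Function<? super String, ? extends CharSequence> from the test method earlier. This is one reading of the rule, not part of the proposed text. Z1 and Z2 stand for the capture variables, where Z1 has upper bound Object and lower bound String, and Z2 has upper bound CharSequence.

```latex
\begin{align*}
T &= \texttt{Function<? super String, ? extends CharSequence>} \\
T_{cap} &= \texttt{Function<}Z_1, Z_2\texttt{>} \\
\theta &= [Z_1 := \beta_1,\ Z_2 := \beta_2] \\
T' &= T_{cap}\,\theta = \texttt{Function<}\beta_1, \beta_2\texttt{>} \\
B_1 &= B_0 \cup \{\ \beta_1 <: \texttt{Object},\ \ \texttt{String} <: \beta_1,\ \ \beta_2 <: \texttt{CharSequence}\ \}
\end{align*}
```

Assuming step 4 (not reproduced here) then relates T' to the supertype Function<α,α> of Mapper<α>, both β1 and β2 would be equated with α, leaving String <: α <: CharSequence. That is consistent with in.length() and out.length() type-checking in the earlier example.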