[lvti] Handling of capture variables

Fri Mar 31 23:39:00 UTC 2017

As described in the JEP 286 spec document, inferring the type of a local variable to be a non-denotable type (one that can't be written in source) is something to be careful about, due to "potential for confusion, bad error messages, or added exposure to bugs".

The most significant area here (in terms of likely frequency) is the presence of capture variables in the type. I did some analysis of the Java SE APIs to identify and illustrate problematic cases.

== Case 1: wildcard-parameterized return type ==

Any method (or field) that returns a wildcard-parameterized type will produce a non-denotable type on invocation, because the return type must be captured (JLS 15.12.3).

var myClass = getClass();
var c = Class.forName("java.lang.Object");
var sup = String.class.getSuperclass();
var entries = new ZipFile("/etc/filename.zip").entries();
var joiner = Collectors.joining(" - \n", "<start>", "<end>");
var plusCollector = Collectors.reducing(BigInteger.ZERO, BigInteger::add);
var future = Executors.newCachedThreadPool().submit(System::gc);
void m(MethodType type) { var ret = type.returnType(); }
void m(TreeSet<String> set) { var comparator = set.comparator(); }
void m(Annotation ann) { var annClass = ann.annotationType(); }
void m(ReferenceQueue<String> queue) { var stringRef = queue.poll(); }

Using wildcards in a return type is sometimes discouraged, but other times it's the right thing to do.  So while I wouldn't say these methods are pervasive, there are quite a few of them (especially where the common idiom is to almost always use a wildcard, as in Class and Collector).

There are no capture variables present for methods that return arrays, lists, etc., of wildcard-parameterized types, because capture doesn't touch those nested wildcards:

void m(MethodType type) { var params = type.parameterArray(); }
void m(MethodType type) { var params = type.parameterList(); }
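
As a quick check of the claim above, here is a minimal sketch in which both results are written with explicit, denotable types. The class and method names (NestedWildcardDemo, paramsOf, paramArrayOf) are made up for illustration; only MethodType and its two accessors come from the Java SE API.

```java
import java.lang.invoke.MethodType;
import java.util.List;

public class NestedWildcardDemo {
    // The wildcard is nested inside List<...>, so capture leaves it alone
    // and the full type can be spelled out in source.
    static List<Class<?>> paramsOf(MethodType type) {
        List<Class<?>> params = type.parameterList();
        return params;
    }

    // Same for the array form: Class<?>[] is an ordinary denotable type.
    static Class<?>[] paramArrayOf(MethodType type) {
        Class<?>[] params = type.parameterArray();
        return params;
    }

    public static void main(String[] args) {
        MethodType mt = MethodType.methodType(void.class, int.class, String.class);
        System.out.println(paramsOf(mt));
        System.out.println(paramArrayOf(mt).length);
    }
}
```

Because the explicit declarations compile, 'var' in the same position has nothing non-denotable to infer.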

== Case 2: instance method returning a class type parameter ==

A method (or field) whose return type is a class type parameter will produce a capture variable when invoked for a wildcard-parameterized type.

void m(Class<? extends Runnable> c) throws Exception { var runnable = c.newInstance(); }
void m(Map<String, ? extends Throwable> map) { var e = map.get("some.key"); }
void m(List<? extends Set<String>> sets) { var first = sets.get(0); }
Object find(Collection<?> coll, Object o) { for (var elt : coll) { if (elt.equals(o)) return elt; } return null; }
void m(Optional<? extends Number> opt) { var num = opt.get(); }
void m(IntFunction<? extends Reader> f) { var reader = f.apply(14); }
void m(Future<? extends ZipEntry> future) { var entry = future.get(10, TimeUnit.SECONDS); }

If you substitute a wildcard-parameterized type into the return type, that also leads to capture:

void m(List<Set<? extends Number>> list) { var set = list.get(0); }

This is true for for-each, too (for now, javac fails to perform capture correctly, so you don't see this in the prototype):

void m(List<Set<? extends Number>> list) { for (var set : list) set.clear(); }

== Case 3: instance method returning a type that mentions a class type parameter ==

A method (or field) whose return type *mentions* a class type parameter (e.g., Iterator<E> in Iterable.iterator) will also produce a non-denotable type when invoked for a wildcard-parameterized type.  Unlike the methods in Case 2, which tend to be "terminal operations", these types often arise in chains.
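
A small sketch of this kind of method, including a two-step chain. The class and method names (Case3Demo, sum, sumValues) are invented for illustration, and the comments describe the type of the initializer expression as specified; what 'var' actually records for the variable depends on which of the options discussed below is adopted.

```java
import java.util.List;
import java.util.Map;

public class Case3Demo {
    static double sum(Iterable<? extends Number> numbers) {
        // Initializer type: Iterator<CAP>, where CAP is the capture of '? extends Number'
        var it = numbers.iterator();
        double total = 0;
        while (it.hasNext()) {
            total += it.next().doubleValue();
        }
        return total;
    }

    static double sumValues(Map<String, ? extends Number> map) {
        // Initializer type: Set<Map.Entry<String, CAP>>
        var entries = map.entrySet();
        // The chain continues: the iterator's type again mentions a capture variable
        var it = entries.iterator();
        double total = 0;
        while (it.hasNext()) {
            total += it.next().getValue().doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3)));
        System.out.println(sumValues(Map.of("a", 1, "b", 2)));
    }
}
```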

== Case 4: method with inferred type parameter in return type ==

A method (or constructor) whose return type includes an inferred type parameter may end up substituting capture variables or other non-denotable types.  This typically depends on the types of the arguments, again with a wildcard-parameterized type showing up somewhere.

void m(Enumeration<? extends Runnable> tasks) { var list = Collections.list(tasks); }
void m(Set<?> set) { var syncSet = Collections.synchronizedSet(set); }
void m(Function<? super String, ? extends Throwable> f) { var es = Stream.of("a", "b", "c").map(f); }

There are also cases here that are specified to produce capture vars but do not in javac:

void m(List<? extends Number> ns) { var firstSet = Collections.singleton(ns.get(0)); }

----------------

With that in mind, looking at our three options for dealing with capture variables:
1) Allow the non-denotable type
2) Map the type to a supertype that is denotable
3) Report an error

(3) isn't viable. "You can't use 'var' with 'getClass'" is already pretty bad. Prohibiting all the uses above would be really bad.

We've thought a lot about (1) and (2). The JEP includes this example:

void test(List<?> l1, List<?> l2) {
    var l3 = l1; // List<CAP> or List<?>?
    l3 = l2; // error?
    l3.add(l3.get(0)); // error?
}

On 'l3 = l2': I wouldn't say it's an important priority that all 'var' variables have a type that is convenient for future mutation. But we do expect users to be able to easily see *why* an assignment wouldn't be allowed. Unfortunately, capture variables are such a subtle thing that they're often invisible, and programmers don't even realize that they appear as an intermediate step. So, most people would see 'var l3 = l1' and expect that the type of l3 is List<?>.

On 'l3.add(l3.get(0))': This is a cool trick. The use of 'var' essentially serves the same purpose as invoking a generic method in order to give a capture variable a name:

<T> void dupFirst(List<T> list) { list.add(list.get(0)); }
...
dupFirst(l1);

On the other hand, it's a subtle trick, and the average user isn't going to understand what's going on. (Or, more likely: 'l3.add(l3.get(0))' looks fine to them, but they won't understand why it stops working when that gets refactored to 'l1.add(l1.get(0))'.)
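
For reference, the generic-method form of the trick can be written as a complete program. This is only a sketch: the wrapper class and the dupFirstOfUnknown method are hypothetical names, and the commented-out line shows the direct form that the compiler rejects.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptureHelperDemo {
    // Inside this method the element type has a name, T, so a value read
    // from the list can be added back to the same list.
    static <T> void dupFirst(List<T> list) {
        list.add(list.get(0));
    }

    static void dupFirstOfUnknown(List<?> l1) {
        // l1.add(l1.get(0));  // rejected: each use of l1 is captured separately
        dupFirst(l1);          // accepted: T is inferred as the capture of '?'
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("a", "b"));
        dupFirstOfUnknown(names);
        System.out.println(names);
    }
}
```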

So, in terms of user experience, it seems like (2) is the desired outcome here.

That choice isn't without some sacrifice: it would be a nice property if lifting a subexpression out of an expression into its own 'var' declaration yields identical types. Since (2) changes the intermediate type, that doesn't hold. That said, hopefully our mapping function is reasonably unobtrusive...

How do we define the mapping? "Use the bound" is the easy answer, although in practice it's more complicated than that:
- Which bound? (upper or lower?)
- What if the bound contains the capture var?
- What do you do with a capture variable appearing as an (invariant) type argument?
- What do you do with a capture variable appearing as a wildcard bound?

We're working on finalizing the details. While this operation isn't trivial, it turns out it's pretty important: we already need it to solve bugs in the type system involving type inference [1] and lambda expressions [2]. It's a useful general-purpose tool.

—Dan

[1] https://bugs.openjdk.java.net/browse/JDK-8016196
[2] https://bugs.openjdk.java.net/browse/JDK-8170887