Patterns and GADTs (and type checking and inference and overload selection)
Brian Goetz
brian.goetz at oracle.com
Wed Mar 30 19:33:21 UTC 2022
GADTs -- sealed families whose permitted subtypes specialize the type
variables of the base class -- pose some interesting challenges for
pattern matching.
(Remi: this is a big, complex area. Off-the-cuff "this is wrong" or
"you should X instead" replies are not helpful. If in doubt, ask
questions. One comprehensive reply is more useful than many small
replies. Probably best to think about the whole thing for some time
before responding.)
Here is an example of a GADT hiearchy:
sealed interface Node<T> { }
record IntNode(int i) implements Node<Integer> { }
record BoolNode(boolean b) implements Node<Boolean> { }
record PlusNode(Node<Integer> a, Node<Integer> b) implements
Node<Integer> { }
record OrNode(Node<Boolean> a, Node<Boolean> b) implements Node<Boolean> { }
record IfNode<T>(Node<Boolean> cond, Node<T> a, Node<T> b) implements
Node<T> { }
Nodes can be parameterized, but some nodes are sharply typed, and some
intermediate nodes (plus, or, if) have constraints on their components.
This is enough to model expressions like:
let
a = true, b = false, x = 1, y = 2
in
if (a || b) then a + b else b;
Note that `if` nodes can work on both Node<Integer> and Node<Boolean>,
and model a node of the right type.
## The Flow Problem
As mentioned earlier, pattern matching can recover constraints on type
variables, but currently we do not act on these. For example, we might
want to write the eval() like this:
static<T> T eval(Node<T> n) {
return switch (n) {
case IntNode(var i) -> i;
case BoolNode(var b) -> b;
case PlusNode(var a, var b) -> eval(a) + eval(b);
case OrNode(var a, var b) -> eval(a) || eval(b);
case IfNode<T>(var c, var a, var b) -> eval(c) ? eval(a) : eval(b);
};
But this doesn't work. The eval() method returns a T. In the first
case, we've matched Node<T> to IntNode, so the compiler knows `i :
int`. Humans know that T can only be Integer, but the compiler doesn't
know that yet. As a result, the choice to return `i` will cause a type
error; the compiler has no reason to believe that `i` is a `T`. The
only choice the user has is an unchecked cast to `T`. This isn't great.
We've discussed, as a possible solution, flow typing for type variables;
matching IntNode to Node<T> can generate a constraint T=Integer in the
scope where the pattern matches. Pattern matching is already an
explicitly conditional construct; whether a pattern matches already
flows into scoping and control flow analysis. Refining type constraints
on type variables is a reasonable thing to consider, and offers a
greater type-safety payoff than ordinary flow typing (since most flow
typing can be replaced with pattern matching.)
We have the same problem with the PlusNode and OrNode cases too; if we
match PlusNode, then T can only be Integer, but the RHS will be int and
assigning an int to a T will cause a problem. Only the last case will
type check without gathering extra T constraints.
## The Exhaustiveness Problem
Now suppose we have a Node<Integer>. Then it can only be an IntNode, a
PlusNode, or an IfNode. So the following switch should be exhaustive:
static int eval(Node<Integer> n) {
return switch (n) {
case IntNode(var i) -> i;
case PlusNode(var a, var b) -> eval(a) + eval(b);
case IfNode<Integer>(var c, var a, var b) -> eval(c) ? eval(a)
: eval(b);
};
We need to be able to eliminate BoolNode and OrNode from the list of
types that have to be covered by the switch.
We're proposing changes in the current round (also covered in my
Coverage doc) that refines the "you cover a sealed type if you cover all
the permitted subtypes" rule to exclude those whose parameterization are
impossible.
## The Typing Problem
Even without worrying about the RHS, we have problems with cases like this:
static<T> T eval(Node<T> n) {
return switch (n) {
...
case IfNode<Integer>(var c, IntNode a, IntNode b) -> eval(c) ?
a.i() + b.i(); // optimization
};
We know that an IfNode must have the same node parameterization on both
a and b. We don't encourage raw IfNode here; there should be something
in the <brackets>. The rule is that if a type / record pattern is
generic, the parameterization must be statically consistent with the
target type; there has to be a cast conversion without unchecked
conversion. (This can get refined if we get sufficiently useful
constraints from somewhere else, but not relaxed.) But without some
inference, we can't yet conclude that Integer is a valid (i.e., won't
require unchecked conversion) parameterization for Node<T>. But
clearly, Integer is the only possibility here. So we can't even write
this -- we'd have to use a raw or wildcard case, which is not very
good. We need more inference here, so we have enough type information
for better well-formedness checks.
#### Putting it all together
Here's a related example from the "Lower your Guards" paper which ties
it all together. In Haskell:
data T a b where
T1 :: T Int Bool
T2 :: T Char Bool
g1 :: T Int b -> b -> Int
g1 T1 False = 0
g1 T1 True = 1
Translating to Java:
sealed interface T<A,B> { }
record T1() implements T<int, bool> { }
record T2() implements T<char, bool> { }
record G1<B>(T<int, B> t, B b) { }
B bb = switch (g) {
case G1<bool>(T1 t, false) -> 0;
case G1<bool>(T1 t, true) -> 1;
}
The above switch is exhaustive on G1<B>, but a lot has to happen to
type-check the above switch:
- We need to gather the constraint of B=int in both cases, from the
nested type pattern T1 t.
- We need to flow B=int into the body, so that assignment of int to bb
can proceed.
- We need to type-check that the generics in the cases is compatible
with the target
- (bonus) infer T=bool if not specified explicitly
- We need to conclude that these two cases are exhaustive on G<B>,
which involves observing that T2 is not allowable here, so T1 covers T,
and therefore (T1, false) and (T1, true) cover T x bool.
I hope these examples illustrate that these cases are not silly, made-up
cases; the hierarchy above is a sensible way to model expressions. We
are not going to solve all these problems immediately, but we need a
story for getting there; otherwise we leave users with some bad choices,
most of which involve raw types or unchecked casts.
## A shopping list
So, we have quite a shopping list for type checking patterns. Some show
up immediately as soon as we have record patterns; others will come
later as we add explicit dtors.
#### Well formedness checking
Records can be generic, and we can nest patterns in record patterns.
Record patterns need a well-formedness check to ensure that any nested
patterns are sound. In the absence of inference, this is mostly a
straightforward recursive application of the cast conversion test.
We currently interpret a type pattern:
case ArrayList a:
as a raw ArrayList; I think this may be a mistake for type patterns, but
it surely will be a mistake for record patterns. If we interpret:
case Box(...)
as a raw box, we lose useful type information with which to gauge
exhaustiveness. In any case, if we have generic type parameters:
case Box<String>(...)
we can use the type parameters to refine the component types, which
affects the recursive well-formedness check.
#### Inference
I think we should be using inference routinely on record/dtor patterns,
and we should also consider doing the same on type patterns. So this
means interpreting
case Box(...)
as "implicit diamond". (We can talk about implicit vs explicit diamond,
but that's mostly a syntax distraction, and you know the rules: no
syntax discussions until there is consensus on semantics.)
Inference in this case is different from what we do for methods (like
everything else: well formedness, overload selection) because we're
doing everything in reverse. I'll sketch out an inference algorithm
later, but a key point is that we have to solve nested generic patterns
together with the outer pattern (as we do with methods.) This then
feeds more type information into the well formedness check, which is how
we'd type check the IfNode<Integer> case or the G1<bool> case.
Inference on type variables that flow in from the switch target can be
flowed into the case bodies.
#### Exhaustiveness
I think what we've identified for exhaustiveness is enough.
#### Overload selection
When we add explicit dtor patterns, now we need to do overload
selection. This is like method overload selection, but again key parts
run in reverse.
When we add in primitive patterns with conversions, or varargs patterns,
overload selection will need to reflect the same multi-phase
applicability checking that we have for method overload selection.
Details to come at the appropriate time.
More information about the amber-spec-experts
mailing list