More on patterns, generics, null and primitives

Thu Dec 7 15:49:00 UTC 2017

Below are some more refined thoughts about patterns, generics, null and primitives. Thank you for your feedback so far. Comments welcome!

Gavin
----

Patterns: Types, primitives and null
====================================

In this note, we address some of the issues raised in earlier posts concerning pattern matching. 

To make matters concrete, let us define a simple grammar of patterns:

p  ::= l                    //literal (primitive type constant expression)
       null                 //null pattern
       R x                  //Reference type test pattern
       V x                  //Value type test pattern
       D(p_1, ..., p_n)     //Deconstructor pattern (D is a *data class*)
       var x                //Type inference

For simplicity, we have only considered primitive type constant expression literals, but we can easily extend what follows to cover string constant expressions and Enum names. For generality we have included simple deconstruction patterns, which will most likely appear first with data classes - a feature we have proposed elsewhere.

Type checking basics
====================

To type check the expression e matches R x where R is a reference type, we check that the type of the expression e is cast convertible to R. This is captured by the following typing rule:

e : T     T cast-convertible R
------------------------------
e matches R x : bool

For value types, we propose a simpler rule. When type checking an expression e matches V x where V is a value type, we shall simply check that the type of e is either the value type V or the boxed type of V. Again we can write this more formally as follows.

e : V   or   e : box(V)
-----------------------
e matches V x : bool

(We have assumed a function, box, that returns the boxed reference type of a given value type, e.g. box(int)=Integer.)

Type checking literal patterns is similar. Thus given an expression e matches l where l is a primitive type constant expression of type V, we check that e has exactly the type V. (We could also permit e to have the boxed type of V, but we disallow this for simplicity.)

e : V    l : V
------------------
e matches l : bool

[A small corner case with this rule is where the primitive type constant expression l is a numeric literal without a suffix, e.g. 1. In this case we would treat l as a poly expression and use the type of the expression e as a target type.]

Given an expression e matches null we check that e has some reference type R.

e : R
---------------------
e matches null : bool

Generics
========

Currently instanceof is restricted so that the type must be reifiable. In other words the following is not allowed.

ArrayList<Integer> al = ...
if (al instanceof List<Integer>) {          // Error
    ...
}

We propose to lift this restriction for pattern matching using type test patterns. In other words, a reference type test pattern match is well typed if the cast conversion is not unchecked. If the cast conversion is unchecked then type checking fails.

In what follows we will assume that all pattern matching code has type checked correctly.

Nulls
=====

To recall, the issue with nulls and pattern matching came from Java’s current slightly inconsistent treatment of null.

Currently x instanceof Foo returns false where x is the null value. Clearly, we would want x matches Foo f to do the same.

However, when we come to support deconstruction patterns, we would expect Foo(null) to match a pattern Foo(var x). Moreover, we intend the pattern var x to mean the same as Bar x where the type has been inferred to be Bar. So we appear to have conflicting requirements!

A further complication is that the switch statement currently throws an exception when the switch expression is the null value.

To recap:

Foo f = null;                //Foo has a String field
f matches Foo x              //would like to return false as for instanceof
switch (f) { case Foo: ... } //throws
f = new Foo(null);
f matches Foo(String x)      //would like to return true and bind x to null
f matches Foo(var x)         //behave identically to above

We need to reconcile the current semantics and find a generalisation that matches (sic) the above behaviour for patterns.

First, we shall generalise the switch statement in the following way. We propose that it no longer throws an exception when the switch expression is the null value. However, we propose that all switches have an implicit case null: throw new NullPointerException(); clause, unless there is an explicit case null pattern label in the switch body. In this way, all existing switch statements will continue to mean the same thing, but users of the enhanced switch can now program a case to handle the null value case.

We had previously proposed dealing with null by considering the type of the pattern test to determine if it was a type-restatement pattern or not. Whilst this works, it could cause some subtle action-at-a-distance effects that we found discomforting. Another possibility had been to classify patterns as top-level and non-top-level and assign them different semantics. This was felt to be too subtle.

We think we now have a solution that avoids this partition of patterns, leading to a simpler and clearer system. The key to this solution is to separate the semantics of a pattern from the semantics of the construct that uses the pattern.

tl;dr: The semantics of pattern matching is defined to match null values against type test patterns. The semantics of constructs that use pattern  matching is layered over this definition in order to capture the existing behaviours with respect to null values at the top level.

Given a pattern p, and a value e of type T, we can define the semantics of this pattern with a function, match, that returns a boolean indicating whether the pattern matches the value or not. (Recall that we assume that type checking has already succeeded.)

match(l, e:V)      =def e = l
match(l, e:R)      =def e <> null /\ unbox(e) = l

match(null, e:V)   =def false               // n/a by type-checking
match(null, e:R)   =def e = null

match(R t, e:V)    =def false               // n/a by type-checking
match(R t, e:R')   =def e = null \/ e instanceof R

match(V t, e:V')   =def true                // by type-checking, V = V'
match(V t, e:R)    =def e <> null           // by type-checking, R = box(V)

match(D(pa_1, ..., pa_n), e:V) =def false
match(D(pa_1, ..., pa_n), e:R) =def 
                        e instanceof D /\ match(pa_1, e.x_1:T_1) /\ 
                                   ... /\ match(pa_n, e.x_n:T_n)
                                where  data class D(T_1 x_1, ..., T_n x_n);

This definition should be intuitive, but let us consider the first two cases that deal with matching a value e against a literal pattern l. If e has a value type then the pattern match is just a simple check for equality between the value and the literal. If e has a reference type then we first check if it has the null value. If so we return false, otherwise we unbox the value and recursively check this against the pattern.

The important case is where we have a type test pattern for a reference type R and a value of a reference type. In this case we check if the value has the null value. If so we return true, otherwise we check whether the value is an instance of the type R. In other words, the inherent semantics of pattern matching returns true when matching a null value against a reference type test R x.

Note there is no case for the var pattern as the compiler will have replaced it with an explicit type test pattern as a result of type inference.

The final step is to use this to define the semantics of the two constructs that use pattern matching, i.e. matches and switch. This can be summarised using the following table (where e has static type T).

                | e = null      |  e <> null
-------------------------------------------------
                |               | 
e matches P     | false         |  match(P, e:T)
                |               |
                |               |
switch (e) {    |               |
  case null: s  | s             |             
}               |               |
                |               |
                |               |
switch (e) {    |               |
  // P not null | NPE           | s    if match(P,e:T)
  case P : s    |               |      
}               |               | 
                |               |

Below is a number of examples demonstrating these semantics in action.

Object o = null;
o matches String s                          // false
o matches Box(var x)                        // false
new Box(o) matches Box b                    // true
new Box(o) matches Box(null)                // true
new Box(o) matches Box(Object o)            // true
new Box(o) matches Box(var o)               // true (o inferred to have
                                            //       type Object)

switch (o) { case String s: }               // NPE

switch (o) { 
    case null: System.out.println("hello");
}                                           // Prints hello

o=(Object)"Hello";

switch (o) { 
    case String s: System.out.println(s);
}                                           // Prints Hello

switch (o) { 
    case null: break; 
    case String s: System.out.println(s);
}                                           // Prints Hello

o=null;
switch (new Box(o)) { 
    case Box b: System.out.println("Box");
}                                           // Prints Box

switch (new Box(o)) { 
    case Box(null): 
        System.out.println("A");
        break; 
    case Box(var x): 
        System.out.println("B");
}                                           // Prints A

switch (new Box(o)) { 
    case Box(var x): 
        System.out.println(x==null);
}                                           // Prints true