Evolving past reference type patterns

Brian Goetz brian.goetz at oracle.com
Fri Apr 15 20:50:27 UTC 2022


We characterize patterns by their /applicability/ (static type 
checking), /unconditionality/ (whether matching can be determined without 
a dynamic check, akin to the difference between a static and a dynamic 
cast), and /behavior/ (under what conditions it matches, and what 
bindings we get).


        Currently shipping

As currently shipping, we have one kind of pattern: type patterns for 
reference types. We define the useful term “downcast convertible” to 
mean there is a cast conversion that is not unchecked. So |Object| and 
|ArrayList| are downcast-convertible to each other, as are |List| and 
|ArrayList|, and |List<String>| and |ArrayList<String>|; but |List<?>| is 
not downcast-convertible to |ArrayList<String>|, because that cast would 
be unchecked.
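
For example, in today's Java (a quick sketch; assumes |java.util| imports, 
and |list| / |wild| are hypothetical parameters):

    static void casts(List<String> list, List<?> wild) {
        ArrayList<String> a = (ArrayList<String>) list;  // downcast-convertible: a checked cast
        ArrayList<String> b = (ArrayList<String>) wild;  // not downcast-convertible: unchecked warning
    }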

A type pattern |T t| for a ref type T is /applicable to/ a ref type U if 
U is downcast-convertible to T.

A type pattern |T t| is /unconditional/ on |U| if |U <: T|.

A type pattern |T t| matches a target x when the pattern is 
unconditional, or when |x instanceof T|; if so, its binding is |(T) x|.
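
Concretely, with the type patterns shipping today (a minimal sketch; 
|describe| is a made-up method):

    static String describe(Object o) {
        // `String s` is applicable to Object (Object is downcast-convertible
        // to String) but not unconditional, so a dynamic type test happens at
        // run time; on a match, s is bound to (String) o.
        if (o instanceof String s)
            return "a String of length " + s.length();
        return "something else";
    }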


        Record patterns

In the next round, we will add /record patterns/, which bring in /nested 
patterns/.

A record pattern |R(P*)| is applicable to a reference type U if U is 
downcast-convertible to R. A record pattern is never unconditional.

A record pattern |R(P*)| matches a target |x| when |x instanceof R|, and 
when each component of |R| matches the corresponding nested pattern 
|P_i|. Matching against components is performed using the /instantiated/ 
static type of the component.
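
For example (a sketch of the proposed syntax; |Point| and |Line| are 
made-up records):

    record Point(int x, int y) { }
    record Line(Point start, Point end) { }

    static boolean startsAtOrigin(Object o) {
        // Applicable to Object, never unconditional: matches when
        // o instanceof Line and each component matches its nested pattern,
        // using the instantiated static type of that component.
        return o instanceof Line(Point(var x, var y), var end)
                && x == 0 && y == 0;
    }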

Record patterns also drag in primitive patterns, because records can 
have primitive components.

A primitive type pattern |P p| is applicable to, and unconditional on, 
the type P. A primitive type pattern matches a target x when the pattern 
is unconditional, and its binding is |(P) x|.
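
For example (continuing the sketch; |Box| is a made-up record with a 
primitive component):

    record Box(int weight) { }

    static void peek(Object o) {
        // The nested pattern `int w` is a primitive type pattern; it is
        // unconditional on the component type int, so once o is a Box it
        // always matches, binding w to the component value.
        if (o instanceof Box(int w))
            System.out.println("weight = " + w);
    }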

Record patterns also drag in |var| patterns as nested patterns. A |var| 
pattern is applicable to, and unconditional on, every type U, and its 
binding, when matched to a target |x| whose static type is |U|, is |x| 
(think: identity conversion).
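
So, reusing the |Box| sketch, the nested pattern could equally be a |var| 
pattern:

    static void peekVar(Object o) {
        // `var w` is unconditional on every type; its binding takes the
        // instantiated static type of the component, here int.
        if (o instanceof Box(var w))
            System.out.println("weight = " + w);
    }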

This is what we intend to specify for 19.


        Primitive patterns

Looking ahead, we’ve talked about how far to extend primitive patterns 
beyond exact matches. While I know that this makes some people 
uncomfortable, I am still convinced that there is a more powerful role 
for patterns to play here, and that is: as the cast precondition.

A language that has casts but no way to ask “would this cast succeed” is 
deficient; either casts will not be used, or we would have to tolerate 
cast failure, manifesting as either exceptions or data loss / 
corruption. (One could argue that for primitive casts, Java is deficient 
in this way now (you can make a lossy cast from long to int), but the 
monomorphic nature of primitive types mitigates this somewhat.) Prior to 
patterns, users have internalized that before a cast, you should first 
do an |instanceof| to the same type. For reference types, the 
|instanceof| operator is the “cast precondition” operator, with an 
additional (sensible) opinion that |null| is not deemed to be an 
instance of anything, because even if the cast were to succeed, the 
result would be unlikely to be usable as the target type.

There are many types that can be cast to |int|, at least under some 
conditions:

  * Integer, except null
  * byte, short, and char, unconditionally
  * Byte, Short, and Character, except null
  * long, but with potential loss of precision
  * Object or Number, if it’s not null and is an Integer

Just as |instanceof T| for a reference type T tells us whether a cast to 
T would profitably succeed, we can define |instanceof int| the same way: 
whether a cast to int would succeed without error or loss of precision. 
By this measure, |instanceof int| would be true for:

  * any int
  * Integer, when the instance is non-null (unboxing)
  * any reference type that is cast-convertible to Integer, and is
    |instanceof Integer| (unboxing)
  * byte, short, and char, unconditionally (types that can be widened to
    int)
  * Byte, Short, and Character, when non-null (unboxing plus widening)
  * long when in the range of int (narrowing)
  * Long when non-null, and in the range of int (unboxing plus narrowing)
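
In other words, under this reading we would expect something like the 
following (a sketch; the primitive |instanceof| forms are speculative and 
shown only in comments, since none of this is in a shipped release):

    long small = 42L, big = 1L << 40;
    Object boxed = 42;        // holds an Integer
    Object nothing = null;

    // Expected results under the proposed semantics:
    //   small   instanceof int  -> true   (long value in range of int)
    //   big     instanceof int  -> false  (would lose precision)
    //   boxed   instanceof int  -> true   (non-null Integer; unboxes)
    //   nothing instanceof int  -> false  (null never matches)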

The table above can be generated simply by looking at the set of cast 
conversions — and we haven’t talked about patterns yet. This is simply 
the generalization of |instanceof| to primitives. If we are to allow 
|instanceof int| at all, I don’t think there is really any choice of 
what it means. And this is useful in the language we have today, 
separate from patterns:

  * asking if something fits in the range of a byte or int; doing this
    by hand is annoying and error-prone
  * asking if casting from long to int would produce truncation; doing
    this by hand is annoying and error-prone
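
For comparison, the by-hand versions of these checks in today's Java look 
something like this (a sketch; |fitsInInt| and |fitsInByte| are made-up 
helpers that |instanceof int| / |instanceof byte| would subsume):

    static boolean fitsInInt(long x) {
        // Manual range check: easy to get wrong (off-by-one on the bounds,
        // or accidentally comparing after an implicit narrowing).
        return x >= Integer.MIN_VALUE && x <= Integer.MAX_VALUE;
    }

    static boolean fitsInByte(int x) {
        return x >= Byte.MIN_VALUE && x <= Byte.MAX_VALUE;
    }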

Doing this means that

|if (x instanceof T) ... (T) x ... |

becomes universally meaningful, and captures exactly the preconditions 
for when the cast succeeds without error, loss of precision, or null 
escape. (And as Valhalla is going to bring primitives more into the 
world of objects, generalizing this relationship will become only more 
important.)

And if we’ve given meaning to |instanceof int|, it is hard to see how 
the pattern |int x| could behave any differently than |instanceof int|, 
because otherwise, we could not refactor the above idiom to:

|if (x instanceof T t) ... t ... |
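
For example, with T = Integer this refactoring already works today; with 
T = int it is the proposed behavior (shown only in comments):

    static void demo(Object o) {
        // Reference form: the refactoring is meaning-preserving today.
        if (o instanceof Integer)   { System.out.println((Integer) o); }
        if (o instanceof Integer i) { System.out.println(i); }
    }

    // Primitive analogue (proposed, not compilable today), for a long x:
    //   if (x instanceof int)   { ... (int) x ... }
    //   if (x instanceof int i) { ... i ... }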

Extending instanceof / pattern matching to primitives in this way is not 
only a sensible generalization; failing to do so would expose gratuitous 
asymmetries that would become impediments to refactoring:

  * Cannot necessarily refactor |int x = 0| to |let int x = 0|. While this
    may seem non-problematic on the surface, as soon as |let| acquires any
    other feature besides “straight unconditional pattern assignment”, such
    as let-expressions, it forces users into a bad choice: use |let|, or
    use assignment conversion, but not both.

  * Loss of duality between |new X(args)| and |case X(ARGS)|. The duality
    between construction and deconstruction patterns (and similarly for
    static factories/patterns, builders/“unbuilders”, and collection
    literals/patterns) is a key part of the story; we take things apart in
    the same way we put them together. Any gratuitous divergence becomes
    an avoidable sharp edge (see the sketch just below).
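
As a sketch of that duality with records (using the proposed record 
pattern syntax; |Point| is the made-up record from the earlier sketch):

    Point p = new Point(1, 2);              // construction: aggregate x and y
    String s = switch (p) {
        case Point(var x, var y) ->         // deconstruction: take them apart
            x + ", " + y;                   // the same way we put them together
    };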

Since these are related to assignment and method invocation, let’s ask: 
how do these conversions line up with assignment and method invocation 
conversions?

There are two main differences between the safe cast conversions and 
assignment context. One has to do with narrowing: the “narrow only if 
it’s a constant and in range” rule is the best approximation assignment 
context can make, while in a context that accepts partial patterns, the 
pattern can be more discriminating, and so it should be. The other is the 
treatment of null: again, because of the totality requirement, assignment 
throws when unboxing a null, but pattern matching in a partial context 
can deal with it more gracefully, and simply decline to match.
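
Concretely (the assignment lines are today's Java; the pattern forms are 
the proposed behavior, and |someInt| / |boxed| are hypothetical locals):

    byte b = 12;           // assignment context narrows: constant, in range
    // byte c = someInt;   // error: assignment cannot narrow a non-constant int
    // ...but a partial pattern could simply be more discriminating:
    //   if (someInt instanceof byte c) { ... }   // matches only when in range

    Integer boxed = null;
    // int i = boxed;      // assignment must be total: throws NullPointerException
    // ...whereas a partial pattern context can decline to match:
    //   if (boxed instanceof int i) { ... }      // false; no NPE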

There are also some small differences between the safe cast conversions 
and method invocation context. There is the same issue with unboxing 
null (throws in (loose) invocation context), and method invocation 
context makes no attempt to do narrowing, even for literals. This last 
seems mostly a historical wart, which now can’t be changed because it 
would either potentially change (very few) overload selection choices, 
or would require another stage of selection.
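
For example, in today's Java (no new syntax here):

    static void m(byte b) { }

    static void demo() {
        byte b = 12;   // fine: assignment context narrows the constant
        // m(12);      // error: invocation context does not narrow, even for
        //             // an in-range constant
        m((byte) 12);  // an explicit cast is required
    }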

What are the arguments against this interpretation? They seem to be 
various flavors of “ok, but, do we really need this?” and “yikes, new 
complexity.”

The first argument comes from a desire to treat pattern matching as a 
“Coin”-like feature, strictly limiting its scope. (As an example of a 
similar kind of pushback, in the early days it was asked: “but does 
pattern matching have to be an expression, couldn’t we just have an 
‘ifmatch’ statement?” See the answer here: 
http://mail.openjdk.java.net/pipermail/amber-dev/2018-December/003842.html) 
This is the sort of question we get a lot — there’s a natural tendency 
to try to “scope down” features that seem unfamiliar. But I think it’s 
counterproductive here.

The second argument is largely a red herring, in that this is /not/ new 
complexity, since these are exactly the rules for successful casts. In 
fact, not doing it might well be perceived as new complexity, since it 
results in more corner cases where refactorings that seem like they 
should work, do not, because of conversions.
