Primitive type patterns and conversions

Mon Mar 1 22:04:56 UTC 2021

So far, we've spent almost all our time on patterns whose targets are 
reference types (type patterns, record patterns, array patterns, 
deconstruction patterns).  It's getting to be time to nail down (a) the 
semantics of primitive type patterns, and (b) the 
conversions-and-contexts (JLS 5) rules.  And, because we're on the cusp 
of the transition to Valhalla, we must be mindful of both the current 
set of primitive conversions and the more general object model as it 
will apply to primitive classes.

If we focus on type patterns alone, let's bear in mind that primitive 
type patterns are not nearly as powerful as other type patterns, because 
(under the current rules) primitives are "islands" in the type system -- 
no supertypes, no subtypes.  In other words, they are *always total* on 
the types they would be strictly applicable to, which means any 
conditionality would come from conversions like boxing, unboxing, and 
widening.  But I'm not sure pattern matching has quite as much to offer 
these more ad-hoc conversions.

We have special rules for integer literals; the literal `0` has a 
standalone type of `int`, but in most contexts can be narrowed to 
`byte`, `short`, or `char` if it fits into the range.  When we were 
considering constant patterns, we considered whether those rules could 
usefully be applied in reverse to constant patterns, and concluded that 
doing so added a lot of complexity for little benefit.  Now that we've 
decided against constant patterns for the time being, it may be moot 
anyway, but let me draw the example, as I think it might be helpful.
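
As a reminder of the literal rules mentioned above, here is a minimal 
sketch in today's Java.  The class and method names are made up for 
illustration; the point is only that a constant that fits is narrowed 
implicitly on assignment, while a non-constant `int` needs an explicit 
(and possibly lossy) cast.

```java
final class LiteralNarrowing {
    // A constant expression of type int is implicitly narrowed on
    // assignment when its value fits in the target type.
    static byte fromConstant() {
        byte b = 0;
        return b;
    }

    // A non-constant int gets no such treatment:
    //     byte b = v;    // does not compile
    // so an explicit narrowing cast is required, and it may lose
    // information.
    static byte fromVariable(int v) {
        return (byte) v;
    }

    public static void main(String[] args) {
        System.out.println(fromConstant());
        System.out.println(fromVariable(300));  // high-order bits are discarded
    }
}
```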

Consider the following switch:

     int anInt = 300;

     switch (anInt) {
         case byte b:  A
         case short s: B
         case int i: C
     }

What do we expect to happen?  One interpretation is that `byte b` is a 
pattern that is applicable to all integral types, and only matches the 
range of byte values.  (In this interpretation, the second case would 
match.)  The other is that this is a type error; the patterns `byte b` 
and `short s` are not applicable to `int`, so the compiler complains.  
(In fact, in this interpretation, these patterns are always total, and 
their main use is in nested patterns.)
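
To make the first interpretation concrete, here is a hand-written 
emulation of what the switch above would mean under it.  This is only 
a sketch of the hypothetical semantics, not anything the language 
does; the class name and the string results standing in for A, B, and 
C are placeholders.

```java
final class RangeInterpretation {
    // Emulates "case byte b / case short s / case int i" under the
    // range-check reading: each case matches only if the value fits.
    static String classify(int anInt) {
        if (anInt >= Byte.MIN_VALUE && anInt <= Byte.MAX_VALUE) {
            return "A";   // would bind byte b
        }
        if (anInt >= Short.MIN_VALUE && anInt <= Short.MAX_VALUE) {
            return "B";   // would bind short s
        }
        return "C";       // would bind int i
    }

    public static void main(String[] args) {
        System.out.println(classify(300));  // 300 is out of byte range but fits in short
    }
}
```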

If your initial reaction is that the first interpretation seems pretty 
good, beware that the sirens are probably singing to you.  Yes, having 
the ability to say "does this int fit in a byte" is a reasonable test to 
want to be able to express.  But cramming this into the semantics of the 
type pattern `byte b` is an exercise in complexity, since now we have to 
have special rules for each (from, to) pair of primitives we want to 
support.

Another flavor of this problem is:

     Object o = Short.valueOf((short) 3);

     switch (o) {
         case byte b:  A
         case short s: B
     }

3 can be crammed into a `byte`, and therefore could theoretically match 
the first case, but is this really the kind of complexity we want to 
layer atop the definition of primitive type patterns?
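
For comparison, here is a sketch of the two possible readings written 
out by hand in today's Java.  `byClass` is the plain class test we 
already have; `byValue` is a hypothetical emulation of the "cram it 
into a byte" semantics for just the `Byte` and `Short` boxes.  Both 
method names are invented for this example.

```java
final class BoxedShortDemo {
    // Plain class-based test: a boxed Short is never a Byte.
    static String byClass(Object o) {
        if (o instanceof Byte) return "A";
        if (o instanceof Short) return "B";
        return "no match";
    }

    // Hypothetical value-based test: a Short whose value fits in a
    // byte is treated as matching the byte case.  Every additional
    // box type would need its own branch like this one.
    static String byValue(Object o) {
        if (o instanceof Byte) return "A";
        if (o instanceof Short) {
            short s = (Short) o;
            boolean fitsInByte = s >= Byte.MIN_VALUE && s <= Byte.MAX_VALUE;
            return fitsInByte ? "A" : "B";
        }
        return "no match";
    }

    public static void main(String[] args) {
        Object o = Short.valueOf((short) 3);
        System.out.println(byClass(o));
        System.out.println(byValue(o));
    }
}
```

Even restricted to two box types, the value-based version already 
carries special-case logic that the class-based version does not need.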

I think there's a better answer: lean on explicit patterns for 
conversions.  The conversions from byte <--> int form an 
embedding-projection pair (byte-to-int is total, int-to-byte is 
partial), which means that they are suited for a total factory + 
partial pattern pair:

     class int {
         static int fromByte(byte b) { return b; }
         pattern(byte b) fromByte() { ... succeed if target in range ... }
     }

Then we can replace the first switch with:

     switch (anInt) {
         case fromByte(var b): A    // static or instance patterns on `int`
         case fromShort(var s): B
     }

which (a) is explicit, (b) uses straight library code rather than 
complex language magic, and (c) scales to non-built-in primitive 
classes.  (Readers may first think that the name `fromXxx` is backwards 
and should be `toXxx`, but what we're asking is: "could this int have 
come from a byte-to-int conversion".)
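
Since declared patterns don't exist yet, the factory + pattern pair 
can only be approximated today.  Here is one rough sketch that stands 
in for the partial pattern with a method returning `Optional<Byte>`; 
the class name `IntConversions` and the method `matchFromByte` are 
assumptions made for this example, not a proposed API.

```java
import java.util.Optional;

final class IntConversions {
    // Total factory: every byte embeds into int.
    static int fromByte(byte b) {
        return b;
    }

    // Stand-in for the partial pattern: succeeds (with the byte as
    // its "binding") only if the target could have come from fromByte.
    static Optional<Byte> matchFromByte(int target) {
        if (target >= Byte.MIN_VALUE && target <= Byte.MAX_VALUE) {
            return Optional.of((byte) target);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(matchFromByte(fromByte((byte) -5)));
        System.out.println(matchFromByte(300));
    }
}
```

The property that makes this an embedding-projection pair is that the 
match always succeeds on anything the factory produced, and hands back 
the original byte.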

So, strawman:

     A primitive type pattern `P p` is applicable _only_ to type `P` 
     (and therefore is always total).  Accordingly, its primary 
     utility is as a nested pattern.

Now, let's ask the same questions about boxing and unboxing.  (Boxing is 
always total; unboxing might NPE.)

Here, I think introducing boxing/unboxing conversions into pattern 
matching per se is even less useful.  If a pattern binds an int, but we 
wanted an Integer (or vice versa), then we are free (by virtue of 
boxing/unboxing in assignment and related contexts) to just use the 
binding.  For example:

     void m(Integer i) { ... }
     ...
     plus some pattern Foo(int x)
     ...

     switch (foo) {
         case Foo(int x): m(x);
     }

We don't care that we got an int out; when we need an Integer, the right 
thing happens.  In the other direction, we have to worry about NPEs, but 
we can fix that with pattern tools we have:

     switch (bar) {
         case Bar(Integer x & true(x != null)): ... safe to unbox x ...
     }

So I think our strawman holds up: primitive type patterns are total on 
their type, with no added boxing/narrowing/widening weirdness.  We can 
characterize this as a new context in Ch5 ("conditional pattern match 
context"), that permits only identity and reference widening 
conversions.  And when we get to Valhalla, the same is true for type 
patterns on primitive classes.
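
The reliance on ordinary boxing and unboxing at the use site, rather 
than on conversions inside the pattern, can be sketched in today's 
Java without any pattern syntax.  The names here (`BoxingDemo`, 
`passPrimitive`, `safeUnbox`) are invented for illustration; the 
explicit null check plays the role of the `true(x != null)` guard.

```java
final class BoxingDemo {
    static String m(Integer i) {
        return "Integer " + i;
    }

    // We hold an int (as a binding would), but m wants an Integer:
    // boxing at the call site does the right thing.
    static String passPrimitive(int x) {
        return m(x);
    }

    // In the other direction, unboxing null would throw NPE, so we
    // check before unboxing.
    static int safeUnbox(Integer boxed, int fallback) {
        if (boxed != null) {
            int unboxed = boxed;   // safe: boxed is known non-null here
            return unboxed;
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(passPrimitive(7));
        System.out.println(safeUnbox(null, -1));
    }
}
```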

** BONUS ROUND **

Now, let's talk about pattern assignment statements, such as:

     Point(var x, var y) = aPoint

The working theory is that the pattern on the LHS must be total on the 
type of the expression on the RHS, with some remainder allowed, and will 
throw on any remainder (e.g., it can throw NPE on null).  If we want to 
align this with the semantics of local variable declaration + 
initializer, we probably *do* want the full set of assignment-context 
conversions, which I think is fine in this context (so, a second new 
context: unconditional pattern assignment, which allows all the same 
conversions as are allowed in assignment context).
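
For reference, here is a small sketch of what local variable 
declaration + initializer already does in assignment context today, 
which is the behavior a pattern assignment would need to match.  The 
class and method names are made up; the null case is the "remainder" 
that throws.

```java
final class AssignmentContextDemo {
    // Widening primitive conversion: int to long.
    static long widen(int i) {
        long l = i;
        return l;
    }

    // Boxing conversion: int to Integer.
    static Integer box(int i) {
        Integer boxed = i;
        return boxed;
    }

    // Unboxing conversion: total except for null, where it throws.
    static boolean unboxThrowsOnNull() {
        Integer nothing = null;
        try {
            int i = nothing;   // unboxing null throws NullPointerException
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(widen(42));
        System.out.println(box(42));
        System.out.println(unboxThrowsOnNull());
    }
}
```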

If the set of conversions is the same, then we are well on our way to 
being able to interpret

     T t = e

as *either* a local variable declaration, *or* a pattern match, without 
the user being able to tell the difference:

  - The scoping is the same (since the pattern either completes normally 
or throws);
  - The mutability is the same (we fixed this one just in time);
  - The set of conversions, applicable types, and potential exceptions 
are the same (exercise left to the reader.)

Which means (drum roll) local variable declaration with an initializer 
is revealed to have been a degenerate case of pattern match all along.  
(And the crowd goes wild.)