Primitive type patterns and conversions
Brian Goetz
brian.goetz at oracle.com
Mon Mar 1 22:04:56 UTC 2021
So far, we've spent almost all our time on patterns whose targets are
reference types (type patterns, record patterns, array patterns,
deconstruction patterns). It's getting to be time to nail down (a) the
semantics of primitive type patterns, and (b) the
conversions-and-contexts (JLS 5) rules. And, because we're on the cusp
of the transition to Valhalla, we must be mindful of both the
current set of primitive conversions and the more general object model
as it will apply to primitive classes.
If we focus on type patterns alone, let's bear in mind that primitive
type patterns are not nearly as powerful as other type patterns, because
(under the current rules) primitives are "islands" in the type system --
no supertypes, no subtypes. In other words, they are *always total* on
the types they would be strictly applicable to, which means any
conditionality would come from conversions like boxing, unboxing, and
widening. But I'm not sure pattern matching has quite as much to offer
these more ad-hoc conversions.
We have special rules for integer literals; the literal `0` has a
standalone type of `int`, but in assignment contexts it can be narrowed to
`byte`, `short`, or `char` if the value fits in the target's range. When we were
considering constant patterns, we asked whether it would help to apply
those rules in reverse to constant patterns, and concluded that it would
add a lot of complexity for little benefit. Now that we've decided
against constant patterns for the time being, it may be moot anyway, but
let me draw out the example, as I think it might be helpful.
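For reference, here is a minimal sketch of the existing literal rule (JLS 5.2) in today's Java: a constant `int` expression narrows implicitly in an assignment context when its value fits, while a non-constant needs an explicit cast. The class and method names are illustrative only.

```java
class ConstantNarrowing {
    // A constant variable: final, primitive, initialized with a constant expression.
    static final int SMALL = 100;

    static String describe() {
        byte b = 0;       // int literal 0 narrowed to byte: it fits in [-128, 127]
        short s = 300;    // fits in short, so implicit narrowing is allowed
        char c = 65;      // fits in char; this is 'A'
        byte k = SMALL;   // constant variables qualify as well
        // byte tooBig = 300;  // would not compile: 300 is outside byte's range

        int notConstant = 100;
        byte explicit = (byte) notConstant; // not a constant: a cast is required

        return b + " " + s + " " + c + " " + k + " " + explicit;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // 0 300 A 100 100
    }
}
```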
Consider the following switch:
    int anInt = 300;
    switch (anInt) {
        case byte b: A
        case short s: B
        case int i: C
    }
What do we expect to happen? One interpretation is that `byte b` is a
pattern that is applicable to all integral types, and only matches the
range of byte values. (In this interpretation, the second case would
match.) The other is that this is a type error; the patterns `byte b`
and `short s` are not applicable to `int`, so the compiler complains.
(In fact, in this interpretation, these patterns are always total, and
their main use is in nested patterns.)
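Under the first interpretation, the switch would behave like the hand-written range checks below. This is a minimal sketch in today's Java, with a hypothetical `classify` method standing in for the switch; note that every (from, to) pair needs its own bounds test.

```java
class RangeDispatch {
    // Hypothetical helper: emulates the "first interpretation", where
    // `case byte b` would match any integral value that fits in a byte.
    static String classify(int anInt) {
        if (anInt >= Byte.MIN_VALUE && anInt <= Byte.MAX_VALUE) {
            byte b = (byte) anInt;   // lossless: value is in byte range
            return "A (byte " + b + ")";
        } else if (anInt >= Short.MIN_VALUE && anInt <= Short.MAX_VALUE) {
            short s = (short) anInt; // lossless: value is in short range
            return "B (short " + s + ")";
        } else {
            return "C (int " + anInt + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(300));    // B (short 300): exceeds byte range
        System.out.println(classify(42));     // A (byte 42)
        System.out.println(classify(70_000)); // C (int 70000)
    }
}
```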
If your initial reaction is that the first interpretation seems pretty
good, beware: the sirens are probably singing to you. Yes, "does this int
fit in a byte?" is a reasonable test to want to express. But cramming it
into the semantics of the type pattern `byte b` is an exercise in
complexity, since we would then need special rules for each (from, to)
pair of primitives we want to support.
Another flavor of this problem is:
    Object o = Short.valueOf((short) 3);
    switch (o) {
        case byte b: A
        case short s: B
    }
3 can be crammed into a `byte`, and therefore could theoretically match
the first case, but is this really the kind of complexity we want to
layer atop the definition of primitive type patterns?
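In today's Java, a type test on the boxed value looks only at its dynamic class, not at whether the value would fit in a narrower type. The sketch below contrasts the two; `fitsInByte` is a hypothetical helper showing the extra unboxing and range check that value-based matching would layer on, and it deliberately handles only `Byte` and `Short` boxes.

```java
class BoxedShortMatch {
    // Hypothetical helper: what `case byte b` would have to do if it matched
    // a boxed Short by value: a type test, an unboxing, then a range check.
    static boolean fitsInByte(Object o) {
        if (o instanceof Short) {
            short s = (Short) o;  // unboxing conversion
            return s >= Byte.MIN_VALUE && s <= Byte.MAX_VALUE;
        }
        return o instanceof Byte; // other box types are not handled here
    }

    public static void main(String[] args) {
        Object o = Short.valueOf((short) 3);
        System.out.println(o instanceof Byte);  // false: the dynamic type is Short
        System.out.println(o instanceof Short); // true
        System.out.println(fitsInByte(o));      // true: the value 3 fits in a byte
    }
}
```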
I think there's a better answer: lean on explicit patterns for
conversions. The conversions between byte and int form an
embedding-projection pair, which means that they are suited to a total
factory + partial pattern pair:
    class int {
        static int fromByte(byte b) { return b; }
        pattern(byte b) fromByte() { ... succeed if target in range ... }
    }
Then we can replace the first switch with:
    switch (anInt) {
        case fromByte(var b): A   // static or instance patterns on `int`
        case fromShort(var s): B
    }
which is (a) explicit, (b) uses straight library code rather than
complex language magic, and (c) scales to non-built-in primitive
classes. (Readers may at first think that the name `fromXxx` is backwards
and should be `toXxx`, but what we're asking is: "could this int have come
from a byte-to-int conversion?")
So, strawman:
A primitive type pattern `P p` is applicable _only_ to type `P` (and
therefore is always total). Accordingly, its primary utility is as a
nested pattern.
Now, let's ask the same questions about boxing and unboxing. (Boxing is
always total; unboxing might NPE.)
Here, I think introducing boxing/unboxing conversions into pattern
matching per se is even less useful. If a pattern binds an int, but we
wanted an Integer (or vice versa), then we are free (by virtue of
boxing/unboxing in assignment and related contexts) to just use the
binding. For example:
    void m(Integer i) { ... }
    ...
    // plus some pattern Foo(int x)
    ...
    switch (o) {
        case Foo(int x): m(x);
    }
We don't care that we got an int out; when we need an Integer, the right
thing happens. In the other direction, we have to worry about NPEs, but
we can fix that with pattern tools we have:
    switch (o) {
        case Bar(Integer x & true(x != null)): ... safe to unbox x ...
    }
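Both directions can be checked with ordinary Java today. The sketch below (method and class names are illustrative) passes an `int` where an `Integer` is expected, and guards the unboxing direction with a null check, which plays the role the proposed `true(x != null)` guard plays in the pattern.

```java
class BoxingInBindings {
    static String m(Integer i) {
        return "m(" + i + ")";
    }

    // int -> Integer: boxing in an invocation context is always total.
    static String callWithInt(int x) {
        return m(x);
    }

    // Integer -> int: unboxing throws NPE on null, so guard first
    // (the analogue of `true(x != null)` in the proposed pattern).
    static String unboxIfPresent(Integer x) {
        if (x != null) {
            int unboxed = x;   // safe: x is known to be non-null
            return "unboxed " + unboxed;
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(callWithInt(7));       // m(7)
        System.out.println(unboxIfPresent(7));    // unboxed 7
        System.out.println(unboxIfPresent(null)); // no match
        try {
            Integer nothing = null;
            int oops = nothing; // unboxing null throws NullPointerException
            System.out.println(oops);
        } catch (NullPointerException e) {
            System.out.println("NPE on unboxing null");
        }
    }
}
```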
So I think our strawman holds up: primitive type patterns are total on
their type, with no added boxing/narrowing/widening weirdness. We can
characterize this as a new context in JLS chapter 5 ("conditional pattern
match context") that permits only identity and widening reference
conversions. And when we get to Valhalla, the same is true for type
patterns on primitive classes.
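Today's `instanceof` already behaves like this restricted context for reference types: a boxed `Integer` matches `Integer` and its supertypes, but is never converted to match `Long`. A small illustration, with illustrative names:

```java
class ConditionalMatchContext {
    static String test(Object o) {
        StringBuilder sb = new StringBuilder();
        sb.append(o instanceof Integer).append(' '); // dynamic type is Integer
        sb.append(o instanceof Number).append(' ');  // widening reference: Integer is a subtype of Number
        sb.append(o instanceof Long);                // no conversion from Integer to Long is attempted
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(test(42));  // true true false
        System.out.println(test(42L)); // false true true
    }
}
```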
** BONUS ROUND **
Now, let's talk about pattern assignment statements, such as:
    Point(var x, var y) = aPoint
The working theory is that the pattern on the LHS must be total on the
type of the expression on the RHS, with some remainder allowed, and will
throw on any remainder (e.g., it can throw NPE on null). If we want to
align this with the semantics of local variable declaration +
initializer, we probably *do* want the full set of assignment-context
conversions, which I think is fine in this context (so, a second new
context: unconditional pattern assignment, which allows all the same
conversions as are allowed in assignment context).
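A rough approximation in current Java of what `Point(var x, var y) = aPoint` would do, assuming `Point` is a record with components `x` and `y` (Java 16 or later) and that null is the only remainder. `destructure` is a hypothetical helper, and returning an array merely stands in for the two bindings.

```java
import java.util.Objects;

class PatternAssignment {
    record Point(int x, int y) {}

    // Hypothetical desugaring of `Point(var x, var y) = aPoint`:
    // the pattern is total on Point, so the only remainder is null,
    // which is handled by throwing rather than by falling through.
    static int[] destructure(Point aPoint) {
        Objects.requireNonNull(aPoint, "remainder: null does not match Point(var x, var y)");
        int x = aPoint.x();
        int y = aPoint.y();
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        int[] xy = destructure(new Point(3, 4));
        System.out.println(xy[0] + "," + xy[1]); // 3,4
        try {
            destructure(null);
        } catch (NullPointerException e) {
            System.out.println("threw on remainder");
        }
    }
}
```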
If the set of conversions is the same, then we are well on our way to
being able to interpret
    T t = e
as *either* a local variable declaration, *or* a pattern match, without
the user being able to tell the difference:
- The scoping is the same (since the pattern either completes normally
or throws);
- The mutability is the same (we fixed this one just in time);
- The set of conversions, applicable types, and potential exceptions
are the same (exercise left to the reader).
Which means (drum roll) local variable assignment is revealed to have
been a degenerate case of pattern match all along. (And the crowd goes
wild.)
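For reference, these are the assignment-context conversions (JLS 5.2) that a local variable declaration `T t = e` accepts today, and which an unconditional pattern assignment would need to accept for the two readings to be indistinguishable. The sketch is only a catalogue of existing Java behavior; the names are illustrative.

```java
class AssignmentConversions {
    static String demo() {
        int anInt = 42;
        long widened = anInt;   // widening primitive conversion
        Integer boxed = anInt;  // boxing conversion
        int unboxed = boxed;    // unboxing conversion (would NPE if boxed were null)
        Number number = boxed;  // widening reference conversion
        Object object = anInt;  // boxing followed by widening reference
        byte narrowed = 42;     // narrowing of a constant expression
        return widened + " " + boxed + " " + unboxed + " "
                + number + " " + object + " " + narrowed;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 42 42 42 42 42 42
    }
}
```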
More information about the amber-spec-observers mailing list