<html><body><div id="zimbraEditorContainer" style="font-family: arial, helvetica, sans-serif; font-size: 12pt; color: #000000" class="2"><div><br></div><div><br></div><hr id="zwchr" data-marker="__DIVIDER__"><div data-marker="__HEADERS__"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"Brian Goetz" <brian.goetz@oracle.com><br><b>To: </b>"amber-spec-experts" <amber-spec-experts@openjdk.java.net><br><b>Sent: </b>Thursday, September 8, 2022 6:53:21 PM<br><b>Subject: </b>Primitives in instanceof and patterns<br></blockquote></div><div data-marker="__QUOTED_TEXT__"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><font size="4"><font face="monospace">Earlier in the year we talked
about primitive type patterns. Let me summarize<br>
the past discussion, what I think the right direction is, and
why this is (yet<br>
another) "finishing up the job" task for basic patterns that, if
left undone,<br>
will be a sharp edge.<br><br>
Prior to record patterns, we didn't support primitive type
patterns at all. With<br>
records, we now support primitive type patterns as nested
patterns, but they are<br>
very limited; they are only applicable to exactly their own
type. <br><br>
The motivation for "finishing" primitive type patterns is the
same as discussed<br>
earlier this week with array patterns -- if pattern matching is
the dual of<br>
aggregation, we want to avoid gratuitous asymmetries that let
you put things<br>
together but not take them apart. <br><br>
Currently, we can assign a `String` to an `Object`, and recover
the `String`<br>
with a pattern match:<br><br>
Object o = "Bob";<br>
if (o instanceof String s) { println("Hi Bob"); }<br><br>
Analogously, we can assign an `int` to a `long`:<br><br>
long n = 0;<br><br>
but we cannot yet recover the int with a pattern match:<br><br>
if (n instanceof int i) { ... } // error, pattern `int i`
not applicable to `long`<br><br>
To fill out some more of the asymmetries around records if we
don't finish the job: given <br><br>
record R(int i) { }<br><br>
we can construct it with<br><br>
new R(anInt) // no adaptation<br>
new R(aShort) // widening<br>
new R(anInteger) // unboxing<br><br>
but yet cannot deconstruct it the same way:<br><br>
case R(int i) // OK<br>
case R(short s) // nope<br>
case R(Integer i) // nope<br><br>
It would be a gratuitous asymmetry that we can use pattern
matching to recover from<br>
reference widening, but not from primitive widening. While many
of the<br>
arguments against doing primitive type patterns now were of the
form "let's keep<br>
things simple", I believe that the simpler solution is actually
to _finish the<br>
job_, because this minimizes asymmetries and potholes that users
would otherwise<br>
have to maintain a mental catalog of. <br><br>
Our earlier explorations started (incorrectly, as it turned
out), with<br>
assignment context. This direction gave us a good push in the
right direction,<br>
but turned out to not be the right answer. A more careful
reading of JLS Ch5<br>
convinced me that the answer lies not in assignment conversion,
but _cast<br>
conversion_. <br><br>
#### Stepping back: instanceof<br><br>
The right place to start is actually not patterns, but
`instanceof`. If we<br>
start here, and listen carefully to the specification, it leads
us to the<br>
correct answer. <br><br>
Today, `instanceof` works only for reference types.
Accordingly, most people<br>
view `instanceof` as "the subtyping operator" -- because that's
the only<br>
question we can currently ask it. We almost never see
`instanceof` on its own;<br>
it is nearly always followed by a cast to the same type.
Similarly, we rarely<br>
see a cast on its own; it is nearly always preceded by an
`instanceof` for the<br>
same type. <br><br>
There's a reason these two operations travel together: casting
is, in general,<br>
unsafe; we can try to cast an `Object` reference to a `String`,
but if the<br>
reference refers to another type, the cast will fail. So to
make casting safe,<br>
we precede it with an `instanceof` test. The semantics of
`instanceof` and<br>
casting align such that `instanceof` is the precondition test
for safe casting.<br><br>
> instanceof is the precondition for safe casting<br><br>
Asking `instanceof T` means "if I cast this to T, would I like
the answer."<br>
Obviously CCE is an unlikable answer; `instanceof` further
adopts the opinion<br>
that casting `null` would also be an unlikable answer, because
while the cast<br>
would succeed, you can't do anything useful with the result.<br><br>
Currently, `instanceof` is only defined on reference types, and
on this domain<br>
coincides with subtyping. On the other hand, casting is defined
between<br>
primitive types (widening, narrowing), and between primitive and
reference types<br>
(boxing, unboxing). Some casts involving primitives yield
"better" results than<br>
others; casting `0` to `byte` results in no loss of information,
since `0` is<br>
representable as a byte, but casting `500` to `byte` succeeds
but loses<br>
information because the higher order bits are discarded. <br><br>
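To make the lossy/exact distinction concrete, here is a minimal sketch in<br>
plain Java. The class and method names are illustrative only and not part of<br>
any proposal; it simply shows what today's narrowing cast does:<br><br>
<pre>
```java
public class LossyVersusExact {
    // Illustrative helper: narrows an int to a byte, keeping only the low-order 8 bits.
    static byte narrow(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(narrow(0));    // prints 0: the value is representable as a byte
        System.out.println(narrow(500));  // prints -12: the high-order bits were discarded
    }
}
```
</pre><br>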
If we characterize some casts as "lossy" and others as "exact"
-- where lossy<br>
means discarding useful information -- we can extend the "safe
casting<br>
precondition" meaning of `instanceof` to primitive operands and
types in the<br>
obvious way -- "would casting this expression to this type
succeed without error<br>
and without information loss." If the type of the expression is
not castable to<br>
the type we are asking about, we know the cast cannot succeed
and reject the<br>
`instanceof` test at compile time.<br><br>
Defining which casts are lossy and which are exact is fairly
straightforward; we<br>
can appeal to the concept already in the JLS of "representable
in the range of a<br>
type." For some pairs of types, casting is always exact (e.g.,
casting `int` to<br>
`long` is always exact); we call these "unconditionally exact".
For other pairs<br>
of types, some values can be cast exactly and others cannot. <br><br>
Defining which casts are exact gives us a simple and precise
semantics for `x<br>
instanceof T`: whether `x` can be cast exactly to `T`.
Similarly, if the static<br>
type of `x` is not castable to `T`, then the corresponding
`instanceof` question<br>
is rejected statically. The answers are not surprising:<br><br>
- Boxing is always exact;<br>
- Unboxing is exact for all non-null values;<br>
- Reference widening is always exact;<br>
- Reference narrowing is exact if the type of the target
expression is a<br>
subtype of the target type;<br>
- Primitive widening and narrowing are exact if the target
expression can be<br>
represented in the range of the target type.<br><br>
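For the integral primitive types, the rules above can be approximated in<br>
today's Java with a round-trip comparison. This is only a sketch: the helper<br>
names are hypothetical, and floating-point conversions need more careful<br>
treatment that is not shown here:<br><br>
<pre>
```java
public class ExactCasts {
    // Hypothetical helper: true if the int value is representable as a short.
    static boolean isExactShort(int value) {
        return (short) value == value;
    }

    // Hypothetical helper: true if the long value is representable as an int.
    static boolean isExactInt(long value) {
        return (int) value == value;
    }

    // Hypothetical helper: unboxing is exact for every non-null reference.
    static boolean isExactUnboxing(Integer boxed) {
        return boxed != null;
    }

    public static void main(String[] args) {
        System.out.println(isExactShort(42));                    // true
        System.out.println(isExactShort(100_000));               // false
        System.out.println(isExactInt(1L + Integer.MAX_VALUE));  // false
        System.out.println(isExactUnboxing(null));               // false
    }
}
```
</pre><br>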
#### Primitive type patterns<br><br>
It is a short hop from `instanceof` to patterns (including
primitive type<br>
patterns, and reference type patterns applied to primitive
types), which can be<br>
defined entirely in terms of cast conversion and exactness: <br><br>
- A type pattern `T t` is applicable to a target of type `S` if
`S` is<br>
cast-convertible to `T`;<br>
- A type pattern `T t` matches a target `x` if `x` can be cast
exactly to `T`;<br>
- A type pattern `T t` is unconditional at type `S` if casting
from `S` to `T`<br>
is unconditionally exact;<br>
- A type pattern `T t` dominates a type pattern `S s` (or a
record pattern<br>
`S(...)`) if `T t` would be unconditional on `S`.<br><br>
While the rules for casting are complex, primitive patterns add
no new<br>
complexity; there are no new conversions or conversion
contexts. If we see:<br><br>
switch (a) { <br>
case T t: ...<br>
}<br><br>
we know the case matches if `a` can be cast exactly to `T`, and
the pattern is<br>
unconditional if _all_ values of `a`'s type can be cast exactly
to `T`. Note<br>
that none of this is specific to primitives; we derive the
semantics of _all_<br>
type patterns from the enhanced definition of casting.<br><br>
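As a rough illustration, a switch over a `long` with a `case int i` followed<br>
by a `case long l` could be written by hand in today's Java as below. This is<br>
a sketch under the proposed rules, not a description of what the compiler<br>
would emit, and the class and method names are made up for the example:<br><br>
<pre>
```java
public class PatternSketch {
    // Hand-written equivalent, under the proposed rules, of:
    //   switch (n) { case int i -> ...; case long l -> ...; }
    static String classify(long n) {
        if ((int) n == n) {   // `int i` is conditional on long: needs an exactness test
            int i = (int) n;
            return "int " + i;
        }
        long l = n;           // `long l` is unconditional on long: no test needed
        return "long " + l;
    }

    public static void main(String[] args) {
        System.out.println(classify(7L));              // int 7
        System.out.println(classify(5_000_000_000L));  // long 5000000000
    }
}
```
</pre><br>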
Now, our record deconstruction examples work symmetrically to
construction: <br><br>
case R(int i) // OK<br>
case R(short s) // test if `i` is in the range of `short`<br>
case R(Integer i) // box `i` to `Integer`<br><br><br></font></font></blockquote><div><br></div><div><span style="font-size: large;"><span style="font-family: monospace;">I think we have to be careful with your notion of dual here: a record canonical constructor and a deconstructing pattern are dual, but that is a special case because the deconstructing pattern always matches. Once you introduce patterns that may or may not match, there is no duality anymore.<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">The primitive pattern you propose is clearly not the dual of the cast conversions, because the casting conversions are verified by the compiler while some of the primitive patterns you propose are checked at runtime.<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">As an example, if there is a method declared like this<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> static void m(int i) { ... 
}<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">and this method is called with a short,</span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> short s = ...<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> m(s);<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> there is an implicit conversion from short to int, and if the first parameter of m is not compatible a compiler error occurs.</span></span><span style="font-size: large;"><span style="font-family: monospace;"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">If you compare with the corresponding pattern <br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> int i = ...<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> switch(i) {<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> case short s -> ...<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> }<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">The semantics you propose is 
not to emit a compile error but to check at runtime whether the value "i" is between Short.MIN_VALUE and Short.MAX_VALUE.<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">So there is perhaps a syntactic duality, but clearly there is no semantic duality.<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">Moreover, the semantics you propose is not aligned with the concept of data-oriented programming, which says that the data are more important than the code, so we should try to raise a compile error when the data definition changes, to help the developer change the code.<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">If we take a simple example<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> record Point(int x, int y) { }<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> Point point = ...<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> switch(point) {<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> case Point(int i, int j) -> ...<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> ...<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: 
monospace;"> }<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">Let's say now that we change Point to use longs<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"> record Point(long x, long y) { }<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">With the semantics you propose, the code still compiles, but the pattern is now transformed into a partial pattern that will not match all Points, only the ones with x and y between Integer.MIN_VALUE and Integer.MAX_VALUE.</span></span><span style="font-size: large;"><span style="font-family: monospace;"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">I believe this is exactly what Stephen Colebourne was complaining about when we discussed the previous iteration of this spec: the semantics of the primitive pattern changes depending on the definition of the data.<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">Tagir's remark about array patterns also applies here: having a named pattern like Short.asShort() makes the semantics far clearer, because it disambiguates between a pattern that requests a conversion and a pattern that does a conversion because the data definition has changed.<br data-mce-bogus="1"></span></span></div><div><span style="font-size: 
large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">And I'm still worried that we are muddying the waters here: instanceof is about instances and subtyping relationships (hence the name), so extending it to cover non-instance / primitive values is very confusing. </span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;"><br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">regards,<br data-mce-bogus="1"></span></span></div><div><span style="font-size: large;"><span style="font-family: monospace;">Rémi<br data-mce-bogus="1"></span></span></div><div><br data-mce-bogus="1"></div></div></div></body></html>