Primitives in instanceof and patterns
Brian Goetz
brian.goetz at oracle.com
Thu Sep 8 20:03:45 UTC 2022
Sigh, floating point. Yes, this is the most difficult corner of this work.
Bear in mind that you're really asking questions about cast conversion.
Additionally, we have to define which of these casts are "lossy" vs
which are "information preserving" (exact), which for most numbers is
straightforward but for weird floating point values might be harder.
So let's first see what happens when we cast your examples:
jshell> (int) 2.0f
$1 ==> 2
jshell> (int) 2.5f
$2 ==> 2
jshell> (float) Double.MAX_VALUE
Infinity
jshell> (float) Math.PI
$4 ==> 3.1415927
jshell> (float) Double.NaN
$5 ==> NaN
jshell> (double) Double.NaN
$6 ==> NaN
Clearly #2 (casting 2.5 to int) is lossy, and therefore would not be
exact, so we can cross that one off the list. Similar for
Double.MAX_VALUE to float. It's also pretty clear that Math.PI -- which
is merely an alias for 3.14159265358979323846 -- is also not
representable in the range of float.
So that leaves:
2.0 instanceof int
Double.NaN instanceof float
Double.NaN instanceof double
The first one is an exact cast; this can be specified in a number of
ways, including "sufficient number of low order bits are zero", or
"representable in the range of the target type", or others. The binary
representation of 2.0f is 01000000000000000000000000000000, which
exactly encodes an integer. Again, this is not a rule about
`instanceof`; this is derived from casting (though we do have to define
what exactness means.)
For NaN, negative zero, and infinity, whether various conversions are
exact or not may require a somewhat ad-hoc decision, but I think the
intuitive answers for NaN is that Float.NaN <--> Double.NaN is exact,
and similar for Float.Inf <--> Double.Inf. There will be an argument
about -0.0 and 0.0.
> Intuitively, I think I would expect that if t is an expression of
> primitive type T, and U is also a primitive type, then t instanceof
> Uis true iff t == (T) (U) t. (Perhaps with a variant of == that is
> reflexive even for NaN, or perhaps not.) That would make (1) true,
> (2,3,4) false, and (5,6) who knows.
This is a good intuition, and is true for some types, and almost true
for others, but it falls into some holes. In particular, with some
conversions, `(U) t` is lossy but `(T) (U) t` is also lossy in an
exactly compensating way.
> This makes a lot of sense. I'm wondering, though, how it works with
> float and double. I don't see the answer by looking at JLS Ch5. Are
> the following expressions legal, and if so which ones are true?
>
> 1. 2.0 instanceof int
> 2. 2.5 instanceof int
> 3. Double.MAX_VALUE instanceof float
> 4. Math.PI instanceof float
> 5. Double.NaN instanceof float
> 6. Double.NaN instanceof double
>
> Intuitively, I think I would expect that if t is an expression of
> primitive type T, and U is also a primitive type, then t instanceof
> Uis true iff t == (T) (U) t. (Perhaps with a variant of == that is
> reflexive even for NaN, or perhaps not.) That would make (1) true,
> (2,3,4) false, and (5,6) who knows.
>
> On Thu, 8 Sept 2022 at 09:53, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> Earlier in the year we talked about primitive type patterns. Let
> me summarize
> the past discussion, what I think the right direction is, and why
> this is (yet
> another) "finishing up the job" task for basic patterns that, if
> left undone,
> will be a sharp edge.
>
> Prior to record patterns, we didn't support primitive type
> patterns at all. With
> records, we now support primitive type patterns as nested
> patterns, but they are
> very limited; they are only applicable to exactly their own type.
>
> The motivation for "finishing" primitive type patterns is the same
> as discussed
> earlier this week with array patterns -- if pattern matching is
> the dual of
> aggregation, we want to avoid gratuitous asymmetries that let you
> put things
> together but not take them apart.
>
> Currently, we can assign a `String` to an `Object`, and recover
> the `String`
> with a pattern match:
>
> Object o = "Bob";
> if (o instanceof String s) { println("Hi Bob"); }
>
> Analogously, we can assign an `int` to a `long`:
>
> long n = 0;
>
> but we cannot yet recover the int with a pattern match:
>
> if (n instanceof int i) { ... } // error, pattern `int i` not
> applicable to `long`
>
> To fill out some more of the asymmetries around records if we
> don't finish the job: given
>
> record R(int i) { }
>
> we can construct it with
>
> new R(anInt) // no adaptation
> new R(aShort) // widening
> new R(anInteger) // unboxing
>
> but yet cannot deconstruct it the same way:
>
> case R(int i) // OK
> case R(short s) // nope
> case R(Integer i) // nope
>
> It would be a gratuitous asymmetry that we can use pattern
> matching to recover from
> reference widening, but not from primitive widening. While many of the
> arguments against doing primitive type patterns now were of the
> form "let's keep
> things simple", I believe that the simpler solution is actually to
> _finish the
> job_, because this minimizes asymmetries and potholes that users
> would otherwise
> have to maintain a mental catalog of.
>
> Our earlier explorations started (incorrectly, as it turned out), with
> assignment context. This direction gave us a good push in the
> right direction,
> but turned out to not be the right answer. A more careful reading
> of JLS Ch5
> convinced me that the answer lies not in assignment conversion,
> but _cast
> conversion_.
>
> #### Stepping back: instanceof
>
> The right place to start is actually not patterns, but
> `instanceof`. If we
> start here, and listen carefully to the specification, it leads us
> to the
> correct answer.
>
> Today, `instanceof` works only for reference types. Accordingly,
> most people
> view `instanceof` as "the subtyping operator" -- because that's
> the only
> question we can currently ask it. We almost never see
> `instanceof` on its own;
> it is nearly always followed by a cast to the same type.
> Similarly, we rarely
> see a cast on its own; it is nearly always preceded by an
> `instanceof` for the
> same type.
>
> There's a reason these two operations travel together: casting is,
> in general,
> unsafe; we can try to cast an `Object` reference to a `String`,
> but if the
> reference refers to another type, the cast will fail. So to make
> casting safe,
> we precede it with an `instanceof` test. The semantics of
> `instanceof` and
> casting align such that `instanceof` is the precondition test for
> safe casting.
>
> > instanceof is the precondition for safe casting
>
> Asking `instanceof T` means "if I cast this to T, would I like the
> answer."
> Obviously CCE is an unlikable answer; `instanceof` further adopts
> the opinion
> that casting `null` would also be an unlikable answer, because
> while the cast
> would succeed, you can't do anything useful with the result.
>
> Currently, `instanceof` is only defined on reference types, and on
> this domain
> coincides with subtyping. On the other hand, casting is defined
> between
> primitive types (widening, narrowing), and between primitive and
> reference types
> (boxing, unboxing). Some casts involving primitives yield
> "better" results than
> others; casting `0` to `byte` results in no loss of information,
> since `0` is
> representable as a byte, but casting `500` to `byte` succeeds but
> loses
> information because the higher order bits are discarded.
>
> If we characterize some casts as "lossy" and others as "exact" --
> where lossy
> means discarding useful information -- we can extend the "safe casting
> precondition" meaning of `instanceof` to primitive operands and
> types in the
> obvious way -- "would casting this expression to this type succeed
> without error
> and without information loss." If the type of the expression is
> not castable to
> the type we are asking about, we know the cast cannot succeed and
> reject the
> `instanceof` test at compile time.
>
> Defining which casts are lossy and which are exact is fairly
> straightforward; we
> can appeal to the concept already in the JLS of "representable in
> the range of a
> type." For some pairs of types, casting is always exact (e.g.,
> casting `int` to
> `long` is always exact); we call these "unconditionally exact".
> For other pairs
> of types, some values can be cast exactly and others cannot.
>
> Defining which casts are exact gives us a simple and precise
> semantics for `x
> instanceof T`: whether `x` can be cast exactly to `T`. Similarly,
> if the static
> type of `x` is not castable to `T`, then the corresponding
> `instanceof` question
> is rejected statically. The answers are not suprising:
>
> - Boxing is always exact;
> - Unboxing is exact for all non-null values;
> - Reference widening is always exact;
> - Reference narrowing is exact if the type of the target
> expression is a
> subtype of the target type;
> - Primitive widening and narrowing are exact if the target
> expression can be
> represented in the range of the target type.
>
> #### Primitive type patterns
>
> It is a short hop from `instanceof` to patterns (including
> primitive type
> patterns, and reference type patterns applied to primitive types),
> which can be
> defined entirely in terms of cast conversion and exactness:
>
> - A type pattern `T t` is applicable to a target of type `S` if
> `S` is
> cast-convertible to `T`;
> - A type pattern `T t` matches a target `x` if `x` can be cast
> exactly to `T`;
> - A type pattern `T t` is unconditional at type `S` if casting
> from `T` to `S`
> is unconditionally exact;
> - A type pattern `T t` dominates a type pattern `S s` (or a
> record pattern
> `S(...)`) if `T t` would be unconditional on `S`.
>
> While the rules for casting are complex, primitive patterns add no new
> complexity; there are no new conversions or conversion contexts.
> If we see:
>
> switch (a) {
> case T t: ...
> }
>
> we know the case matches if `a` can be cast exactly to `T`, and
> the pattern is
> unconditional if _all_ values of `a`'s type can be cast exactly to
> `T`. Note
> that none of this is specific to primitives; we derive the
> semantics of _all_
> type patterns from the enhanced definition of casting.
>
> Now, our record deconstruction examples work symmetrically to
> construction:
>
> case R(int i) // OK
> case R(short s) // test if `i` is in the range of `short`
> case R(Integer i) // box `i` to `Integer`
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-spec-observers/attachments/20220908/6fcfa17c/attachment-0003.htm>
More information about the amber-spec-observers
mailing list