Primitives in instanceof and patterns

Thu Sep 8 21:11:35 UTC 2022

> Additionally, we have to define which of these casts are "lossy" vs which
are "information preserving" (exact)

That was indeed what I was asking, and thank you for answering so
thoroughly!

> > Intuitively, I think I would expect that if t is an expression of
primitive type T, and U is also a primitive type, then t instanceof U is
true iff t == (T) (U) t. [...]
> This is a good intuition, and is true for some types, and almost true for
others, but it falls into some holes.  In particular, with some
conversions, `(U) t` is lossy but `(T) (U) t` is also lossy in an exactly
compensating way.

I see, yes. For example given
byte b = -1;
we have b == (byte) (char) b, but we wouldn't want b instanceof char to be
true.

On Thu, 8 Sept 2022 at 13:03, Brian Goetz <brian.goetz at oracle.com> wrote:

> Sigh, floating point.  Yes, this is the most difficult corner of this
> work.
>
> Bear in mind that you're really asking questions about cast conversion.
> Additionally, we have to define which of these casts are "lossy" vs which
> are "information preserving" (exact), which for most numbers is
> straightforward but for weird floating point values might be harder.   So
> let's first see what happens when we cast your examples:
>
> jshell> (int) 2.0f
> $1 ==> 2
>
> jshell> (int) 2.5f
> $2 ==> 2
>
> jshell> (float) Double.MAX_VALUE
> Infinity
>
> jshell> (float) Math.PI
> $4 ==> 3.1415927
>
> jshell> (float) Double.NaN
> $5 ==> NaN
>
> jshell> (double) Double.NaN
> $6 ==> NaN
>
> Clearly #2 (casting 2.5 to int) is lossy, and therefore would not be
> exact, so we can cross that one off the list.  Similar for Double.MAX_VALUE
> to float.  It's also pretty clear that Math.PI -- which is merely an alias
> for 3.14159265358979323846 -- is also not representable in the range of
> float.
>
> So that leaves:
>
>     2.0 instanceof int
>     Double.NaN instanceof float
>     Double.NaN instanceof double
>
> The first one is an exact cast; this can be specified in a number of ways,
> including "sufficient number of low order bits are zero", or "representable
> in the range of the target type", or others.  The binary representation of
> 2.0f is 01000000000000000000000000000000, which exactly encodes an
> integer.  Again, this is not a rule about `instanceof`; this is derived
> from casting (though we do have to define what exactness means.)
>
> For NaN, negative zero, and infinity, whether various conversions are
> exact or not may require a somewhat ad-hoc decision, but I think the
> intuitive answers for NaN is that Float.NaN <--> Double.NaN is exact, and
> similar for Float.Inf <--> Double.Inf.  There will be an argument about
> -0.0 and 0.0.
>
> Intuitively, I think I would expect that if t is an expression of
> primitive type T, and U is also a primitive type, then t instanceof U is
> true iff t == (T) (U) t. (Perhaps with a variant of == that is reflexive
> even for NaN, or perhaps not.) That would make (1) true, (2,3,4) false, and
> (5,6) who knows.
>
>
> This is a good intuition, and is true for some types, and almost true for
> others, but it falls into some holes.  In particular, with some
> conversions, `(U) t` is lossy but `(T) (U) t` is also lossy in an exactly
> compensating way.
>
> This makes a lot of sense. I'm wondering, though, how it works with float
> and double. I don't see the answer by looking at JLS Ch5. Are the following
> expressions legal, and if so which ones are true?
>
>    1. 2.0 instanceof int
>    2. 2.5 instanceof int
>    3. Double.MAX_VALUE instanceof float
>    4. Math.PI instanceof float
>    5. Double.NaN instanceof float
>    6. Double.NaN instanceof double
>
> Intuitively, I think I would expect that if t is an expression of
> primitive type T, and U is also a primitive type, then t instanceof U is
> true iff t == (T) (U) t. (Perhaps with a variant of == that is reflexive
> even for NaN, or perhaps not.) That would make (1) true, (2,3,4) false, and
> (5,6) who knows.
>
> On Thu, 8 Sept 2022 at 09:53, Brian Goetz <brian.goetz at oracle.com> wrote:
>
>> Earlier in the year we talked about primitive type patterns.  Let me
>> summarize
>> the past discussion, what I think the right direction is, and why this is
>> (yet
>> another) "finishing up the job" task for basic patterns that, if left
>> undone,
>> will be a sharp edge.
>>
>> Prior to record patterns, we didn't support primitive type patterns at
>> all. With
>> records, we now support primitive type patterns as nested patterns, but
>> they are
>> very limited; they are only applicable to exactly their own type.
>>
>> The motivation for "finishing" primitive type patterns is the same as
>> discussed
>> earlier this week with array patterns -- if pattern matching is the dual
>> of
>> aggregation, we want to avoid gratuitous asymmetries that let you put
>> things
>> together but not take them apart.
>>
>> Currently, we can assign a `String` to an `Object`, and recover the
>> `String`
>> with a pattern match:
>>
>>     Object o = "Bob";
>>     if (o instanceof String s) { println("Hi Bob"); }
>>
>> Analogously, we can assign an `int` to a `long`:
>>
>>     long n = 0;
>>
>> but we cannot yet recover the int with a pattern match:
>>
>>     if (n instanceof int i) { ... } // error, pattern `int i` not
>> applicable to `long`
>>
>> To fill out some more of the asymmetries around records if we don't
>> finish the job: given
>>
>>     record R(int i) { }
>>
>> we can construct it with
>>
>>     new R(anInt)     // no adaptation
>>     new R(aShort)    // widening
>>     new R(anInteger) // unboxing
>>
>> but yet cannot deconstruct it the same way:
>>
>>     case R(int i)     // OK
>>     case R(short s)   // nope
>>     case R(Integer i) // nope
>>
>> It would be a gratuitous asymmetry that we can use pattern matching to
>> recover from
>> reference widening, but not from primitive widening.  While many of the
>> arguments against doing primitive type patterns now were of the form
>> "let's keep
>> things simple", I believe that the simpler solution is actually to
>> _finish the
>> job_, because this minimizes asymmetries and potholes that users would
>> otherwise
>> have to maintain a mental catalog of.
>>
>> Our earlier explorations started (incorrectly, as it turned out), with
>> assignment context.  This direction gave us a good push in the right
>> direction,
>> but turned out to not be the right answer.  A more careful reading of JLS
>> Ch5
>> convinced me that the answer lies not in assignment conversion, but _cast
>> conversion_.
>>
>> #### Stepping back: instanceof
>>
>> The right place to start is actually not patterns, but `instanceof`.  If
>> we
>> start here, and listen carefully to the specification, it leads us to the
>> correct answer.
>>
>> Today, `instanceof` works only for reference types.  Accordingly, most
>> people
>> view `instanceof` as "the subtyping operator" -- because that's the only
>> question we can currently ask it.  We almost never see `instanceof` on
>> its own;
>> it is nearly always followed by a cast to the same type.  Similarly, we
>> rarely
>> see a cast on its own; it is nearly always preceded by an `instanceof`
>> for the
>> same type.
>>
>> There's a reason these two operations travel together: casting is, in
>> general,
>> unsafe; we can try to cast an `Object` reference to a `String`, but if the
>> reference refers to another type, the cast will fail.  So to make casting
>> safe,
>> we precede it with an `instanceof` test.  The semantics of `instanceof`
>> and
>> casting align such that `instanceof` is the precondition test for safe
>> casting.
>>
>> > instanceof is the precondition for safe casting
>>
>> Asking `instanceof T` means "if I cast this to T, would I like the
>> answer."
>> Obviously CCE is an unlikable answer; `instanceof` further adopts the
>> opinion
>> that casting `null` would also be an unlikable answer, because while the
>> cast
>> would succeed, you can't do anything useful with the result.
>>
>> Currently, `instanceof` is only defined on reference types, and on this
>> domain
>> coincides with subtyping.  On the other hand, casting is defined between
>> primitive types (widening, narrowing), and between primitive and
>> reference types
>> (boxing, unboxing).  Some casts involving primitives yield "better"
>> results than
>> others; casting `0` to `byte` results in no loss of information, since
>> `0` is
>> representable as a byte, but casting `500` to `byte` succeeds but loses
>> information because the higher order bits are discarded.
>>
>> If we characterize some casts as "lossy" and others as "exact" -- where
>> lossy
>> means discarding useful information -- we can extend the "safe casting
>> precondition" meaning of `instanceof` to primitive operands and types in
>> the
>> obvious way -- "would casting this expression to this type succeed
>> without error
>> and without information loss."  If the type of the expression is not
>> castable to
>> the type we are asking about, we know the cast cannot succeed and reject
>> the
>> `instanceof` test at compile time.
>>
>> Defining which casts are lossy and which are exact is fairly
>> straightforward; we
>> can appeal to the concept already in the JLS of "representable in the
>> range of a
>> type."  For some pairs of types, casting is always exact (e.g., casting
>> `int` to
>> `long` is always exact); we call these "unconditionally exact".  For
>> other pairs
>> of types, some values can be cast exactly and others cannot.
>>
>> Defining which casts are exact gives us a simple and precise semantics
>> for `x
>> instanceof T`: whether `x` can be cast exactly to `T`.  Similarly, if the
>> static
>> type of `x` is not castable to `T`, then the corresponding `instanceof`
>> question
>> is rejected statically.  The answers are not suprising:
>>
>>  - Boxing is always exact;
>>  - Unboxing is exact for all non-null values;
>>  - Reference widening is always exact;
>>  - Reference narrowing is exact if the type of the target expression is a
>>    subtype of the target type;
>>  - Primitive widening and narrowing are exact if the target expression
>> can be
>>    represented in the range of the target type.
>>
>> #### Primitive type patterns
>>
>> It is a short hop from `instanceof` to patterns (including primitive type
>> patterns, and reference type patterns applied to primitive types), which
>> can be
>> defined entirely in terms of cast conversion and exactness:
>>
>>  - A type pattern `T t` is applicable to a target of type `S` if `S` is
>>    cast-convertible to `T`;
>>  - A type pattern `T t` matches a target `x` if `x` can be cast exactly
>> to `T`;
>>  - A type pattern `T t` is unconditional at type `S` if casting from `T`
>> to `S`
>>    is unconditionally exact;
>>  - A type pattern `T t` dominates a type pattern `S s` (or a record
>> pattern
>>    `S(...)`) if `T t` would be unconditional on `S`.
>>
>> While the rules for casting are complex, primitive patterns add no new
>> complexity; there are no new conversions or conversion contexts.  If we
>> see:
>>
>>     switch (a) {
>>         case T t: ...
>>     }
>>
>> we know the case matches if `a` can be cast exactly to `T`, and the
>> pattern is
>> unconditional if _all_ values of `a`'s type can be cast exactly to `T`.
>> Note
>> that none of this is specific to primitives; we derive the semantics of
>> _all_
>> type patterns from the enhanced definition of casting.
>>
>> Now, our record deconstruction examples work symmetrically to
>> construction:
>>
>>     case R(int i)     // OK
>>     case R(short s)   // test if `i` is in the range of `short`
>>     case R(Integer i) // box `i` to `Integer`
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-spec-observers/attachments/20220908/23f42250/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4003 bytes
Desc: S/MIME Cryptographic Signature
URL: <https://mail.openjdk.org/pipermail/amber-spec-observers/attachments/20220908/23f42250/smime-0001.p7s>