Draft JEP on Primitive types in patterns, instanceof, and switch
Brian Goetz
brian.goetz at oracle.com
Fri Jan 27 14:53:43 UTC 2023
> In this context, `42` is
> *not* an `int` at all - it is a literal. There is no conversion here,
I don't really want to pile on because Ron and Tagir have already made
it clear that this is simply a misunderstanding of how the language
works, but there's a deeper point here that makes it useful to dig in a
little bit further, so I ask some forbearance. First, some necessary
chapter and verse:
JLS 3.10 defines literals; integer literals are divided into decimal,
hex, octal, and binary integer literals. The spec is quite clear that
literals like 42 *are*, in fact, of type `int`:
> An integer literal is of type long if it is suffixed with an ASCII
> letter L or l (ell);
> otherwise it is of type int (§4.2.1).
There are no "untyped literals", nor are there literals of type byte,
short, or char. (Fun fact: there are also no instructions in the JVM
for arithmetic on byte, short, or char; these are done with `iadd` and
friends, and the shorter types are _erased_ to int. Erasure is not just
for generics.)
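A minimal sketch of the language-level side of this (not part of the original message; the class name `ByteArithmetic` and the `typeOf` overloads are made up for illustration). It uses overload resolution to reveal the static type the compiler assigns: integer literals are `int` or `long`, and the sum of two `byte` values is an `int` because of binary numeric promotion. It does not inspect bytecode, so the `iadd` remark is only mirrored here, not demonstrated.

```java
public class ByteArithmetic {
    // Hypothetical overloads, used only to show which static type the compiler picks.
    public static String typeOf(byte b) { return "byte"; }
    public static String typeOf(int i)  { return "int"; }
    public static String typeOf(long l) { return "long"; }

    public static void main(String[] args) {
        byte a = 1, b = 2;
        System.out.println(typeOf(a));      // a variable declared as byte
        System.out.println(typeOf(a + b));  // byte + byte is promoted to int
        System.out.println(typeOf(42));     // unsuffixed integer literal
        System.out.println(typeOf(42L));    // literal with the L suffix

        // byte c = a + b;   // does not compile: a + b is an int, and not a constant
        byte c = (byte) (a + b);            // an explicit narrowing cast is required
        System.out.println(c);
    }
}
```

Note that `typeOf(42)` selects the `int` overload: the constant-narrowing allowance discussed next applies in assignment contexts, not in method invocation.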
JLS 5 (which is about conversions) then goes on to define primitive
widening and narrowing conversions, and state when these conversions can
be applied. In an assignment context (JLS 5.2), a special case for
narrowing constant integral expressions (which includes integer
literals) to shorter integral types is permitted:
> In addition, if the expression is a constant expression (§15.29) of
> type byte, short,
> char, or int:
> • A narrowing primitive conversion may be used if the variable is of
> type byte,
> short, or char, and the value of the constant expression is
> representable in the
> type of the variable.
This is why `byte b = 0` works; the RHS is a constant expression of type
`int`, and 0 is known by the compiler to be representable in the type
`byte`. (This same language -- a value being representable in a given
type -- is also the language used in this JEP for when a cast is exact.)
So yes, there is a conversion here (JLS 5 is called "conversions and
contexts"). There is similar language that permits, for example,
integer case labels to be used in a switch on byte, short, or char (or
their box types).
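A small sketch of the rules just cited (not from the original message; `ConstantNarrowing` and `describe` are illustrative names). It shows the assignment-context allowance for constant expressions, along with the analogous allowance for `int` case labels in a switch on `byte`. The lines that would be rejected are left as comments so the class still compiles.

```java
public class ConstantNarrowing {
    // Switch on a byte using int constant case labels; each label must be
    // representable in byte for the switch to compile.
    public static String describe(byte b) {
        switch (b) {
            case 0:   return "zero";
            case 127: return "max";
            // case 128: ...   // does not compile: 128 is not representable in byte
            default:  return "other";
        }
    }

    public static void main(String[] args) {
        byte b1 = 42;        // int constant expression whose value fits in byte

        final int x = 21;    // constant variable
        byte b2 = x + x;     // still a constant expression, and 42 fits in byte

        // byte b3 = 128;    // does not compile: value not representable in byte

        int y = 42;          // not final, so not a constant expression
        // byte b4 = y;      // does not compile, even though the value would fit
        byte b4 = (byte) y;  // explicit cast required

        System.out.println(b1 + " " + b2 + " " + b4);
        System.out.println(describe((byte) 127));
    }
}
```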
Now, let's think about what Java would be like without this phrase in
JLS 5.2; we'd have to cast 0 to byte every time we use a literal (or the
language would have to have separate syntax for byte literals). That's
kind of annoying; this is the spec working for you so you don't have to
deal with, or even notice, these low-level annoyances.
My point is not to say "see, haha, Stephen doesn't understand Java" --
quite the opposite. Stephen is an accomplished Java programmer, who has
written excellent Java libraries that we all use every day. My point is
that you can be an excellent Java programmer *without fully
understanding how the language works* -- and that's a feature, not a
bug! The essence of Java's "blue collar" success is not that the
language is so simple that everyone can read and understand the spec in
an afternoon. Java is in fact quite complex, but the spec is so
carefully constructed that most developers can go an entire career
without opening the spec at all, and much of the complexity stays in the
shadows. The spec goes to great lengths so that programmers like
Stephen can enjoy mental models like "42 is not an int, it's a literal",
and have that not work against them most of the time. It doesn't matter
whether these mental models are 100% accurate; they are still useful.
Where mental models become problematic is when they leave your own mind
and you try to treat them as some sort of law of physics (especially
when they're wrong). Mental models are useful, helpful approximations.
"Instanceof means subtyping" is a mental model, just like "0 is not an
integer, it's a literal." If they help you, great, but be very careful
about extrapolating past the boundary of your own skull.
The changes in this JEP are in the same spirit as the rule about
narrowing integer constants; they are about _removing_ anomalies that would
make construction and deconstruction asymmetric for gratuitous reasons,
just like the rule about narrowing integer constants to byte. We don't
notice that rule because it's working quietly for us. No one clamored for it --
because they never needed to. If we didn't align the meaning of type
patterns with existing conversions, as people used more complex nested
patterns, they'd notice anomalies that are analogous to "can't assign 0
to byte".
I get that this seems like a big change, but really, it's not. I doubt
people will all of a sudden start using switches on floats all over the
place, but it would be weird to be able to switch on every type *but*
float. The cure for that is to ensure that switching, instanceof, and
pattern matching work on every type, with semantics drawn from a
single, common source. As it turns out, that source is in plain view --
casting. We just have to let it out.
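To illustrate what "representable in the target type" means at run time, here is a hand-written sketch (not from the original message, and not the JEP's specification of exactness; `ExactCast`, `isExactByte`, `isExactFloat`, and `classify` are hypothetical names). It treats a cast as exact when converting to the target type and back loses no information, and it roughly mirrors the byte/float/default shape of the switch quoted below.

```java
public class ExactCast {
    // True when narrowing i to byte loses nothing: the byte is promoted
    // back to int for the comparison.
    public static boolean isExactByte(int i) {
        return (byte) i == i;
    }

    // True when i survives conversion to float unchanged. The comparison is done
    // in double because both int and float convert to double without loss;
    // comparing (float) i == i directly would convert both sides to float.
    public static boolean isExactFloat(int i) {
        return (double) (float) i == (double) i;
    }

    // Illustrative dispatch: first type that can hold the value exactly wins.
    public static String classify(int i) {
        if (isExactByte(i))  return "byte";
        if (isExactFloat(i)) return "float";
        return "int";
    }

    public static void main(String[] args) {
        System.out.println(classify(42));        // fits in byte
        System.out.println(classify(1000));      // too big for byte, exact as float
        System.out.println(classify(16777217));  // 2^24 + 1 is not exact as float
    }
}
```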
On 1/27/2023 1:22 AM, Tagir Valeev wrote:
> Hello!
>
> On Thu, Jan 26, 2023, 23:01 Stephen Colebourne <scolebourne at joda.org>
> wrote:
>
>
>
> In the Motivation section it is claimed that because `byte b = 42`
> compiles, it implies that sometimes an `int` can be converted to a
> `byte` without a cast. This is nonsense. In this context, `42` is
> *not* an `int` at all - it is a literal. There is no conversion here,
> `42` is typed as a `byte` because of the assignment. `42` is never, at
> any stage, an `int`. At the very least, the JEP should be amended to
> remove this part of the Motivation section.
>
>
> A small correction: this is not what spec says. 42 is accepted here
> not because it's untyped literal but because it's a compile time
> constant expression that fits the byte type. Other constant
> expressions are also accepted here, like:
>
> byte b = 21 + 21; // obviously not literal
> Or even:
> final int x = 21;
> final int y = 21; // int type is even spelled explicitly
> byte b = x + y;
>
> With best regards,
> Tagir Valeev
>
>
> The document proceeds to argue that because switch works with Object
> hierarchies:
> // automatic widening conversion from Dog to Animal
> Pet p = new Pet(new Dog());
> switch (p) {
> case Pet(Dog d) -> ... d ...
> case Pet(Animal a) -> ... a ...
> default -> ...
> }
> that it must therefore be OK to work with primitives:
> int i = methodReturningShort();
> switch (i) {
> case byte b -> ... b ...;
> case float f -> ... f ...;
> default -> -1;
> }
>
> There is simply no comparison here. Dog and Animal are subtypes of
> Pet, a concept that has been baked into the language since day one,
> and is fully understood by all. By contrast, there is absolutely no
> subtyping relationship between int, byte and short, and again this
> fact has been baked into the language since day one.
>
> There is a *huge* red line being crossed here. Values in Java have two
> distinct parts - the type and the value of the instance. Java has
> always kept these two things completely separate: General-purpose
> language features operate on types and references/null (eg.
> instanceof, catch, switch) or expressions (if, for, while). It
> requires an expression or an operator for the actual value of the
> instance to be considered. The JEP proposes to shatter that boundary,
> saying that the language should now examine not only the type but also
> the value of the instance in order to determine flow control.
>
> The root cause of the issue here is trying to treat Object hierarchy
> conversion and conversion between different primitive types as being
> somehow equivalent. They are not. Java does not have a mechanism that
> allows a LocalDate to be assigned to a String, even though there is a
> perfectly reasonable way to do so. Instead, you have to explicitly
> perform the conversion by calling toString(). That is because
> `LocalDate` and `String` are separate types with no subtype hierarchy.
> Similarly, there is no subtype hierarchy link between `int` and
> `long`. That an `int` can be assigned to a `long` is merely a
> convenience - it could have required a method call. Critically though,
> the convenience conversion is absolute. No runtime check of the value
> of the instance is required. This even applies when converting `int`
> to `float` which is lossy.
>
> In my view, the only pattern matching checks that make sense here are
> those in line with the separation between types and values of
> instances. This basic rule implies:
> * `int` vs `Integer` and vice versa - OK, as only requires examination
> of the type and reference/null
> * `int` vs `byte` - not OK, as an expression is required in order to
> extract the value of the `int` in order to decide flow control
>
> In summary, if this was simply about adding a niche feature for
> checking whether an `int` actually fits in a `byte` I would have no
> problem. For example, were the pattern match to be based on an
> expression (ie. a method call) then I would have no problem. The key
> issue here is the red line being crossed by having a general-purpose
> language feature examine the value of the instance outside of an
> expression.
>
> thanks
> Stephen
>