[External] : Re: Primitive type patterns

Brian Goetz brian.goetz at oracle.com
Wed Mar 2 20:13:30 UTC 2022



On 3/2/2022 1:43 PM, Dan Heidinga wrote:
>
> Making the pattern match compatible with assignment conversions makes
> sense to me and follows a similar rationale to that used with
> MethodHandle::asType following the JLS 5.3 invocation conversions.
> Though with MHs we had the ability to add additional conversions under
> MethodHandles::explicitCastArguments. With pattern matching, we don't
> have the same ability to make the "extra" behaviour opt-in / opt-out.
> We just get one chance to pick the right behaviour.

Indeed.  And the thing that I am trying to avoid here is creating _yet 
another_ new context in which a different bag of ad-hoc conversions is 
possible.  While it might be justifiable from a local perspective to say 
"it's OK if `int x` does unboxing, but having it do range checking seems 
new and different, so let's not do that", from a global perspective, 
that means we need a new context ("pattern match context") to add to 
assignment, loose invocation, strict invocation, cast, and numeric 
contexts.  That is the kind of incremental complexity I'd like to avoid, 
if there is a unifying move we can pull.

Conversions like unboxing or casting are burdened by the fact that they 
have to be total, which means the "does it fit" / "if so, do it" / "if 
not, do something else (truncate, throw, etc)" all have to be crammed 
into a single operation.  What pattern matching does is extract the 
"does it fit, and if so do it" part into a more primitive operation, 
from which other operations can be composed.
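For instance (a sketch; `matchByte` is a hypothetical helper, not a JDK API), the partial "does it fit, and if so do it" operation can be written once, and various total conversions recovered from it by supplying a remainder policy:

```java
import java.util.OptionalInt;

public class Compose {
    // Hypothetical helper (not a JDK API): the partial operation,
    // "does this int fit in a byte, and if so, narrow it".
    static OptionalInt matchByte(int i) {
        return (byte) i == i ? OptionalInt.of((byte) i) : OptionalInt.empty();
    }

    // Total operations composed from it, each choosing its own remainder policy:
    static byte throwing(int i) {
        return (byte) matchByte(i).orElseThrow(ArithmeticException::new);
    }

    static byte saturating(int i) {
        return (byte) matchByte(i).orElse(i < 0 ? Byte.MIN_VALUE : Byte.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(throwing(100));    // fits: 100
        System.out.println(saturating(1000)); // does not fit: clamps to 127
    }
}
```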

At some level, what I'm proposing is all spec-shuffling; we'll either 
say "a widening primitive conversion is allowed in assignment context", 
or we'll say that primitive `P p` matches any primitive type Q that can 
be widened to P.  We'll end up with a similar number of rules, but we 
might be able to "shake the box" to make them settle to a lower energy 
state, and be able to define (whether we explicitly do so or not) 
assignment context to support "all the cases where the LHS, viewed as a 
type pattern, is exhaustive on the RHS, potentially with remainder, and 
throws if remainder is encountered."  (That's what unboxing does; throws 
when remainder is encountered.)
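Unboxing already fits this description today: `Integer`-to-`int` assignment is exhaustive with the single remainder value `null`, and throws when it encounters it:

```java
public class UnboxRemainder {
    public static void main(String[] args) {
        Integer boxed = null;
        try {
            int i = boxed;   // assignment-context unboxing; null is the remainder
            System.out.println(i);
        } catch (NullPointerException e) {
            System.out.println("remainder encountered: unboxing null throws NPE");
        }
    }
}
```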

As to the range check, it has always bugged me that you see code that 
looks like:

     if (i >= -128 && i <= 127) { byte b = (byte) i; ... }

because of the accidental specificity, and the attendant risk of error 
(writing -127 instead of -128, or 128 instead of 127). Being able to say:

     if (i instanceof byte b) { ... }

is better not because it is more compact, but because you're actually 
asking the right question -- "does this int value fit in a byte."  I'm 
sad we don't really have a way to ask this question today; it seems an 
omission.
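The nearest thing today is the cast round-trip idiom `(byte) i == i`, which asks the question without spelling out the bounds. A quick comparison against a deliberately off-by-one manual check (an illustrative sketch) shows how easily the explicit bounds go wrong at the edges:

```java
public class RangeCheck {
    public static void main(String[] args) {
        for (int i : new int[]{-128, 127, 128}) {
            boolean manual    = i >= -127 && i <= 128;  // off-by-one bounds, easy to get wrong
            boolean roundTrip = (byte) i == i;          // "does this int value fit in a byte?"
            System.out.println(i + ": manual=" + manual + " roundTrip=" + roundTrip);
        }
    }
}
```

The two disagree exactly at -128 (which fits but the manual check rejects) and 128 (which does not fit but the manual check accepts).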

> Intuitively, the behaviour you propose is kind of what we want - all
> the possible byte cases end up in the byte case and we don't need to
> adapt the long case to handle those that would have fit in a byte.
> I'm slightly concerned that this changes Java's historical approach
> and may lead to surprises when refactoring existing code that treats
> unbox(Long) one way and unbox(Short) another.  Will users be confused
> when the unbox(Long) in the short right range ends up in a case that
> was only intended for unbox(Short)?  I'm having a hard time finding an
> example that would trip on this but my lack of imagination isn't
> definitive =)

I'm worried about this too.  We examined it briefly, and ran away, when 
we were thinking about constant patterns, specifically:

     Object o = ...
     switch (o) {
         case 0: ...
         default: ...
     }

What would this mean?  What I wouldn't want it to mean is "match Long 0, 
Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the 
line for "magic".  (Note that this is about defining what the _constant 
pattern_ means, not the primitive type pattern.) I think it's probably 
reasonable to say this is a type error; 0 is applicable to primitive 
numerics and their boxes, but not to Number or Object.  I think that is 
consistent with what I'm suggesting about primitive type patterns, but 
I'd have to think about it more.
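One data point in favor of the type-error reading: the boxes themselves already refuse to equate zeros of different types, so a constant pattern that matched `0` across all the boxes would be looser than `equals`:

```java
public class ZeroBoxes {
    public static void main(String[] args) {
        Object longZero = 0L;                         // boxes to Long
        Object intZero  = 0;                          // boxes to Integer
        System.out.println(longZero.equals(intZero)); // false: Long 0 is not Integer 0
        System.out.println(longZero.equals(0L));      // true: same box type, same value
    }
}
```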

> Something like following shouldn't be surprising given the existing
> rules around unbox + widening primitive conversion (though it may be
> when first encountered as I expect most users haven't really
> internalized the JLS 5.2 rules):

As Alex said to me yesterday: "JLS Ch 5 contains many more words than 
any prospective reader would expect to find on the subject, but once the 
reader gets over the overwhelm of how much there is to say, they will 
find none of the words surprising."  There's a deeper truth to this 
statement: Java is not actually as simple a language as its mythology 
suggests, but we win by hiding the complexity in places users generally 
don't have to look, and if and when they do confront the complexity, 
they find it unsurprising, and go back to ignoring it.

So in point of fact, *almost no one* has read JLS 5.2, but it still does 
"what users would likely find reasonable".
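Concretely, the JLS 5.2 rules at play here allow unboxing followed by widening (but not narrowing) in assignment context, plus narrowing for in-range constants; a few examples of what does and does not compile:

```java
public class AssignmentConversions {
    public static void main(String[] args) {
        Integer boxed = 42;
        long widened = boxed;        // unbox to int, then widen to long: allowed
        // short narrowed = boxed;   // would not compile: no unbox-then-narrow
        byte b = 100;                // constant narrowing: allowed, 100 fits in byte
        // byte c = 200;             // would not compile: 200 does not fit
        System.out.println(widened + " " + b);
    }
}
```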

> Number n = ....;
> switch(n) {
>    case long l -> ...
>    case int i -> .... // dead code
>    case byte b -> .... // dead code
>    default -> ....
> }

Correct.  We have rules for pattern dominance, which are used to give 
compile errors on dead cases; we'd have to work through the details to 
confirm that `long l` dominates `int i`, but I'd hope this is the case.

> But this may be more surprising as I suggested above
>
> Number n = new Long(5);
> switch(n) {
>    case byte b -> .... // matches here
>    case int i -> .... //
>    case long l -> ...
>    default -> ....
> }
>
> Overall, I like the extra dynamic range check but would be fine with
> leaving it out if it complicates the spec given it feels like a pretty
> deep-in-the-weeds corner case.

It is probably not a forced move to support the richer interpretation of 
primitive patterns now.  But I think the consequence of leaving it out 
may be surprising: rather than "simplifying the language" (as one might 
hope that "leaving something out" would do), I think there's a risk that 
it makes things more complicated, because (a) it effectively creates yet 
another conversion context that is distinct from the too-many we have 
now, and (b) creates a sharp edge where refactoring from local variable 
initialization to let-bind doesn't work, because assignment would then 
be looser than let-bind.

One reason this is especially undesirable is that one of the forms of 
let-bind is a let-bind *expression*:

     let P = p, Q = q
     in <expression>

which is useful for pulling out subexpressions and binding them to a 
variable, but for which the scope of that variable is limited.  If 
refactoring from:

     int x = stuff;
     m(f(x));

to

     m(let x = stuff in f(x))
     // x no longer in scope here

was not possible because of a silly mismatch between the conversions in 
let context and the conversions in assignment context, then we're 
putting users in the position of having to choose between richer 
conversions and richer scoping.

(Usual warning (Remi): I'm mentioning let-expressions because it gives a 
sense of where some of these constraints come from, but this is not a 
suitable time to design the let-expression feature.)




More information about the amber-spec-experts mailing list