[External] : Re: Primitive type patterns

Brian Goetz brian.goetz at oracle.com
Sat Feb 26 16:49:08 UTC 2022


>
>     #### Relationship with assignment context
>
>
> That's a huge leap, let's take a step back.
>
> I see two questions that should be answered first.
> 1) do we really want pattern in case of assignment/declaration to 
> support assignment conversions ?
> 2) do we want patterns used by the switch or instanceof to follow the 
> exact same rules as patterns used in assignment/declaration ?

I agree we should take a step back, but let's take a step farther -- 
because I want to make an even bigger leap than you think :)

Stepping way far back .... in the beginning ... Java had reference types 
with subtyping, and eight primitive types.  Which raises an immediate 
question: what types can be assigned to what?  Java chose a sensible 
guideline: assignment should be allowed if the value set on the left is 
"bigger" than that on the right.  This gives us String => Object, int => 
long, int => double, etc.  (At this point, note that we've gone beyond 
strict value-set inclusion; an int is *not* a floating-point number, but 
we chose (reasonably) to do the conversion because we can *embed* the 
ints in the value set of double.  Java was already appealing to the 
notion of an embedding-projection pair even then, in assignment 
conversions; assignment from A to B is OK if we have an embedding of A 
into B.)
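
To make that concrete, here are a few assignments that ordinary Java 
accepts without a cast (nothing new here, just existing assignment 
context):

     Object o = "hello";   // String embeds into Object
     long l = 42;          // int embeds into long
     double d = 42;        // int embeds into double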

On the other hand, Java won't let you assign long => int, because it 
might be a lossy conversion.  To opt into the loss, you have to cast, 
which acknowledges that the conversion may be information-losing.  
Except!  If you can prove the conversion isn't information-losing 
(because the thing on the right is a compile-time constant), then it's 
OK, because we know it's safe.  JLS Ch5 had its share of ad-hoc-seeming 
complexity, but mostly stayed in its corner until you called it, and the 
rules all seemed justifiable.
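
Concretely, the sort of thing those rules give us (again, just the Java 
we already have):

     long big = 42L;
     // int i = big;       // error: possible lossy conversion from long to int
     int i = (int) big;    // the cast acknowledges the potential loss

     byte b = 100;         // OK: constant expression whose value provably fits in byte
     // byte b2 = 200;     // error: 200 is not representable as a byte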

Then we added autoboxing.  And boxing is not problematic; int embeds 
into Integer.  So the conversion from int => Integer is fine. (It added 
more complexity to overload selection, brought in strict and loose 
conversion contexts, and we're still paying when methods like 
remove(int) merge with remove(T), but OK.)  But the other direction is 
problematic; there is one value of Integer that doesn't correspond to 
any value of int, which is our favorite value, null. The decision made 
at the time was to allow the conversion from Integer => int, and throw 
on null.
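
That is, in the Java we have today:

     Integer boxed = null;
     int n = boxed;        // compiles (unboxing conversion), throws NullPointerException at run time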

This was again a justifiable choice, and comes from the fact that the 
mapping from Integer to int is a _projection_, not an embedding.  It was 
decided (reasonably, but we could have gone the other way too) that null 
was a "silly" enough value to justify not requiring a cast, and throwing 
if the silly value comes up.  We could have required a cast from Integer 
to int, as we do from long to int, and I can imagine the discussion 
about why that was not chosen.

Having set the stage, one can see all the concepts in pattern matching 
dancing on it, just with different names.

Whether we can assign T to U, with or without a cast, is something we 
needed a static rule for.  So we took the set of type pairs (T, U) for 
which the pattern `T t` is strictly total on U, and said "these are the 
conversions allowed in assignment context" (with a special rule for when 
the operand is a compile-time integer constant.)

When we got to autoboxing, we made a subjective call that `int x` should 
be "total enough" on `Integer` that we're willing to throw in the one 
place it's not.  That's exactly the concept of "P is exhaustive, but not 
total, on T" (i.e., there is a non-empty remainder.)  All of this has 
happened before.  All of this will happen again.

So the bigger leap I've got in mind is: what would James et al have 
done, had they had pattern matching from day one?  I believe that:

  - T t = u would be allowed if `T t` is exhaustive on the static type of u;
  - If there is remainder, assignment can throw (preserving the 
invariant that if the assignment completes normally, something was 
assigned).

So it's not that I want to align assignment with pattern matching 
because we've got a syntactic construct on the whiteboard that operates 
by pattern matching but happens to look like assignment; it's because 
assignment *is* a constrained case of pattern matching.  We've found the 
missing primitive, and I want to put it under the edifice.  If we define 
pattern matching correctly, we could rewrite JLS 5.2 entirely in terms 
of pattern matching (whether we want to actually rewrite it or not, 
that's a separate story.)

The great thing about pattern matching as a generalization of assignment 
is that it takes pressure off the one-size-fits-all ruleset.  You can write:

     int x = anInteger

but it might throw NPE.  In many cases, users are fine with that. But by 
interpreting it as a pattern, when we get into more flexible constructs, 
we don't *have* to throw eagerly.  If the user said:

     if (anInteger instanceof int x) { ... }

then we match the pattern on everything but null, and don't match on 
null, and since instanceof is a conditional construct, no one needs to 
throw at all.  And if the user said:

     switch (anInteger) {
         case int x: ...
         // I can have more cases
     }

the `int x` case is taken for all values other than null, and the user 
has a choice: add more patterns to catch the remainder and act on it, or 
not.  If the user chooses "not", the remainder is implicitly rejected by 
the switch's handling of "exhaustive with remainder", but the timing of 
that rejection moves later, after the user has had as many bites at the 
apple as desired before we throw.  The rules about assignment are the 
way they are because there's no statically trackable side-channel for 
"was it a good match".  Now there is; let's use it.


So, enough philosophy; on to the specific objections.

> For 1, given that we are using pattern to do destructured assignment, 
> we may want to simplify the assignment rules to keep things simple 
> avoid users shooting themselves in the foot with implicit unboxing.
> With an example,
>   record Box<T>(T value) {}
>   Box<Integer> box = ...
>   Box<>(int result) = box;   // assignment of result may throw a NPE

I assume you mean "let Box... = box".   Assuming so, let's analyze the 
above.

The pattern Box(P) is exhaustive on Box<Integer> if P is exhaustive on 
Integer.  If we say that `int result` is exhaustive on Integer (which 
I'm proposing), then the remainder of `Box(int result)` will be { null, 
Box(null) }.  The pattern won't match the remainder (matching itself 
never throws), but the let/bind construct says "OK, if there was 
remainder, throw" (unless there's an else clause, yada yada.)  So yes, the above 
would throw (I don't think it should throw NPE, but we're going to 
discuss that in a separate thread) not because of the pattern -- pattern 
matching *never* throws -- but because the let construct wants 
exhaustiveness, and accepts that some patterns have remainder, and makes 
up the difference by completing abruptly.  It's just like:

     Box<Box<String>> bbs = new Box<>(null);
     let Box(Box(String s)) = bbs;

Here, Box(null) is in the remainder of Box(Box(String s)) on 
Box<Box<String>>, so the match fails, and the construct throws.  
Unboxing is not really any different, and I wouldn't want to treat them 
differently (and I worry that yet again, there's some "anti-null bias" 
going on.)  In both cases, we have a nested pattern that is exhaustive 
but has non-empty remainder.
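
The unboxing analogue looks just the same (a sketch, reusing the 
hypothetical `let` construct from above):

     Box<Integer> bi = new Box<>(null);
     let Box(int result) = bi;    // Box(null) is in the remainder of Box(int result),
                                  // so the let construct throws here too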

> I don't think we have to support that implicit unboxing given that we 
> have a way to ask for an unboxing explicitly (once java.lang.Integer 
> have a de-constructor)
>
>   Box<>(Integer(int result)) = box;

This is just moving the remainder somewhere else; Box(null) is in the 
remainder of Box(Integer(BLAH)), since deconstructing the Integer 
requires invoking a deconstructor whose receiver would be null.  I think 
what you're reaching for here is "the user should have to explicitly 
indicate that the conversion might not succeed", which would be 
analogous to "the user should have to cast Integer to int".  But we've 
already made that decision; we don't require such a cast.

> I think we should not jump with the shark too soon here and ask 
> ourselves if we really want assignment conversions in case of 
> destructured assignment.

See above; I think this is the wrong question.  It is not a matter of 
"do we want assignment conversions in destructuring", it is "do we want 
to be able to *derive* the assignment conversions from pattern matching."

> 2) we already know that depending on the context (inside a switch, 
> inside a instanceof, inside an assignment) the rules for pattern are 
> not exactly the same.

OMG Remi, would you please stop repeating this incorrect claim.  The 
rules for pattern matching are exactly the same across contexts; the 
differences are that the contexts get to choose when to try to match, 
and what to do if nothing matches.

> So we may consider that in the assignment context, assignment 
> conversions apply while for a matching context, simpler rules apply.

We could of course say that; we could say that `int x` is simply *not 
applicable* to a target of type Integer.  We can discuss that, but I 
don't think it's a simplification; I think it's actually *more* 
complexity, because it's yet another context with yet another subtly 
different set of rules.  One obvious consequence of that restriction 
would be that users cannot refactor

     Foo f = e

to

     let Foo f = e

to

     if (e instanceof Foo f) { ... }

for yet more accidental reasons.  Is this really making things simpler?
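
Concretely (a sketch, assuming the proposed treatment of `int x` against 
Integer, and again using the hypothetical `let` syntax), the three 
spellings of the same match would be:

     Integer anInteger = ...;
     int x = anInteger;                        // assignment: throws on null (the remainder)
     let int y = anInteger;                    // let/bind: also throws on null
     if (anInteger instanceof int z) { ... }   // conditional: simply does not match null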

> Then the model you propose is too clever for me, the fact that
>   instanceof Point(double x, double y)
> has a different meaning depending if Point is declared like
>   record Point(double x, double y) { }
> or like this
>   record Point(Double x, Double y) { }
> is too much.

If this is your concern, then I think you are objecting to something 
much more fundamental than the semantics of primitive patterns; you're 
objecting to the concept of nesting partial patterns entirely.  Because 
you can make the same objection (and if I recall correctly, you have) when:

      record Point(Number x, Number y) { }

and you do

      case Point(Number x, Number y): ...
vs
      case Point(Integer x, Integer y): ...

In the case of `Point(Number x, Number y)`, it would be absurd to NPE 
when x==null; there's nothing in the language that says record 
components may not be null.  But in the case of Point(Integer x, Integer 
y) -- where we're nesting partial patterns inside the Point 
deconstruction pattern -- it would be similarly absurd to match x == 
null.  (We've been through this, please let's not rehash this.)  So I 
think your objection is not about null, but that you simply don't like 
that we have the same syntax for nesting a *total* pattern (Number x), 
which means "don't do anything, I'm just declaring a variable to receive 
the component", and nesting a partial pattern (Integer x), which means 
"make sure this sub-pattern matches before matching the composite."  
That is to say, maybe you'd prefer that

     case Point(Integer x, Integer y)

be a type error, and require that the user say something like (to pick a 
random syntax)

     case Point(match Integer x, match Integer y)

so that the user is slapped in the face with the partiality of the 
nested pattern.
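
For reference, the contrast in question, spelled out as a switch (a 
sketch, assuming deconstruction patterns nest as described above, where 
aPoint is some Point):

     record Point(Number x, Number y) { }

     switch (aPoint) {
         case Point(Integer x, Integer y): ...   // partial nested patterns: matches only when
                                                 // both components are non-null Integers
         case Point(Number x, Number y): ...     // total nested patterns: matches any Point,
                                                 // including one with null components
     }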

This is a valid concern -- sometimes people refer to this as 
action-at-a-distance -- that to know what a pattern means, you may have 
to peek at its declaration.  But I do not think such a choice would age 
very well; it is a sort of "training wheels" choice, which gives us some 
confidence when we are first learning to pattern match, and thereafter 
is just a new source of friction.

In any case, your objection seems more fundamental than whether `int x` 
should match Integer; it seems that you view pattern composition as 
inherently confusing, and therefore you want to reach for warning signs 
or seat belts.  But non-compositional deconstruction patterns would be 
pretty weak.

> The semantics of Java around null is already a giant landmine field, 
> we should restraint ourselves to add more null-funny behaviors.

I think we agree on the goal, but I think we may disagree on what 
"adding more null-funny behaviors" means.  What you describe as 
"disallow this case for simplicity", I see as "add a new null-funny 
behavior."


Your concerns are valid, and we should continue to discuss, but bear in 
mind that I think they mostly proceed from two places where we may 
continue to disagree:

  - You are generally much more inclined to say "if it might be null, 
disallow it / throw eagerly" than I am.  In general, I prefer to let the 
nulls flow until they hit a point where they can clearly flow no 
further, rather than introduce null gates into the middle of 
computations, because null gates are impediments to composition and 
refactoring.

  - You are viewing pattern matching as the "new thing", and trying to 
limit it to the cases where you're sure that users who are unfamiliar 
with it (which is almost all of them) will have a good initial 
experience.  (This is sort of a semantic analogue of Stroustrup's 
rule.)  But I believe those limitations, in the long run, will lead to a 
more complex language and a worse long-term experience.  I want to 
optimize for where we are going, which is that there is one set of rules 
for patterns people can reason about, even if they are a little 
complicated-seeming at first, rather than an ever-growing bag of 
individually "simple" restrictions.

