[External] : Re: Primitive type patterns
Dan Heidinga
heidinga at redhat.com
Thu Mar 3 15:17:00 UTC 2022
On Wed, Mar 2, 2022 at 3:13 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>
>
>
> On 3/2/2022 1:43 PM, Dan Heidinga wrote:
> >
> > Making the pattern match compatible with assignment conversions makes
> > sense to me and follows a similar rationale to that used with
> > MethodHandle::asType following the JLS 5.3 invocation conversions.
> > Though with MHs we had the ability to add additional conversions under
> > MethodHandles::explicitCastArguments. With pattern matching, we don't
> > have the same ability to make the "extra" behaviour opt-in / opt-out.
> > We just get one chance to pick the right behaviour.
>
> Indeed. And the thing that I am trying to avoid here is creating _yet
> another_ new context in which a different bag of ad-hoc conversions are
> possible. While it might be justifiable from a local perspective to say
> "it's OK if `int x` does unboxing, but having it do range checking seems
> new and different, so let's not do that", from a global perspective,
> that means we add a new context ("pattern match context") to
> assignment, loose invocation, strict invocation, cast, and numeric
> contexts. That is the kind of incremental complexity I'd like to avoid,
> if there is a unifying move we can pull.
I'm in agreement on not adding new contexts but I had the opposite
impression here. Doesn't "having it do range checking" require a new
context as this is different from what assignment contexts allow
today? Or is it that regular, non-match assignment must be total with
no leftover, which allows it to use the same context despite not being
able to do the dynamic range check? As this sentence shows, I'm
confused about how dynamic range checking fits into the existing
assignment context.
Or are we suggesting that assignment allows:
byte b = new Long(5);
to succeed if we can unbox + meet the dynamic range check? I'm
clearly confused here.
> Conversions like unboxing or casting are burdened by the fact that they
> have to be total, which means the "does it fit" / "if so, do it" / "if
> not, do something else (truncate, throw, etc)" all have to be crammed
> into a single operation. What pattern matching is extracts the "does it
> fit, and if so do it" into a more primitive operation, from which other
> operations can be composed.
Is it accurate to say this is less about reusing assignment context
and more about completely replacing it with a new pattern context, on
top of which assignment can be built?
> At some level, what I'm proposing is all spec-shuffling; we'll either
> say "a widening primitive conversion is allowed in assignment context",
> or we'll say that primitive `P p` matches any primitive type Q that can
> be widened to P. We'll end up with a similar number of rules, but we
> might be able to "shake the box" to make them settle to a lower energy
> state, and be able to define (whether we explicitly do so or not)
> assignment context to support "all the cases where the LHS, viewed as a
> type pattern, is exhaustive on the RHS, potentially with remainder, and
> throws if remainder is encountered." (That's what unboxing does; throws
> when remainder is encountered.)
Ok. So maybe I'm not confused. We'd allow the `byte b = new Long(5);`
code to compile and throw not only on a failed unbox, but also on a
dynamic range check failure.
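For concreteness, here's a sketch in today's Java of what I understand
those semantics to desugar to (the choice of ClassCastException for the
range-check failure is my assumption, not something settled here):

```java
// Hand-written equivalent of what `byte b = someBox;` would mean under
// the proposal: unbox, dynamically range check, then narrow.
// NOTE: ClassCastException for the range failure is an assumption made
// for this sketch; the actual exception is not settled in this thread.
public class NarrowingSketch {
    static byte narrowToByte(Long box) {
        long v = box;  // unbox; throws NullPointerException if box is null
        if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
            throw new ClassCastException(v + " does not fit in a byte");
        }
        return (byte) v;  // range check passed, so this cast is lossless
    }

    public static void main(String[] args) {
        System.out.println(narrowToByte(5L));  // prints 5
        try {
            narrowToByte(300L);
        } catch (ClassCastException e) {
            System.out.println("range check failed");
        }
    }
}
```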
If we took this "dynamic hook" behaviour to the limit, what other new
capabilities does it unlock? Is this the place to connect other
user-supplied conversion operations as well? Maybe I'm running too
far with this idea but it seems like this could be laying the
groundwork for other interesting behaviours. Am I way off in the
weeds here?
>
> As to the range check, it has always bugged me that you see code that
> looks like:
>
> if (i >= -128 && i <= 127) { byte b = (byte) i; ... }
>
> because of the accidental specificity, and the attendant risk of error
> (using <= instead of <, or using 127 instead of 128). Being able to say:
>
> if (i instanceof byte b) { ... }
>
> is better not because it is more compact, but because you're actually
> asking the right question -- "does this int value fit in a byte." I'm
> sad we don't really have a way to ask this question today; it seems an
> omission.
I had been thinking about this when I wrote my response and I like
having the compiler generate the range check for me. As you say, way
easier to avoid errors that way.
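Spelling out the question the compiler would be answering for us (the
helper name `fitsInByte` is mine; today we have to write this check by
hand each time):

```java
// The check the compiler would generate for `i instanceof byte b`:
// "does this int value fit in a byte?"
public class ByteRange {
    static boolean fitsInByte(int i) {
        return i >= Byte.MIN_VALUE && i <= Byte.MAX_VALUE;  // -128..127
    }

    public static void main(String[] args) {
        System.out.println(fitsInByte(127));   // true
        System.out.println(fitsInByte(128));   // false: exactly the
                                               // off-by-one the hand-written
                                               // form invites
        System.out.println(fitsInByte(-129));  // false
    }
}
```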
>
> > Intuitively, the behaviour you propose is kind of what we want - all
> > the possible byte cases end up in the byte case and we don't need to
> > adapt the long case to handle those that would have fit in a byte.
> > I'm slightly concerned that this changes Java's historical approach
> > and may lead to surprises when refactoring existing code that treats
> > unbox(Long) one way and unbox(Short) another. Will users be confused
> > when the unbox(Long) in the short right range ends up in a case that
> > was only intended for unbox(Short)? I'm having a hard time finding an
> > example that would trip on this but my lack of imagination isn't
> > definitive =)
>
> I'm worried about this too. We examined it briefly, and ran away, when
> we were thinking about constant patterns, specifically:
>
> Object o = ...
> switch (o) {
>     case 0: ...
>     default: ...
> }
>
> What would this mean? What I wouldn't want it to mean is "match Long 0,
> Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the
> line for "magic". (Note that this is about defining what the _constant
> pattern_ means, not the primitive type pattern.) I think it's probably
> reasonable to say this is a type error; 0 is applicable to primitive
> numerics and their boxes, but not to Number or Object. I think that is
> consistent with what I'm suggesting about primitive type patterns, but
> I'd have to think about it more.
Object o = ...;
switch (o) {
    case (long) 0: ...      // can we say this? Probably not
    case long l && l == 0:  // otherwise this would become the way to
                            // catch most of the constant 0 cases
    default: ...
}
I'm starting to think the constant pattern will feel less like magic
once the dynamic range checking becomes commonplace.
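For what it's worth, the guarded form of my second case can be
approximated and run today with box patterns, using the `when` guard
syntax rather than `&&`; each box type still needs its own case, since
`case long l` over Object is exactly what's being debated here:

```java
// Runnable approximation of the switch above with box patterns and
// guards. Matching "most of the constant 0 cases" still takes one
// case per box type today.
public class ZeroSwitch {
    static String classify(Object o) {
        return switch (o) {
            case Long l when l == 0L   -> "long zero";
            case Integer i when i == 0 -> "int zero";
            default                    -> "other";
        };
    }

    public static void main(String[] args) {
        System.out.println(classify(0L));   // long zero
        System.out.println(classify(0));    // int zero
        System.out.println(classify("0"));  // other
    }
}
```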
> > Something like the following shouldn't be surprising given the existing
> > rules around unbox + widening primitive conversion (though it may be
> > when first encountered as I expect most users haven't really
> > internalized the JLS 5.2 rules):
>
> As Alex said to me yesterday: "JLS Ch 5 contains many more words than
> any prospective reader would expect to find on the subject, but once the
> reader gets over the overwhelm of how much there is to say, will find
> none of the words surprising." There's a deeper truth to this
> statement: Java is not actually as simple a language as its mythology
> suggests, but we win by hiding the complexity in places users generally
> don't have to look, and if and when they do confront the complexity,
> they find it unsurprising, and go back to ignoring it.
>
> So in point of fact, *almost no one* has read JLS 5.2, but it still does
> "what users would likely find reasonable".
>
> > Number n = ...;
> > switch (n) {
> >     case long l -> ...
> >     case int i -> ...   // dead code
> >     case byte b -> ...  // dead code
> >     default -> ...
> > }
>
> Correct. We have rules for pattern dominance, which are used to give
> compile errors on dead cases; we'd have to work through the details to
> confirm that `long l` dominates `int i`, but I'd hope this is the case.
>
> > But this may be more surprising as I suggested above
> >
> > Number n = new Long(5);
> > switch (n) {
> >     case byte b -> ...  // matches here
> >     case int i -> ...
> >     case long l -> ...
> >     default -> ...
> > }
> >
> > Overall, I like the extra dynamic range check but would be fine with
> > leaving it out if it complicates the spec given it feels like a pretty
> > deep-in-the-weeds corner case.
>
> It is probably not a forced move to support the richer interpretation of
> primitive patterns now. But I think the consequence of doing so may be
> surprising: rather than "simplifying the language" (as one might hope
> that "leaving something out" would do), I think there's a risk that it
> makes things more complicated, because (a) it effectively creates yet
> another conversion context that is distinct from the too-many we have
> now, and (b) creates a sharp edge where refactoring from local variable
> initialization to let-bind doesn't work, because assignment would then
> be looser than let-bind.
Ok. You're saying that the dynamic range check is essential enough
that it's worth a new context if we can't adjust the meaning of
assignment context.
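To make the "surprising" reordered switch I gave above concrete, here's
an approximation with today's box patterns and guards of where
`new Long(5)` would land under the proposed dynamic range check (only
Long is simulated; the real proposal would cover all the numeric boxes):

```java
// Approximates the proposed behaviour: a Long whose value passes the
// dynamic byte range check lands in the case meant for byte-sized
// values, even though it arrived as a Long.
public class RangeDispatch {
    static String dispatch(Number n) {
        return switch (n) {
            case Long l when l >= Byte.MIN_VALUE && l <= Byte.MAX_VALUE
                           -> "byte case";  // where `case byte b` would match
            case Long l    -> "long case";
            case Integer i -> "int case";
            default        -> "other";
        };
    }

    public static void main(String[] args) {
        System.out.println(dispatch(5L));              // byte case
        System.out.println(dispatch(5_000_000_000L));  // long case
        System.out.println(dispatch(7));               // int case
    }
}
```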
>
> One reason this is especially undesirable is that one of the forms of
> let-bind is a let-bind *expression*:
>
> let P = p, Q = q
> in <expression>
>
> which is useful for pulling out subexpressions and binding them to a
> variable, but for which the scope of that variable is limited. If
> refactoring from:
>
Possible typo in the example. Attempted to fix:
> int x = stuff;
> m(f(x));
>
> to
>
> m(let x = stuff in f(x))
> // x no longer in scope here
Not sure I follow this example; I don't see why introducing a new
variable in this limited scope is useful.
>
> was not possible because of a silly mismatch between the conversions in
> let context and the conversions in assignment context, then we're
> putting users in the position of having to choose between richer
> conversions and richer scoping.
Ok. I think I see where this is going and while it may be clearer
with a larger example, I agree with the principle that this
refactoring should be possible.
--Dan
>
> (Usual warning (Remi): I'm mentioning let-expressions because it gives a
> sense of where some of these constraints come from, but this is not a
> suitable time to design the let-expression feature.)
>
>
More information about the amber-spec-observers
mailing list