Re: Primitive type patterns

Thu Mar 3 16:09:29 UTC 2022

> I'm in agreement on not adding new contexts but I had the opposite
> impression here.  Doesn't "having it do range checking" require a new
> context as this is different from what assignment contexts allow
> today?  Or is it the case that regular, non-match assignment must be
> total with no left over that allows them to use the same context
> despite not being able to do the dynamic range check?  As this
> sentence shows, I'm confused on how dynamic range checking fits in the
> existing assignment context.
>
> Or are we suggesting that assignment allows:
>
> byte b = new Long(5);
>
> to succeed if we can unbox + meet the dynamic range check?  I'm
> clearly confused here.

At a meta level, the alignment target is:
    - given a target type `T`
    - given an expression `e : E`

then:
   - being able to statically determine whether `T t` matches `e` should 
be equivalent to whether the assignment `T t = e` is valid under the 
existing 5.2 rules.

That is to say, the existing 5.2 rules may look like a bag of ad-hoc, 
two-for-one-on-tuesday rules, but really, they will be revealed to be 
the set of conversions that are consistent with statically determining 
whether `T t` matches `e : E`.  Most of these rules involve only T and E 
(e.g., widening primitive conversion), but one of them is about ranges, 
which we can only statically assess when `e` is a constant.
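
To make the distinction concrete, here is a minimal sketch of the assignment cases that 5.2 accepts and rejects today. The class, method, and field names are made up for illustration; the rejected forms are left as comments so the sketch still compiles.

```java
// Illustrative only: names are hypothetical, behaviour is that of today's assignment context.
final class AssignmentContextSketch {

    // Compile-time constant (a "constant variable" in JLS terms).
    static final int FITS = 100;

    static long widen(int i) {
        long l = i;      // widening primitive conversion: decided from the types alone
        return l;
    }

    static byte constantNarrowing() {
        byte b = 5;      // allowed: 5 is a constant expression whose value fits in byte
        return b;
    }

    static byte namedConstantNarrowing() {
        byte b = FITS;   // allowed: FITS is a constant and 100 is representable as a byte
        return b;
    }

    // Rejected at compile time:
    //   int i = 5;
    //   byte b1 = i;     // i is not a constant expression, so the range is unknown statically
    //   byte b2 = 200;   // constant, but outside the range of byte
}
```

The first case involves only T and E; the other two are the range rule, which the compiler can evaluate only because the value is a constant.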

>
>> Conversions like unboxing or casting are burdened by the fact that they
>> have to be total, which means the "does it fit" / "if so, do it" / "if
>> not, do something else (truncate, throw, etc)" all have to be crammed
>> into a single operation.  What pattern matching does is extract the "does it
>> fit, and if so do it" into a more primitive operation, from which other
>> operations can be composed.
> Is it accurate to say this is less reusing assignment context and more
> completely replacing it with a new pattern context from which
> assignment can be built on top of?

Yes!  Ideally, this is one of those "jack up the house and provide a 
solid foundation" moves.

>
>> At some level, what I'm proposing is all spec-shuffling; we'll either
>> say "a widening primitive conversion is allowed in assignment context",
>> or we'll say that primitive `P p` matches any primitive type Q that can
>> be widened to P.  We'll end up with a similar number of rules, but we
>> might be able to "shake the box" to make them settle to a lower energy
>> state, and be able to define (whether we explicitly do so or not)
>> assignment context to support "all the cases where the LHS, viewed as a
>> type pattern, is exhaustive on the RHS, potentially with remainder, and
>> throws if remainder is encountered."  (That's what unboxing does; throws
>> when remainder is encountered.)
> Ok. So maybe I'm not confused.  We'd allow the `byte b = new Long(5);`
> code to compile and throw not only on a failed unbox, but also on a
> dynamic range check failure.

No ;)

Today, we would disallow this assignment because it is not an unboxing 
followed by a primitive widening.  (The opposite, long l = new Byte((byte) 3), 
is allowed today, though these constructors are deprecated for removal, so 
you should use valueOf.)  We would only allow a narrowing if the RHS 
were a constant, like "5", in which case the compiler would statically 
evaluate the range check and narrow 5 to byte.
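
As a small sketch of the "unboxing followed by a primitive widening" case (class and method names are hypothetical), this is what assignment accepts today, including the remainder it rejects by throwing:

```java
// Illustrative only: shows unboxing + widening in today's assignment context.
final class UnboxWidenSketch {

    static long unboxThenWiden(Byte boxed) {
        // Unboxing (Byte -> byte) followed by widening (byte -> long).
        // If boxed is null, the unboxing step throws NullPointerException.
        long l = boxed;
        return l;
    }

    // Rejected at compile time: unboxing gives long, and long -> byte is a narrowing.
    //   byte b = Long.valueOf(5);
}
```

Calling `unboxThenWiden(Byte.valueOf((byte) 3))` yields 3L, while `unboxThenWiden(null)` throws.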

Tomorrow, the assignment would be the same; assignment works based on 
"statically determined to match", and we can only statically determine 
the range check if we know the target value, i.e., it's a constant.  But, 
if you *asked*, then you can get a dynamic range check:

     if (anInt matches byte b) // we get a range check here

The reason we don't do that with assignment is we don't know what to do 
if it doesn't match.  But if it's in a conditional context (if or 
switch), then the programmer is going to tell us what to do if it 
doesn't match.
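
Since `matches` is not valid Java today, here is a rough hand-written equivalent of what `anInt matches byte b` would be asking. The helper `fitsInByte` and the class name are hypothetical; the point is only that the "does it fit" test and the "if so, do it" conversion are separated, and the programmer supplies the else branch.

```java
// Illustrative only: a manual version of the dynamic range check.
final class ByteMatchSketch {

    // "Does it fit?": true when narrowing to byte and widening back loses nothing.
    static boolean fitsInByte(int value) {
        return (byte) value == value;
    }

    static String describe(int anInt) {
        if (fitsInByte(anInt)) {
            byte b = (byte) anInt;   // safe here: the range check already succeeded
            return "byte " + b;
        } else {
            return "int " + anInt;   // the programmer says what happens on a non-match
        }
    }
}
```

With this sketch, `describe(5)` takes the byte branch and `describe(200)` takes the else branch.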

> If we took this "dynamic hook" behaviour to the limit, what other new
> capabilities does it unlock?  Is this the place to connect other
> user-supplied conversion operations as well?  Maybe I'm running too
> far with this idea but it seems like this could be laying the
> groundwork for other interesting behaviours.  Am I way off in the
> weeds here?

Not entirely in the weeds.  The problem with assignment, casting, and 
all of those things is that they have to be total; when you say "x = y" 
then the guarantee is that *something* got assigned to x. Now, we are 
already cheating a bit, because `x = y` allows unboxing, and unboxing 
can throw.  (Sounds like remainder rejection!)   Now, imagine we had an 
"assign or else" construct (with static types A and B):

     a := (b, e)

then this would mean

     if (b matches A aa)
         a = aa
     else
         a = e  // and maybe e is really a function of b

In the case of unboxing conversions, our existing assignment works kind 
of like:

     a := (b, throw new NPE)

because we'd try to match, and if it fails, evaluate the second 
component, which throws.
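
The "assign or else" idea can be approximated in today's Java with a helper method. This is only a sketch: `assignOrElse` is a hypothetical name, the match is modelled as a plain `Class.isInstance` test, and the "else" component is a function of b.

```java
import java.util.function.Function;

// Illustrative only: models `a := (b, e)` as a library method.
final class AssignOrElseSketch {

    static <A> A assignOrElse(Object b, Class<A> target, Function<Object, A> orElse) {
        if (target.isInstance(b)) {   // "b matches A aa" (null never matches)
            return target.cast(b);    // a = aa
        }
        return orElse.apply(b);       // a = e, where e may depend on b
    }

    // Models unboxing as `a := (b, throw new NPE)`.
    static int unbox(Integer boxed) {
        Integer matched = assignOrElse(boxed, Integer.class, x -> {
            throw new NullPointerException("remainder: null cannot be unboxed");
        });
        return matched.intValue();
    }
}
```

Here `unbox(3)` returns 3, and `unbox(null)` falls through to the second component, which throws.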

Obviously I'm not suggesting we tinker with assignment in this way, but 
the point is: pattern matching gives you a chance to stop and say: 
"don't do it yet, but if you did it, would it work?"

>
>>
>>> Intuitively, the behaviour you propose is kind of what we want - all
>>> the possible byte cases end up in the byte case and we don't need to
>>> adapt the long case to handle those that would have fit in a byte.
>>> I'm slightly concerned that this changes Java's historical approach
>>> and may lead to surprises when refactoring existing code that treats
>>> unbox(Long) one way and unbox(Short) another.  Will users be confused
>>> when the unbox(Long) in the short right range ends up in a case that
>>> was only intended for unbox(Short)?  I'm having a hard time finding an
>>> example that would trip on this but my lack of imagination isn't
>>> definitive =)
>> I'm worried about this too.  We examined it briefly, and ran away, when
>> we were thinking about constant patterns, specifically:
>>
>>       Object o = ...
>>       switch (o) {
>>           case 0: ...
>>           default: ...
>>       }
>>
>> What would this mean?  What I wouldn't want it to mean is "match Long 0,
>> Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the
>> line for "magic".  (Note that this is about defining what the _constant
>> pattern_ means, not the primitive type pattern.) I think it's probably
>> reasonable to say this is a type error; 0 is applicable to primitive
>> numerics and their boxes, but not to Number or Object.  I think that is
>> consistent with what I'm suggesting about primitive type patterns, but
>> I'd have to think about it more.
> Object o =...
> switch(o) {
>      case (long)0: ...  // can we say this?  Probably not
>      case long l && l == 0: // otherwise this would become the way to
> catch most of the constant 0 cases
>      default: ....
> }
>
> I'm starting to think the constant pattern will feel less like magic
> once the dynamic range checking becomes commonplace.

Probably can't say `case (long) 0`, but you can say `case 0L`. Though we 
don't have suffixes for all the types.

>
>> One reason this is especially undesirable is that one of the forms of
>> let-bind is a let-bind *expression*:
>>
>>       let P = p, Q = q
>>       in <expression>
>>
>> which is useful for pulling out subexpressions and binding them to a
>> variable, but for which the scope of that variable is limited.  If
>> refactoring from:
>>
> Possible typo in the example.  Attempted to fix:
>
>>       int x = stuff;
>>       m(f(x));
>>
>> to
>>
>>       m(let x = stuff in f(x))
>>       // x no longer in scope here
> Not sure I follow this example.  I'm not sure why introducing a new
> variable in this scope is useful.

Two reasons: narrower scope for locals, and turning statements into 
expressions.

A common expression with redundant subexpressions is "last 3 characters 
of string":

     last3 = s.substring(s.length() - 3, s.length())

We can refactor to

     int sLen = s.length();
     last3 = s.substring(sLen - 3, sLen);

but some people dislike this because now the rest of the scope is 
"polluted" with a garbage variable.  A let expression narrows the scope 
of sLen:

     last3 = let sLen = s.length()
                 in s.substring(sLen - 3, sLen);

This becomes more important when we want to use the result in, say, a 
method call; now we have to unroll the declaration of any helper 
statements (e.g., `int sLen = s.length()`) to outside the method call.  
A similar thing happens when we want to create an object, mutate it, and 
return it; this often requires statements, but a let expression turns it 
back into an expression.
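
For comparison, something close to a let expression can be approximated today with a small generic helper and a lambda. This is a sketch, not the proposed feature: the `let` method and the class name are hypothetical, and the lambda parameter plays the role of the narrowly scoped variable.

```java
import java.util.function.Function;

// Illustrative only: emulates `let x = value in body` with a lambda.
final class LetSketch {

    static <T, R> R let(T value, Function<? super T, ? extends R> body) {
        return body.apply(value);
    }

    // sLen is visible only inside the lambda, not in the rest of the method.
    static String last3(String s) {
        return let(s.length(), sLen -> s.substring(sLen - 3, sLen));
    }

    // Create an object, mutate it, and return a result, all as one expression.
    static String greeting(String name) {
        return let(new StringBuilder(), sb -> sb.append("Hello, ").append(name).toString());
    }
}
```

The emulation boxes the int and allocates a lambda, which a real let expression would not need to do; it is here only to show the scoping and the statement-to-expression shift.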