[External] : Re: Primitive type patterns
Brian Goetz
brian.goetz at oracle.com
Thu Mar 3 23:22:08 UTC 2022
I read JLS 5.2 more carefully and discovered that while assignment
context supports primitive narrowing from int (and smaller) constants
to smaller types:
byte b = 0;
it does not support primitive narrowing from long to int:
int x = 0L; // error
My best guess at the rationale is that because there is no suffix for
int/short/byte, int literals act like "poly expressions", while long
literals are just long literals. That's an irritating asymmetry (but
fixable).
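The asymmetry can be checked directly in assignment context; a minimal sketch (class and variable names are illustrative, and the commented-out lines are the ones that fail to compile):

```java
public class NarrowingDemo {
    public static void main(String[] args) {
        byte b = 100;      // OK: int constant 100 is representable in byte
        short s = 30_000;  // OK: fits in short
        char c = 65;       // OK: fits in char
        // byte bad = 200; // error: 200 is not representable in byte
        // int x = 0L;     // error: no constant narrowing from long, even for 0L
        int x = (int) 0L;  // an explicit cast is required for long -> int
        System.out.println(b + " " + s + " " + c + " " + x);
    }
}
```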
> In addition, if the expression is a constant expression (§15.29) of
> type byte, short,
> char, or int:
> • A narrowing primitive conversion may be used if the variable is of
> type byte,
> short, or char, and the value of the constant expression is
> representable in the
> type of the variable.
> • A narrowing primitive conversion followed by a boxing conversion may
> be used
> if the variable is of type Byte, Short, or Character, and the value of
> the constant
> expression is representable in the type byte, short, or char respectively.
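The narrowing-then-boxing rule quoted above, in action; a small sketch (names illustrative), with the case the rule does not cover shown commented out:

```java
public class BoxNarrowDemo {
    public static void main(String[] args) {
        Byte bb = 100;      // OK: int constant narrowed to byte, then boxed to Byte
        Character cc = 65;  // OK: narrowed to char, then boxed to Character
        // Long ll = 100;   // error: the rule covers only Byte, Short, Character
        Long ll = 100L;     // a long literal (or cast) is needed for Long
        System.out.println(bb + " " + cc + " " + ll);
    }
}
```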
On 3/3/2022 10:17 AM, Dan Heidinga wrote:
> On Wed, Mar 2, 2022 at 3:13 PM Brian Goetz<brian.goetz at oracle.com> wrote:
>>
>>
>> On 3/2/2022 1:43 PM, Dan Heidinga wrote:
>>> Making the pattern match compatible with assignment conversions makes
>>> sense to me and follows a similar rationale to that used with
>>> MethodHandle::asType following the JLS 5.3 invocation conversions.
>>> Though with MHs we had the ability to add additional conversions under
>>> MethodHandles::explicitCastArguments. With pattern matching, we don't
>>> have the same ability to make the "extra" behaviour opt-in / opt-out.
>>> We just get one chance to pick the right behaviour.
>> Indeed. And the thing that I am trying to avoid here is creating _yet
>> another_ new context in which a different bag of ad-hoc conversions are
>> possible. While it might be justifiable from a local perspective to say
>> "it's OK if `int x` does unboxing, but having it do range checking seems
>> new and different, so let's not do that", from a global perspective,
>> that means we add a new context ("pattern match context") to
>> assignment, loose invocation, strict invocation, cast, and numeric
>> contexts. That is the kind of incremental complexity I'd like to avoid,
>> if there is a unifying move we can pull.
> I'm in agreement on not adding new contexts but I had the opposite
> impression here. Doesn't "having it do range checking" require a new
> context as this is different from what assignment contexts allow
> today? Or is it the case that regular, non-match assignment must be
> total, with no leftover, which lets it use the same context despite
> not being able to do the dynamic range check? As this sentence
> shows, I'm confused about how dynamic range checking fits into the
> existing assignment context.
>
> Or are we suggesting that assignment allows:
>
> byte b = new Long(5);
>
> to succeed if we can unbox + meet the dynamic range check? I'm
> clearly confused here.
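For context, `byte b = new Long(5);` does not compile today. The helper below is a hand-written stand-in for the "unbox, range-check, narrow" behavior under discussion; the method name `toByteExact` and the choice of `ArithmeticException` are my own, not anything from the proposal:

```java
public class RangeCheckToday {
    // byte b = new Long(5);  // does not compile today: no unbox + narrowing in assignment

    // A hand-written equivalent of "unbox, then range-check, then narrow":
    static byte toByteExact(Long boxed) {
        long v = boxed;       // unbox (throws NullPointerException if null)
        if (v != (byte) v) {  // dynamic range check: does the value fit in a byte?
            throw new ArithmeticException("value " + v + " does not fit in byte");
        }
        return (byte) v;
    }

    public static void main(String[] args) {
        System.out.println(toByteExact(5L));
    }
}
```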
>
>> Conversions like unboxing or casting are burdened by the fact that they
>> have to be total, which means the "does it fit" / "if so, do it" / "if
>> not, do something else (truncate, throw, etc)" all have to be crammed
>> into a single operation. What pattern matching does is extract the
>> "does it fit, and if so, do it" part into a more primitive operation,
>> from which other operations can be composed.
> Is it accurate to say this is less reusing assignment context and more
> completely replacing it with a new pattern context on which
> assignment can be built?
>
>> At some level, what I'm proposing is all spec-shuffling; we'll either
>> say "a widening primitive conversion is allowed in assignment context",
>> or we'll say that primitive `P p` matches any primitive type Q that can
>> be widened to P. We'll end up with a similar number of rules, but we
>> might be able to "shake the box" to make them settle to a lower energy
>> state, and be able to define (whether we explicitly do so or not)
>> assignment context to support "all the cases where the LHS, viewed as a
>> type pattern, is exhaustive on the RHS, potentially with remainder, and
>> throws if remainder is encountered." (That's what unboxing does; throws
>> when remainder is encountered.)
> Ok. So maybe I'm not confused. We'd allow the `byte b = new Long(5);`
> code to compile and throw not only on a failed unbox, but also on a
> dynamic range check failure.
>
> If we took this "dynamic hook" behaviour to the limit, what other new
> capabilities does it unlock? Is this the place to connect other
> user-supplied conversion operations as well? Maybe I'm running too
> far with this idea but it seems like this could be laying the
> groundwork for other interesting behaviours. Am I way off in the
> weeds here?
>
>> As to the range check, it has always bugged me that you see code that
>> looks like:
>>
>> if (i >= -128 && i <= 127) { byte b = (byte) i; ... }
>>
>> because of the accidental specificity, and the attendant risk of error
>> (using <= instead of <, or mixing up 127 and 128). Being able to say:
>>
>> if (i instanceof byte b) { ... }
>>
>> is better not because it is more compact, but because you're actually
>> asking the right question -- "does this int value fit in a byte." I'm
>> sad we don't really have a way to ask this question today; it seems an
>> omission.
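The "right question" can at least be packaged in a helper today; `fitsInByte` is a hypothetical name, and the proposed `instanceof byte` form is shown only in a comment since it does not compile yet:

```java
public class FitsInByte {
    // Hypothetical helper asking "does this int value fit in a byte?"
    static boolean fitsInByte(int i) {
        return i == (byte) i;  // true iff i round-trips through a byte narrowing
    }

    public static void main(String[] args) {
        int i = 100;
        if (fitsInByte(i)) {
            byte b = (byte) i;
            System.out.println("fits: " + b);
        }
        // Proposed form (does not compile today):
        // if (i instanceof byte b) { ... }
    }
}
```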
> I had been thinking about this when I wrote my response and I like
> having the compiler generate the range check for me. As you say, way
> easier to avoid errors that way.
>
>>> Intuitively, the behaviour you propose is kind of what we want - all
>>> the possible byte cases end up in the byte case and we don't need to
>>> adapt the long case to handle those that would have fit in a byte.
>>> I'm slightly concerned that this changes Java's historical approach
>>> and may lead to surprises when refactoring existing code that treats
>>> unbox(Long) one way and unbox(Short) another. Will users be confused
>>> when an unbox(Long) whose value is in short range ends up in a case
>>> that was only intended for unbox(Short)? I'm having a hard time finding an
>>> example that would trip on this but my lack of imagination isn't
>>> definitive =)
>> I'm worried about this too. We examined it briefly, and ran away, when
>> we were thinking about constant patterns, specifically:
>>
>> Object o = ...
>> switch (o) {
>>     case 0: ...
>>     default: ...
>> }
>>
>> What would this mean? What I wouldn't want it to mean is "match Long 0,
>> Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the
>> line for "magic". (Note that this is about defining what the _constant
>> pattern_ means, not the primitive type pattern.) I think it's probably
>> reasonable to say this is a type error; 0 is applicable to primitive
>> numerics and their boxes, but not to Number or Object. I think that is
>> consistent with what I'm suggesting about primitive type patterns, but
>> I'd have to think about it more.
> Object o = ...
> switch (o) {
>     case (long) 0: ...      // can we say this? Probably not
>     case long l && l == 0:  // otherwise this would become the way to
>                             // catch most of the constant 0 cases
>     default: ...
> }
>
> I'm starting to think the constant pattern will feel less like magic
> once the dynamic range checking becomes commonplace.
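Dan's guarded-pattern sketch has a close analogue with reference-type patterns on a recent JDK (where `when` replaced the earlier `&&` guard syntax); this is a sketch of what compiles today, not of the proposed primitive patterns:

```java
public class ZeroKind {
    // Guarded reference-type patterns: each zero stays with its own box type.
    static String zeroKind(Object o) {
        return switch (o) {
            case Long l when l == 0L   -> "zero Long";
            case Integer i when i == 0 -> "zero Integer";
            default                    -> "other";
        };
    }

    public static void main(String[] args) {
        System.out.println(zeroKind(0L));  // the long zero and the int zero
        System.out.println(zeroKind(0));   // land in different cases
    }
}
```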
>
>
>>> Something like following shouldn't be surprising given the existing
>>> rules around unbox + widening primitive conversion (though it may be
>>> when first encountered as I expect most users haven't really
>>> internalized the JLS 5.2 rules):
>> As Alex said to me yesterday: "JLS Ch 5 contains many more words than
>> any prospective reader would expect to find on the subject, but once the
>> reader gets over the overwhelm of how much there is to say, will find
>> none of the words surprising." There's a deeper truth to this
>> statement: Java is not actually as simple a language as its mythology
>> suggests, but we win by hiding the complexity in places users generally
>> don't have to look, and if and when they do confront the complexity,
>> they find it unsurprising, and go back to ignoring it.
>>
>> So in point of fact, *almost no one* has read JLS 5.2, but it still does
>> "what users would likely find reasonable".
>>
>>> Number n = ....;
>>> switch (n) {
>>>     case long l -> ...
>>>     case int i -> ...   // dead code
>>>     case byte b -> ...  // dead code
>>>     default -> ...
>>> }
>> Correct. We have rules for pattern dominance, which are used to give
>> compile errors on dead cases; we'd have to work through the details to
>> confirm that `long l` dominates `int i`, but I'd hope this is the case.
>>
>>> But this may be more surprising as I suggested above
>>>
>>> Number n = new Long(5);
>>> switch (n) {
>>>     case byte b -> ...  // matches here
>>>     case int i -> ...
>>>     case long l -> ...
>>>     default -> ...
>>> }
>>>
>>> Overall, I like the extra dynamic range check but would be fine with
>>> leaving it out if it complicates the spec given it feels like a pretty
>>> deep-in-the-weeds corner case.
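For contrast, today's reference-type patterns select on the box's type alone, never on the value's range, so a boxed Long can only ever match the `Long` case; a sketch assuming a pattern-switch-capable JDK (names illustrative):

```java
public class BoxMatch {
    static String describe(Number n) {
        return switch (n) {
            case Byte b    -> "Byte " + b;
            case Integer i -> "Integer " + i;
            case Long l    -> "Long " + l;
            default        -> "other";
        };
    }

    public static void main(String[] args) {
        // A boxed Long holding 5 matches the Long case, never the Byte case,
        // even though the value 5 would fit in a byte.
        System.out.println(describe(Long.valueOf(5)));
    }
}
```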
>> It is probably not a forced move to support the richer interpretation of
>> primitive patterns now. But I think the consequence of doing so may be
>> surprising: rather than "simplifying the language" (as one might hope
>> that "leaving something out" would do), I think there's a risk that it
>> makes things more complicated, because (a) it effectively creates yet
>> another conversion context that is distinct from the too-many we have
>> now, and (b) creates a sharp edge where refactoring from local variable
>> initialization to let-bind doesn't work, because assignment would then
>> be looser than let-bind.
> Ok. You're saying that the dynamic range check is essential enough
> that it's worth a new context if we can't adjust the meaning of
> assignment context.
>
>> One reason this is especially undesirable is that one of the forms of
>> let-bind is a let-bind *expression*:
>>
>> let P = p, Q = q
>> in <expression>
>>
>> which is useful for pulling out subexpressions and binding them to a
>> variable, but for which the scope of that variable is limited. If
>> refactoring from:
>>
> Possible typo in the example. Attempted to fix:
>
>> int x = stuff;
>> m(f(x));
>>
>> to
>>
>> m(let x = stuff in f(x))
>> // x no longer in scope here
> Not sure I follow this example. I'm not sure why introducing a new
> variable in this scope is useful.
>
>> was not possible because of a silly mismatch between the conversions in
>> let context and the conversions in assignment context, then we're
>> putting users in the position of having to choose between richer
>> conversions and richer scoping.
> Ok. I think I see where this is going and while it may be clearer
> with a larger example, I agree with the principle that this
> refactoring should be possible.
>
> --Dan
>
>> (Usual warning (Remi): I'm mentioning let-expressions because it gives a
>> sense of where some of these constraints come from, but this is not a
>> suitable time to design the let-expression feature.)
>>
>>