Fwd: On range checks in primitive patterns

Brian Goetz brian.goetz at oracle.com
Mon Nov 28 16:03:07 UTC 2022


This was received on the amber-spec-comments list.  Redirecting to amber-dev as the answers are broadly useful (but as always when bringing these topics to amber-dev, please exercise restraint in replies.)  Full message below the line; pull-quotes inline.

A number of people seem a little intimidated by the proposed semantics of primitive patterns.  These concerns are valid, and the author here is genuinely trying to work through what this new language feature means.  And it is really, really easy to get caught up in the local details and wonder what it is we are trying to accomplish, or why we would have chosen these semantics.

First, I’d like to blow up one of the myths underlying many of the concerns that have been raised: that somehow this feature is about "range checks."  This is an easy conclusion to come to when looking at the examples, and it is a short hop from there to wondering "but does the language really need a range check feature?"  But that is not remotely the purpose of this feature; range checking just falls out as one of its applications.

The author raises the following concern:

However, I think the following is a problematic aspect of this model: There is no syntactic difference between a pattern that performs only a variable binding, and a pattern that performs a range check.  For example, the following pattern performs a range check if the record component is declared to be `long`, but not otherwise:

```
switch (o) {
    case SomeRecord(int i) -> System.out.println("Match");
    ...
}
```

This is a valid concern, but has nothing to do with primitive patterns!  This is a fundamental aspect of *nested* patterns; some patterns are total and some entail runtime checks, and they may look the same syntactically.  (Indeed, people raised similar concerns when we talked about record patterns, specifically about the not-initially-obvious treatment of null in type patterns.)  The author's example can be restated in terms of reference types, replacing "range check" with "null check" or "type check":

    switch (o) {
        case SomeRecord(String s) -> System.out.println("match");
    }

Here, as with the OP's example, whether or not there is a runtime check depends on the type of the component of SomeRecord.  If SomeRecord's component is String, there is no runtime check, and even SomeRecord(null) matches; if it is some other type (like Object), a type test is performed, which rejects non-Strings and nulls.

The same is true with the primitive pattern: if the type named in the pattern covers the type of the component it matches against, then the nested pattern is unconditional and no runtime checks are required; otherwise it is conditional and runtime checks are performed.
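
To make the parallel concrete, here is a small sketch (the record names are invented for illustration, and the primitive nesting behavior is the proposed semantics, not something that ships today):

    // Reference components: whether the nested pattern needs a runtime check
    // depends on the declared component type.
    record Holder(Object o) { }    // case Holder(String s) performs a type (and null) test
    record Named(String s) { }     // case Named(String s) is unconditional, matching even when s is null

    // Primitive components, under the proposed semantics: the same principle.
    record Wide(long x) { }        // case Wide(int i) would perform a range check
    record Narrow(int x) { }       // case Narrow(int i) would be unconditional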

I realize some people dislike or misunderstand (or both) these semantics; they have an "action at a distance" quality that makes some people uncomfortable.  (NB: This is not an invitation to reopen this topic; it has been beaten to death.)  My point is not "well, we already committed that sin once, so twice is OK."  My point is a deeper one: we don't want "reference type patterns" and "primitive type patterns", we want "type patterns".

This has the following consequences:

As you already admit, all of these same consequences are already true with reference type patterns:

Admittedly, this situation is similar to how type patterns and sub-classing work already. But I think the problem with integral primitive types is worse, because it is more implicit and easier for programmers to forget about.

This gets to the heart of the discomfort: primitives have always been special and "off to the side", and it is tempting to try to keep them in their own special little box.  But the special-ness of primitives is not so great, and its complexity ripples into other features (e.g., we can't use primitives as generic type parameters, which in turn led to the explosion of functional interface types like IntToLongFunction).  This is why we're working hard in Valhalla to minimize these differences.

So, back to patterns.  Why are we doing the non-obvious thing with primitive type patterns?  It is about, in part, unifying the treatment of primitives and references to the extent we can — and we can do a lot here, because the existing rules tell us a lot already about which values can be safely cast to which others.  The language already defines complex-but-sensible rules about casting, not only between reference types, but between primitives (including between integral and floating point primitives) and between primitives and references.  Only some of these are about range checks.  We don’t want to make up new-and-different rules; we want to listen to what the language already says, and if you listen carefully, cast conversion tells you (almost) everything you need to know about `instanceof` with primitives, which in turn tells you everything you need to know about the semantics of type patterns.

The current semantics of instanceof — which is currently restricted to reference types only — can be viewed as a precondition for safe casting.  (Indeed, instanceof and cast are almost always seen together; this is not accidental.)  When restricted to reference types, this means subtyping and non-nullity.  (We could cast a List to a String, but we’d get an exception; not safe.  Similarly we could cast a null to any reference type and it would “succeed”, but the resulting reference would likely not be safe to use.)  When we lift the restriction on “reference types only”, the same rubric — safe casting — tells us exactly what instanceof should mean: can the value be represented exactly in the value set of the specified type.  While the existing primitives are not related by subtyping, the rules reflect the reality that 0 and 0L and 0.0D are “the same number”, because they can be freely converted from int to long to double and back without loss.  The proposed semantics of instanceof for primitives, whether or not we call this “range checking”, reflect this reality.  When we wrote the spec for primitive type patterns, it was almost entirely _taking away_ restrictions; the only thing we had to add was defining when a cast was “lossy” or not, and even that used language that was already present in Ch5 — “representable in the range of a type”.
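
As a rough illustration of that "exactly representable" reading, the check is something you can already write by casting and converting back (the helper names here are invented, not any JDK API; they just sketch the question the proposed `instanceof` would ask):

    // "Can this value be represented exactly in the value set of the target type?"
    static boolean fitsInByte(int v) { return (byte) v == v; }   // what `v instanceof byte` would ask
    static boolean fitsInInt(long v)  { return (int) v == v; }   // what `v instanceof int` would ask

    // 0, 0L, and 0.0d all round-trip losslessly, so they are "the same number";
    // 1_000_000 does not survive the round trip through byte, so it would not match `byte`.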

But it is not about mere unification of primitives with references (as if that weren’t big enough already).

One thing that I see in common with all of these concerns / objections is that they look from the direction of the language we have towards the new feature, rather than looking from the perspective of the language _we want to get to_, and looking backward to the intermediate points.  Indeed, when you extrapolate from the language we have to "what does `instanceof int` mean", it seems that there are multiple possible interpretations, and it is tempting to pick the one that is most comforting.  But `instanceof int` is not the feature; it is merely one of the intermediate building blocks of the feature.  When viewed from the perspective of the big picture, new constraints emerge on what `instanceof int` might mean that are not obvious when you look at it only from the extrapolative perspective.

So if this is not about mere unification of references and primitives, what is it about?  It is about _composition and reversibility_.  The language has many mechanisms for creating more complex things from simpler things (e.g., constructors), but fewer mechanisms for going in the opposite direction, and pattern matching aims to fill in these holes.  But the constructs we have for aggregation have certain behaviors with respect to composition and conversion; their duals need to have dual behaviors, otherwise we get sharp edges.

Here’s an example that doesn’t involve primitives:

    record R(Object o) { ... }
    String s = ...;
    R r = new R(s);
    ...
    if (r instanceof R(String t)) { ... }

R can hold an Object.  Why are we allowed to instantiate it with a String, which is a different type?  Because JLS 5.3 (“Invocation Context”) says that in a method invocation context, a reference widening conversion (subtyping) may be performed on method arguments.  When we go to reverse this aggregation with a pattern match, the nested pattern `String t` is not total on the type of the component (Object).  This means a runtime check will be done, which will match if the value is not null and a narrowing reference conversion (casting) would succeed without error.  (This is the same as `instanceof`, which is the safe casting precondition.)  We didn't pick these semantics at random; we picked them because we want pattern matching to be able to take apart what the constructor invocation puts together, using the same conversions (and similar syntax).

This leads us to an interpretation of the pattern match which is: "could this value have come from invoking the R constructor with a String?"  (This can't be guaranteed by the language, since constructors can do weird things, but the intuition is really useful.)
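
Concretely, with record patterns as already previewed:

    R r1 = new R("foo");    // could have come from passing a String to the constructor
    R r2 = new R(42);       // could not
    // r1 instanceof R(String t)  evaluates to true, binding t = "foo"
    // r2 instanceof R(String t)  evaluates to false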

We can construct an exactly analogous situation with primitives:

    record S(long x) { ... }
    int y = ...;
    S s = new S(y);
    ...
    if (s instanceof S(int z)) { ... }

Why are we allowed to invoke the constructor of S, which takes long, with an int?  Because JLS 5.3 also allows _primitive widening conversions_ here.  But, if we did as you suggest (as others have):

* Only allow simple variable bindings in type patterns for integral types. Don't do range checking in patterns at all. (To me this seems like the best approach.)

then we can reverse the construction of R, and ask "did this come from invoking R with a String?", but we can't reverse the construction of S and ask "did this come from invoking S with an int?"  That's terrible — it perpetuates the sharp edge between primitives and references into places where users will constantly be cut.
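
Under the proposed semantics, the primitive case reverses just as cleanly; sketching the outcomes (again, this is not yet shipping behavior):

    S a = new S(5);          // an int, widened to long on the way in
    S b = new S(1L << 40);   // a long value no int invocation could have produced
    // a instanceof S(int z)  would match, binding z = 5
    // b instanceof S(int z)  would not match: the component is outside the range of int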

Here’s another thing that fails with the narrow interpretation:

    record G<T>(T t) { }
    G<Integer> g = ...;
    if (g instanceof G(int x)) { ... }

If an `int` pattern were only applicable to `int` components, we couldn't nest the `int x` pattern here; we'd have to nest `Integer x` and convert explicitly (risking NPE).  We want to be able to compose these tests and conversions.  (Some have suggested a variant of your rule, which is to allow unboxing conversions but not primitive widening conversions.  But this is the worst kind of ad-hoc, preference-based tinkering, because it creates a new context with a new and different set of conversions from any other context.  That's pure incremental complexity.)
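
Without that conversion, the only way through is to bind the wrapper and unbox by hand, which is exactly the sharp edge the nested `int x` avoids (a sketch of the workaround, not a recommendation):

    // Forced detour through the wrapper type:
    if (g instanceof G(Integer boxed)) {
        int x = boxed;       // unboxing here throws NullPointerException if the component was null
        System.out.println(x);
    }
    // Under the proposed semantics, G(int x) folds the unboxing (and the null
    // rejection) into the pattern itself.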

Similarly, because aggregation already composes nicely, we want destructuring to compose in the same way.  We can pack an R in an Optional:

    Optional<R> o = Optional.of(new R("foo"));

merely by composing method invocations, and we want to be able to take it apart the same way:

    if (o instanceof Optional.of(R(String s))) { ... }

so that our intuition — could this have come from the corresponding aggregation expression? — is preserved.  This imposes strong constraints on the meaning of pattern composition, as well as on the treatment of conversions at pattern use sites, so that they can mean the dual of the corresponding aggregation expression.


Jumping up yet another level, this tendency to look at things from the extrapolating-forward perspective rather than the big-picture perspective illustrates one of the challenges of the rapid cadence (which otherwise has been a huge success).  In the old big-bang days, we would have released pattern matching in one big atomic transaction, with all the pieces in place on day 1, and it would have been more clear how they fit together.  Instead, we are releasing it in increments (which provides some value earlier, and is probably easier to learn), but the flip side is that it becomes less obvious where we are going.  (We still design far enough ahead of the releases to know it is all going to fit together, as we did in the old days, but we have been able to ship parts of it incrementally along the way.)  It is harder to see the big picture by looking only at the next increment; but rest assured, there is a big picture here.

So the feature here is not "how should primitives work in instanceof" or "how should primitive type patterns work"; it is "how should patterns compose cleanly, as the dual of the existing aggregation features of the language?"  While this may appear to make the language locally more complicated when you look at it as "we're adding new rules about primitive type patterns", when you look at the big picture it actually makes it _simpler_, because there is one set of rules that applies to everything, and fewer gratuitous sharp corners where things work differently.  Instanceof derives from safe casting; type patterns derive from instanceof; patterns compose the same way as the corresponding aggregation constructs, with the duals of the corresponding conversions.  There's no localized, ad-hoc tinkering with "how should primitive instanceof work" — it works the way it does because these are the semantics that give us composition and reversibility with a consistent set of rules, derived from rules we already have (e.g., cast conversion).




Begin forwarded message:

From: Jens Lideström <jens at lidestrom.se>
Subject: On range checks in primitive patterns
Date: November 27, 2022 at 7:29:04 AM EST
To: amber-spec-comments at openjdk.java.net

Dear Amber expert group,

I have followed the discussion on primitive patterns and wish to make a comment. The expert group is most certainly already aware of the issue, but I don't know if it has been discussed enough and I wish to draw attention to it.

It seems to me that there are attractive aspects of the current proposal in [1], in which a pattern match operation on integral types performs a range check on its operand, so that `i instanceof byte b` matches if `i` is in the range of a byte.

However, I think the following is a problematic aspect of this model: There is no syntactic difference between a pattern that performs only a variable binding, and a pattern that performs a range check.

For example, the following pattern performs a range check if the record component is declared to be `long`, but not otherwise:

```
switch (o) {
    case SomeRecord(int i) -> System.out.println("Match");
    ...
}
```

This has the following consequences:

* When writing the pattern it's easy to make mistakes, forgetting to check the type of the record component and out of habit declaring the type in the pattern as `int`. If the record happens to contain a `long`, this becomes a range check by mistake. This will often not be caught by the compiler, and the pattern will simply fail to match in unexpected ways.

* When updating the type of the record component, for example from `int` to `long`, it is easy to forget that this might change patterns. Patterns that previously only performed a variable binding now perform a range check. This will often not be caught by the compiler.

Admittedly, this situation is similar to how type patterns and sub-classing work already. But I think the problem with integral primitive types is worse, because it is more implicit and easier for programmers to forget about.

This problem is also somewhat similar to how total patterns accept null while non-total patterns reject null.

I think these kinds of semantic differences between similar-looking constructs are something to be very wary of in programming languages!

Possible solutions:

* Only allow simple variable bindings in type patterns for integral types. Don't do range checking in patterns at all. (To me this seems like the best approach.)

* Find an alternative syntax for patterns that perform range checks. (This is related to a suggestion from Tagir Valeev to this list on 2022-11-16 17:50.)

Thank you, expert group, for your good work! It is truly a treat for a language enthusiast to be able to follow your discussions on these lists.

Best regards,
Jens Lideström

[1]: https://bugs.openjdk.org/browse/JDK-8288476

