Guards

Brian Goetz brian.goetz at oracle.com
Fri Mar 5 23:11:54 UTC 2021


> A guard construct need not itself be a pattern.

True.  What is minimally needed is a _syntactic_ separation of what is 
pattern and what is expression, without having to wait for semantic 
analysis to understand what is being combined.  This is, in part, 
because there are still sequences that are matched by both the pattern 
and expression productions, notably `Identifier()` (could be a 
deconstruction pattern with no bindings, or could be a method invocation.)

> Rather, it can be viewed as a map from patterns to patterns.  Indeed, 
> they are formulated in exactly that way in Gavin’s BNF in JEP 
> JDK-8213076 "Pattern Matching for switch": a guard is not a pattern, 
> but can only appear within a pattern as the right-hand operand of `&`:
>
> Pattern:
>     PatternOperand
>     Pattern & PatternOperandOrGuard
> PatternOperandOrGuard:
>     PatternOperand
>     GuardPattern

> As a result, a guard necessarily appears to the right of a `&` and 
> therefore necessarily to the right of a pattern.  We should also 
> inquire as to whether it is ever desirable in practice, within a chain 
> of `&` (pattern conditional-and) operations for a pattern to appear to 
> the right of a guard.

I have long had a nagging feeling that this will eventually be 
desirable.  Let's say we have P & g & Q & h; under what conditions can 
we commute g and Q without regret?  I can think of four potential 
sources of regret:

  - g declares bindings that are inputs to Q
  - the cost model of Q is such that we'd like to run g first, and 
short-circuit
  - Q might throw an exception when g does not hold
  - Q might have side-effects that we don't want to run if g does not hold

I think we can eliminate the last one; I'm pretty comfortable saying 
that if you write side effects in pattern declarations, you get what you 
deserve.  And, the linguistic parts of patterns are not supposed to throw 
exceptions, but badly written pattern declarations may anyway.  But that 
still leaves the dataflow and performance concerns; I think I will 
eventually want to be able to specify the order, and get 
short-circuiting.  This is why I've resisted this direction to date.

> If not, then `&` chains always have the simple form
>
> pattern & pattern & … & pattern & guard & guard & … & guard
>
> where the number of patterns must be positive but the number of guards 
> may be zero.  And if this is the case, it is not unreasonable to ask 
> whether readability might not be better served by better marking that 
> transition from patterns to guard in the chain, for example:
>
> pattern & pattern & … & pattern when guard & guard & … & guard
>
> And then we see that there really is no reason to try to overload `&` 
> (however it is actually spelled) to mean both pattern conjunction and 
> guard conjunction, because guard conjunction already exists in the 
> form of the `&&` expression operator:
>
> pattern & pattern & … & pattern when guard && guard && … && guard
>
> and therefore we can, after all, simplify this general form to the 
> case of zero or one guards:
>
> pattern & pattern & … & pattern [when guard]

There's one more turn of this crank: if we are willing to move the 
guards all to the right (big if), then, why say `when`, and not `&&`?  
Then it looks just like the `if...instanceof` situation.

     case pattern & pattern && guard:

This further aligns patterns in instanceof with patterns in switch. (With 
one potentially surprising caveat: we can never switch on booleans; 
`case true && false` would not match `true`. Pause for groans.)
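
For comparison, here is a minimal sketch of the `if...instanceof` situation being referred to, using only what the language already has (Java 16 or later).  The names `Point` and `describe` are hypothetical, chosen for illustration.  The pattern sits on the left of `&&`, the guard on the right, and the binding is in scope in the guard.

```java
class InstanceofGuardDemo {
    // Hypothetical record used only for this illustration.
    record Point(int x, int y) {}

    static String describe(Object o) {
        // Type pattern on the left, boolean guard on the right, joined by &&.
        // The guard can use p because it is evaluated only after the
        // pattern has matched.
        if (o instanceof Point p && p.x() > 0) {
            return "point with positive x: " + p.x();
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(3, 4)));
        System.out.println(describe(new Point(-1, 4)));
        System.out.println(describe("hello"));
    }
}
```

The proposed `case pattern && guard:` would read the same way in a switch label.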

> Finally, given that (an earlier version of) the patterns design 
> already encompasses forms that can bind the entire object as well as 
> components (what is done in other languages with `as`), I have to ask: 
> what are the envisioned practical applications of pattern conjunction 
> other than as a cute way to include guards or a (more verbose) way to 
> bind the entire value as well as components?  Maybe as a way to fake 
> intersection types?

When a pattern focuses on a part, rather than the whole.  A pattern like 
`Point(var x, var y)` matches / destructures the whole thing, but other 
patterns can act as queries.  Imagine we have a pattern 
`Map.with(key)(var value)`, which matches maps that have the specified 
key.  We would likely want to combine these with &:

     if (x instanceof (Map.with(key1)(var val1) & Map.with(key2)(var val2))) { ... }

This scales up to query APIs such as a JSON parsing API, where you only 
want to match a blob of JSON if it has all the parts you are looking for 
(similar to "spec/conform" in Clojure.)  From 
https://github.com/openjdk/amber-docs/blob/master/site/design-notes/pattern-match-object-model.md: 


switch (doc) {
     case stringKey("firstName")(var first)
          & stringKey("lastName")(var last)
          & intKey("age")(var age)
          & objectKey("address")(
                  stringKey("city")(var city)
                  & stringKey("state")(var state)
                  & ...): ...
}

This expresses not only the all-or-nothing nature of the composite 
query, but permits the pattern to match the structure of what is being 
queried (the `objectKey` pattern has a nested pattern which applies to 
the body of that object, which itself is an & pattern).

This is the sort of example I could imagine wanting to stick a guard 
in the middle of; I could well want to guard "name not empty" and not 
bother parsing the rest of the document.  Semantically that might be 
equivalent to putting the guard at the end, but the user might not thank 
us for not letting them short-circuit out.
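
A rough sketch of the short-circuiting concern, written with today's `instanceof` and `&&` rather than the proposed `&` patterns.  Everything here is an assumption made for illustration: the document is modeled as a plain `Map`, `hasAddress` stands in for the expensive sub-query Q, and the counter exists only to make the evaluation order observable.

```java
import java.util.Map;

class GuardOrderDemo {
    // Counts how often the stand-in for the "expensive" sub-query runs.
    static int expensiveCalls = 0;

    // Hypothetical stand-in for Q: pretend this parses a nested object.
    static boolean hasAddress(Map<String, Object> doc) {
        expensiveCalls++;
        return doc.get("address") instanceof Map<?, ?>;
    }

    // P && g && Q: the guard sits in the middle, so Q is skipped when g fails.
    static String guardInMiddle(Map<String, Object> doc) {
        if (doc.get("firstName") instanceof String first
                && !first.isEmpty()
                && hasAddress(doc)) {
            return "match: " + first;
        }
        return "no match";
    }

    // P && Q && g: same answer, but Q always runs once P has matched.
    static String guardAtEnd(Map<String, Object> doc) {
        if (doc.get("firstName") instanceof String first
                && hasAddress(doc)
                && !first.isEmpty()) {
            return "match: " + first;
        }
        return "no match";
    }
}
```

For a document whose "firstName" is empty, both methods return "no match", but only `guardAtEnd` pays for the address lookup.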

The real unknown is:
     - g declares bindings that are inputs to Q

Can we construct a credible example?

As to intersection types, pattern-& is not even the right vehicle, 
because in

     case Foo f & Bar b:

then f and b will have types Foo and Bar, but really, I want something 
of type (Foo&Bar).  If this were important, I'd probably want to be able 
to use an intersection type in the type pattern:

     case (Foo&Bar) fb:
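
To make the difference concrete, here is a small sketch in current Java (16 or later).  `Foo`, `Bar`, `Both` and `useBoth` are hypothetical names for illustration.  Two type patterns give two separately typed bindings, whereas a cast to an intersection type (which the language already allows) gives one variable usable as both.

```java
class IntersectionDemo {
    interface Foo { String foo(); }
    interface Bar { String bar(); }

    // Hypothetical class implementing both interfaces.
    record Both() implements Foo, Bar {
        public String foo() { return "foo"; }
        public String bar() { return "bar"; }
    }

    // Needs a single value that is a Foo and a Bar at the same time.
    static <T extends Foo & Bar> String useBoth(T fb) {
        return fb.foo() + "+" + fb.bar();
    }

    static String viaTwoBindings(Object o) {
        // f has type Foo and b has type Bar; neither one alone
        // could be passed to useBoth.
        if (o instanceof Foo f && o instanceof Bar b) {
            return f.foo() + "+" + b.bar();
        }
        return "no match";
    }

    static String viaIntersectionCast(Object o) {
        if (o instanceof Foo && o instanceof Bar) {
            // Casting to an intersection type is legal today; with var,
            // fb keeps the type Foo & Bar.
            var fb = (Foo & Bar) o;
            return useBoth(fb);
        }
        return "no match";
    }
}
```

An intersection type in the type pattern would fold the test and the cast into one step.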

> Now, all of this has no bearing on whether or not guards are required 
> to be “top level only” in all cases; it argues only that guards need 
> not appear within pattern-conjunction chains.  But I believe it would 
> be perfectly reasonable to write
>
> case Point(int x when x > 0, int y when y > x):
>
> Rémi has argued that this would be better written
>
> case Point(int x, int y) when x > 0 && y > x:
>
> but I would argue that this choice is, and should be, a matter of 
> style, and when matching against a record with many fields it might be 
> more readable to mention each field’s constraint next to its binding 
> rather than to make the reader compare a list of bindings against a 
> list of constraints.

Agreed, this is a user choice.

> Bottom line: there are conceptually three distinct combining forms:
>
> pattern conjunction
> guard conjunction
> to a pattern, attach a guard
>
> and it may be a mistake after all to conflate them by trying to use 
> the same syntax or symbol for all three.
>
> So what I would like to see is the convincing application example 
> where you really do want to write
>
> pattern & guard & pattern

Did the "guard in the middle of the JSON blob" do that?

> because then everything I’ve written above falls to the ground.

Well, even if so, it's possible it can be propped up, just not using one 
universal support.  Suppose we allow guards to be conjoined on the end 
of a pattern with &&.  Then you could say

     case Foo(var x) && x > 0:

as well, even, as

     case Foo(var x && x > 0):

But if you wanted to do the P & g & Q thing, you'd need a grobble-style 
pattern to do so:

     case P & grobble(g) & Q:

Recall that we will eventually be able to write `grobble()` as an 
ordinary declared pattern, and we can also still resurrect the 
true/false built-in patterns if we like, for rescuing this case.  I 
still think we may eventually want a `non-null(e)` pattern, as it will 
likely be the most common form of grobbling:

     case Foo(var x && non-null(x)):

> Sorry to bikeshed here, but while “when” is nice, I think “if” is even 
> more appealing (short, familiar, already a keyword), especially if it 
> alone can express attachment of a guard to a pattern (and we can argue 
> about whether the parentheses are required):
>
> case Foo(var x) if (x > 0):

There's precedent for this, of course.  My concern with this is that the 
colon too easily hides the flow of what's going on:

     case Foo(var x) if (x > 0): if (x > 10) println(x);

Having `if` on both sides there seems likely to lead both to "which is 
which" confusion and to "why can't I have Perl-style `if` at the 
end of a statement" questions.

>
>>     case grobble(e):
>>
>> which is later revealed to be sugar for:
>>
>>     case Foo(var _) & grobble(e):
>
> I think you meant it is sugar for
>
>     case var _ & grobble(e):

Yes.

> If so, then compare that to the claim that
>
>     case if (e):
>
> is sugar for
>
>     case var _ if (e):

Or:

     case &&e:




More information about the amber-spec-experts mailing list