Two new draft pattern matching JEPs

Tue Mar 9 18:17:55 UTC 2021

----- Original Message -----
> From: "Guy Steele" <guy.steele at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Gavin Bierman" <gavin.bierman at oracle.com>, "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Friday, March 5, 2021 23:41:30
> Subject: Re: Two new draft pattern matching JEPs

>> On Mar 3, 2021, at 7:50 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>> 
>> ----- Original Message -----
>>> From: "Gavin Bierman" <gavin.bierman at oracle.com>
>>> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
>>> Sent: Thursday, February 18, 2021 13:33:20
>>> Subject: Two new draft pattern matching JEPs
>> 
>>> Dear all,
>>> 
>> 
>> [...]
>> 
>>> 
>>> - Pattern Matching for switch: https://bugs.openjdk.java.net/browse/JDK-8213076
>>> 
>>> We split them up to try to keep the complexity down, but we might decide to
>>> merge them into a single JEP. Let me know what you think.
>> 
>> I think that we have gotten a little in over our heads with the idea of replacing the
>> switch guard by the guard pattern + conditional-and pattern.
>> 
>> The draft is short on explanation of why we should do that, apart from it being
>> more powerful, which is true but is not how to evaluate a feature.
>> Here, it doesn't seem we are trying to fix a broken feature or adapt an existing
>> feature to Java. It's just more powerful, but with a lot of drawbacks, see
>> below.
>> 
>> My main concern is mixing the deconstructing pattern with the guard + and
>> patterns: those two (two and a half) don't mix well.
>> 
>> For a start, at a high level, the idea is to mix patterns and expressions
>> (guards are boolean expressions), but at the same time, we have discussed
>> several times not allowing constants inside patterns, to make a clear
>> distinction between patterns and expressions. We have an inconsistency here.
>> 
>> The traditional approach for guards cleanly separates the pattern part from the
>> expression part
>>  case Rectangle(Point x, Point y) if x > 0 && y > 0
>> which makes far more sense IMO.
>> 
>> The current proposal allows
>>  case Rectangle(Point x & true(x > 0), Point y & true(y > 0))
>> which is IMO far less readable because the clean separation between the
>> patterns and the expressions is missing.
> 
> As I have already indicated in another email, if Rectangle had six or eight
> components rather than just two, for some purposes it might be more readable to
> have the constraint for each component listed next to its binding, rather than
> making the reader compare a long list of bindings to a long list of
> constraints.
> 
> [more below]
> 
>> There is also a mismatch in terms of evaluation: an expression is evaluated from
>> left to right, but for a pattern you have bindings, and bindings are all populated
>> at the same time by a deconstructor. This may cause issues; for example, this is
>> legal in terms of execution
>>  case Rectangle(Point x & true(x > 0 && y > 0), Point y)
>> because at the point where the pattern true(...) is evaluated, the Rectangle has
>> already been destructured. Obviously, we can ban this kind of pattern to try
>> to preserve left-to-right evaluation, but then it will still leak in a
>> debugger: you have access to the value of 'y' before the expression inside
>> true() is called.
> 
> I would like to question your assertion
> 
>	bindings are all populated at the same time by a deconstructor
> 
> Is this really necessarily true?  I would have thought that the job of the
> deconstructor is to provide the values for the bindings, and that in principle
> the values are then kept in anonymous variables or locations while the
> subpatterns are processed, one by one, from left to right.  Because consider a
> more complex pattern:
> 
>	case Rectangle(Point(int x1, int y1), Point(int x2, int y2))
> 
> I would expect that the deconstructor for Rectangle does not fill in all four
> variables x1, y1, x2, y2 all at once; rather, it just supplies two values that
> are points, and then the first point value is matched against pattern Point(int
> x1, int y1), and only then is the second point value matched against pattern
> Point(int x2, int y2).

Yes, that's why I said "by a deconstructor"; maybe I should have written "by ONE deconstructor".
The deconstructor of Rectangle provides two bindings, both of which are used as the target parameter of the deconstructor of Point (which is called twice).
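
To make that model concrete, here is a minimal sketch of what a hand-desugared nested match could look like. It is only an illustration under stated assumptions: the classes, the deconstruct helpers and the log are hypothetical stand-ins (plain Java, no pattern matching feature involved), not what a compiler would actually generate.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedMatchOrder {
    // Hypothetical carrier types, written as plain classes so the sketch
    // does not rely on any pattern matching language feature.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static final class Rectangle {
        final Point first, second;
        Rectangle(Point first, Point second) { this.first = first; this.second = second; }
    }

    // Stand-in for the Rectangle deconstructor: one call yields both components.
    static Point[] deconstruct(Rectangle r, List<String> log) {
        log.add("Rectangle -> 2 points");
        return new Point[] { r.first, r.second };
    }

    // Stand-in for the Point deconstructor: one call yields both coordinates.
    static int[] deconstruct(Point p, List<String> log) {
        log.add("Point -> (" + p.x + ", " + p.y + ")");
        return new int[] { p.x, p.y };
    }

    // Hand-desugared form of
    //   case Rectangle(Point(int x1, int y1), Point(int x2, int y2))
    // Returns the order in which the deconstructors ran.
    static List<String> match(Rectangle r) {
        List<String> log = new ArrayList<>();
        Point[] points = deconstruct(r, log);    // both points are available together
        int[] c1 = deconstruct(points[0], log);  // x1 and y1 are bound here
        int x1 = c1[0], y1 = c1[1];
        int[] c2 = deconstruct(points[1], log);  // x2 and y2 are bound only now
        int x2 = c2[0], y2 = c2[1];
        log.add("bound x1=" + x1 + " y1=" + y1 + " x2=" + x2 + " y2=" + y2);
        return log;
    }

    public static void main(String[] args) {
        Rectangle r = new Rectangle(new Point(1, 2), new Point(3, 4));
        match(r).forEach(System.out::println);
    }
}
```

Each deconstructor call hands back all of its own components at once, but the two Point deconstructors still run one after the other, so x2 and y2 do not exist while the first Point is being matched.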

> 
> Now this example is not exactly analogous to your original, because we have not
> provided explicit variables for this purpose.  I believe that in an earlier
> version of the design one would write
> 
>	case Rectangle(Point(int x1, int y1) p1, Point(int x2, int y2) p2)
> 
> But perhaps in the current proposal one must write
> 
>	case Rectangle(Point(int x1, int y1) & var p1, Point(int x2, int y2) & var p2)
> 
> or perhaps
> 
>	case Rectangle(var p1 & Point(int x1, int y1), var p2 & Point(int x2, int y2))
> 
> In all of these cases, my argument is still the same: the simplest model is that
> the deconstructor for Rectangle just supplies two values that are points, and
> then the first point value is matched against the first subpattern, and only
> then is the second point value matched against the second subpattern.  As a
> result p2 and x2 and y2 do not yet have bindings or values while the first
> subpattern is being matched.

I think we agree here; I was just saying that for one deconstructor call you get all the bindings at the same time,
so when there are two bindings, having two expressions that each guard one binding is equivalent to having one guard that uses both bindings.

Obviously, when patterns are nested, you don't have access to all the bindings of all the patterns at once.
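
As a rough sketch of that equivalence, the two forms can be desugared by hand and compared. This assumes a hypothetical Rectangle with two int components (so that x > 0 type-checks) and guards without side effects; the method names are made up for the illustration.

```java
public class GuardPlacement {
    // Hypothetical two-component carrier; ints are assumed so the guards type-check.
    static final class Rectangle {
        final int x, y;
        Rectangle(int x, int y) { this.x = x; this.y = y; }
    }

    // Hand-desugared form of
    //   case Rectangle(int x & true(x > 0), int y & true(y > 0))
    static boolean perBindingGuards(Rectangle r) {
        int x = r.x, y = r.y;        // one deconstructor call yields both bindings
        if (!(x > 0)) return false;  // guard attached to x
        if (!(y > 0)) return false;  // guard attached to y
        return true;
    }

    // Hand-desugared form of
    //   case Rectangle(int x, int y) if x > 0 && y > 0
    static boolean trailingGuard(Rectangle r) {
        int x = r.x, y = r.y;
        return x > 0 && y > 0;       // single guard using both bindings
    }

    // Checks that both forms accept exactly the same rectangles on a small grid.
    static boolean agreeOnGrid(int from, int to) {
        for (int x = from; x <= to; x++) {
            for (int y = from; y <= to; y++) {
                Rectangle r = new Rectangle(x, y);
                if (perBindingGuards(r) != trailingGuard(r)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnGrid(-2, 2));
    }
}
```

The equivalence only holds as long as the guards have no side effects; otherwise the order in which they run becomes observable.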

Now, I don't think you have to use an & between patterns to provide a name; in my opinion, we should see things the other way around:
  case Rectangle(Point(int x1, int y1), Point(int x2, int y2))
should be a simplified way to write
  case Rectangle(Point(int x1, int y1) _, Point(int x2, int y2) _)
i.e. when destructuring, we don't provide a name for that binding (hence my use of '_').

With that in mind, if you want to name the intermediate point, you can just write
  case Rectangle(Point(int x1, int y1) p1, Point(int x2, int y2) p2)
> 
> A compiler would likely optimize common special cases to effectively implement
> all-at-once population of bindings when it would be impossible to detect any
> difference in behavior.  But I don’t think all-at-once population is the right
> theoretical model.
> 
> [more below]
> 
>> In terms of syntax, currently the parentheses '(' and ')' are used to
>> define/destructure the inside, either by a deconstructor or by a named pattern,
>> but in both cases the idea is that they describe the inside. Here, true() and
>> false() don't follow that idea; they are an escape mode to switch from the
>> pattern world into the expression world.
>> At least we can use different characters for that.
>> 
>> Also in terms of syntax, introducing '&' in between patterns overloads the
>> operator '&' with yet another meaning; my students already have trouble
>> making the distinction between & and && in expressions.
>> As I already said earlier, and as is also said in the Python design document,
>> we don't really need an explicit 'and' operator in between patterns because
>> there is already an implicit 'and' between the sub-patterns of a deconstructing
>> pattern.
> 
> As I have already indicated in another email, I agree with you here; I very much
> share your concerns about the overloading of a single symbol `&` (or whatever
> spelling we give it) to mean two or three different things within patterns, not
> to mention its existing uses in expression contexts.
> 

Rémi