Towards member patterns

Fri Jan 26 17:07:51 UTC 2024

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Friday, January 26, 2024 1:31:54 PM
> Subject: Re: Towards member patterns

>> I think your proposal solves the cases where the type you are switching on is
>> closed (final, sealed) but not if the type is open (non-sealed).

> A bold claim! Let's see how this stacks up.

>> Let's take an example, let suppose I've the following hierarchy

>>   public sealed interface Tree {

> ... snip ... sealed class, private implementation classes, public static
> factories, public static patterns ... check.

>> If I want to have a static method children that returns all the children of the
>> Tree, using the pattern matching I would like to write

>>   static List<Tree> children(Tree tree) {
>>     return switch(tree) {
>>       case Tree.none() -> List.of();
>>       case Tree.cons(Tree child) -> List.of(child);
>>     };
>>   }

> Full disclosure: we're not totally there yet. This switch isn't (yet)
> exhaustive; we need a way to mark none+cons as being an exhaustive set.
Agree. 

> That's on the list, but was looking to sync on the broad strokes first.

>> As I said, it works great with a closed hierarchy, but now let suppose the
>> hierarchy is not sealed, if the hierarchy is not sealed, having static
>> factories make less sense because we do not know all the subtypes.

> I don't see this. (As one example, consider List: it is open, yet there are
> static factories like List.of(...)). We had static factories long before we had
> sealed hierarchies.
Yes, here it's one static factory per subtype, which make little sense since the number of subtypes is unknown. 

> But let's keep going.

>> So we have

>>   public interface Tree {}
>>   public enum None implemnts Tree { NONE }
>>   public class Cons implements Tree {
>>     private final Tree tree;

>>     public Cons(Tree tree) { this.tree = tree; }
>>   }

>> and in the future, someone may add
>>   public class Node {
>>     private final Tree, left, right;

>>     public Node(Tree left, Tree right) { this.left = left; this.right = right; }
>>   }

>> Because the hierarchy is open, we need to use the late binding here.
>> So i may rewrite children like this
>>   static List<Tree> children(Tree tree) {
>>     return switch(tree) {
>>      case that.extract(List<Tree> list) -> list;   // wrong syntax, it's just to
>>       convey the semantics
>>     };
>>   }

> I'm not sure what this example is supposed to say, since `that` is only defined
> inside the body of a pattern method. Are you trying to do child-extraction as a
> pattern, rather than as an accessor? (This is a modeling question.) I'm not
> sure this is a great modeling for a Tree, but let's look past that. If so, Tree
> needs an _abstract pattern_ that binds a List<Tree>. That's easy:

> interface Tree<T> {
> public __inverse Tree withChildren(List<T> children);
> }

> and the subclasses can each override it:

> class Empty<T> implements Tree<T> {
> public __inverse Tree withChildren(List<T> children) {
> yield Collections.emptyList();
> }
> }
> ...

> and the client can take an arbitrary Tree and match it:

> case Tree.withChildren(var children) -> ...

> So I don't see that this doesn't work, but I think I see where you got confused.

>> Here, we we want to call an abstract pattern method that will be implemented
>> differently for each subclasses, but your proposal does not allow that (sorry
>> for the pun).

> Yes, it does. (This conversation would be easier if you could frame this as a
> question ("Can I ...") rather than an statement ("It is not possible...") which
> turns out to be incorrect.)

>> Inside a pattern, there are two implicit values, we have 'this' as usual and we
>> have 'that' (we call it that way) that represent the value actually matched.

> Correct. Let's talk about the role of these two context variables.

> Every pattern has a match candidate. This is the thing on the RHS of the
> instanceof, or the selector in the switch. It is the thing about which we ask
> "does the thing match the pattern."

> Every pattern has a _primary type_. It is the minimal type for which the match
> candidate could possibly match the pattern. For a record pattern like
> `Point(int x, int y)`, the primary type is Point. (A pattern is rejected at
> compile time as inapplicable if the type of the match candidate is not
> cast-convertible to the primary type of the pattern.)

> In the body of a pattern method, the match candidate is denoted with the context
> variable `that`, whose type is the primary type of the pattern. The compiler
> may have to make up some of the difference between the type of the match
> candidate and the primary type:

> Object o = ...
> switch (o) {
> case Foo(int x) -> ...
> }

> Here, the primary type of the Foo pattern is Foo, so to test if the case
> matches, the compiler inserts an `instanceof Foo`, and if that succeeds, casts
> `o` to `Foo`, and invokes the Foo pattern with that.

> Not every pattern has a receiver, just like not every method has a receiver.
> Constructors and instance methods have receivers; same with their pattern
> counterparts. For deconstructors, both the receiver and the match candidate are
> the same object. This is not true for all instance patterns.
yes, 

> A receiver plays two roles in a pattern match, just as it does in a method
> invocation:

> - Finding the code to invoke by searching the class hiearchy
> - Associating the implementing code with the state of the object, in case the
> implementation of the pattern needs some state from the object that declares it

> Let's go through two examples to see the cases.

> AN easy example is regular expressions. We have a class j.u.regex.Pattern, which
> represents a compiled regex. A regular expression match is a form of pattern
> match (there's a match candidate, it is conditional, if it succeeds we extract
> the capture groups.) Surely we should expose a "match" pattern on Pattern.

> class Pattern {
> public __inverse String regexMatch(String... groups) {
> Matcher m = matcher(that);
> if (m.matches())
> __yield IntStream.range(1, m.groupCount())
> .map(Matcher::group)
> .toArray(String[]::new); }
> }

> We match it with an explicit receiver:

> final Pattern As = Pattern.compile("([aA]*)");
> ...
> if (aString instanceof As.regexMatch(String as)) { ... }

> The body uses both `this` and `that`. When it goes to do the actual matching, it
> takes the match candidate, `that`, and passes it to `matcher()`; we are
> matching against the match candidate, not the receiver. But it also uses the
> receiver in the same line of code, quietly; the locution `matcher(that)` is
> really `this.matcher(that)`. It is using the state of _this regex_ to determine
> the match logic. The pattern needs both, and they are different objects.
yes, 

> In our `instanceof` test, there are two "parameters", though neither of them
> looks like one: the match candidate (on the LHS of the instanceof) and the
> receiver. These are packaged up as `that` and `this` for the pattern
> invocation.

> The other example is a conditional behavior on an object, such as "does this
> List have any elements, and if so, give me one." We put an abstract pattern on
> List:

> interface List<T> {
> public __inverse List<T> withElement(T element);
> }

> (It could also be a default pattern; works the same as default methods.) The
> implementation in emptyList always fails. The implementation in ArrayList might
> look like:

> public __inverse List<T> withElement(T element) {
> if (that.size > 0)
> _yield that.elements[0];
> }

> Now, implementing this guy gets tricky, since we have two context variables
> which are both of the same type, ArrayList<T>. (Maybe we have to explicitly use
> a covariant "override" here; TBD.)
I do not think allowing covariant override is sound. Because if the method is unbound, yes, 'this' and 'that' are the same object at runtime, but if 'this' is bound, these are two differents objects so no covariant override should be allowed. 

> But as it turns out, the two will usually be the same object:
see above 

> switch (aCollection) {
> case List.withElement(var t): ...
> }

> How does this match work? Well, the primary type of List.withElement is List<T>,
> so the compiler tests `aCollection instanceof List`, and if so, casts the match
> candidate to List. Since there is no explicit receiver, it uses the match
> candidate as the receiver also (this is like an unbound method reference), and
> does the virtual method search, and finds ArrayList::withElement, and invokes
> it. Different types of collections will use different implementations of the
> pattern.

>> Now, to finish the example, using '::' instead of '.', children in the first
>> example should be written like this

> Remember you're not supposed to use words like "should" ;)
I think that using '::' instead of '.' is a great simplification, because it let the user to specify how the linkage is done, unbound or bound. 
Also I do not think it's a good idea to have a syntax which is context dependend, i.e Type.method() and Type.method() having different meaning/linkage semantics inside or outside a Pattern. 
Is there is another syntactical construct in Java that behave that way ? 

>> static List<Tree> children(Tree tree) {
>>     return switch(tree) {
>>       case Tree::extract(List<Tree> list) -> list;

> case Tree.extract, but yes.

>> I really think that not using 'that' as the receiver when calling an inverse
>> instance method is a missing opportunity because without that (again :) ),
>> there is no way to call an inverse abstract method, so no way to pattern match
>> on an open hierarchy.

> Hopefully I've cleared up part of the confusion; there are two ways to denote an
> instance pattern in a match: bound and unbound, and when it is unbound, it uses
> the match candidate as the receiver.

> So if your statement is "there should also be a way to ...", it is correct, but
> if your statement is "the receiver must be the match candidate", then that is
> catastrophically wrong, because then you can't do regex, type class witnesses,
> pattern objects, etc.
If the section "Pattern resolution" is rewritten in terms of bound and unbound methods, I agree. 

And as a request, i would like you to reconsider your position about not piggybacking the linkage of a pattern to the method reference semantics, which has the advantage of already existing and being explicit about what is bounded and what is not ? 

Rémi 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-spec-observers/attachments/20240126/eadd33ec/attachment-0001.htm>