Patterns: declaration

Fri Jan 22 17:39:12 UTC 2021

Wow, it's bikeshedding time!

Two points.

1. I worry about the asymmetry between pattern-with-args declaration
and invocation. See:

static pattern(int comp1, int comp2) foo(Type t, int param1, int param2) {...}

if(t instanceof foo(param1, param2)(int comp1, int comp2)) {...}

See, at the declaration, pattern parameters are declared after the
components while at invocation parameters are listed before the
components.

Also, having a target parameter as a part of the normal parameter list
could be confusing. An alternative idea is to list everything at
declaration in the same order as at invocation site. For example:

static pattern Type foo(int param1, int param2) -> (int comp1, int comp2) {...}
// it's not strictly required to give a name for the target parameter;
it could be referenced inside the pattern body via a specific keyword,
probably even 'this'!

class Optional<T> {
    static<T> Optional<T> of(T t) { ... }
    static<T> pattern Optional<T> of() -> (T t) { ... }
    static<T> Optional<T> empty() { ... }
    static<T> pattern Optional<T> empty() -> () { ... }
}

static pattern String regex(String regex) -> (String... groups) {...}

interface Map {
    pattern withMapping(K key) -> (V value);
}

class Class<T> {
    public pattern arrayClass() -> (Class<?> componentType) {...}
}

Another possibility is to use another type of brackets for pattern
parameters, let it be [], both at declaration and at invocation. In
this case, it will be more clear where we have pattern parameters and
output components. Also, we can omit them without any confusion if
there are none:

static pattern Type foo[int param1, int param2](int comp1, int comp2) {...}

class Optional<T> {
    static<T> Optional<T> of(T t) { ... }
    static<T> pattern Optional<T> of(T t) { ... }
    static<T> Optional<T> empty() { ... }
    static<T> pattern Optional<T> empty() { ... }
}
(now patterns and methods are truly symmetrical!)

static pattern String regex[String regex](String... groups) {...}

interface Map {
    pattern withMapping[K key](V value);
}

class Class<T> {
    public pattern arrayClass(Class<?> componentType) {...}
}

Other syntactic options are possible, but in general, it looks more
natural to me to specify output components on the right side of
pattern parameters.

2. effectively final expression

> We would likely want to enforce the requirement that expressions used as input
arguments be _effectively final_, to eliminate any confusion about the timing
of when they are evaluated.

I'm not sure I understand what is 'effectively final expression'. We
know what is 'effectively final variable'. Is it possible, say, to
call a method at the pattern invocation, like `case
regex(buildRegex())(String s1, String s2) -> ...`?

With best regards,
Tagir Valeev.

On Fri, Jan 22, 2021 at 11:08 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>
> Here's the first half of an *extremely rough* doc-in-progress that starts to show more of the syntax of the declaration of a pattern -- but does yet not dive into the _body_ of a pattern (which we should still leave for a later discussion).  This is still more about object model than directly about syntax; many people have been asking "I'm having a hard time imagining what this looks like in a class file", so this should help a bit.  It also fleshes out some of the requirements, specifically patterns delegating to other patterns.
>
>
>
> ## Beyond built-in patterns
>
> Related documents have explored a number of built-in pattern types, where the
> compiler knows enough about the pattern semantics to implement them directly,
> such as type patterns or record deconstruction patterns.  However, we also wish
> to support direct declaration of patterns as class members, such as
> deconstruction patterns for non-record classes, static patterns, and instance
> patterns.
>
> Given a pattern match like:
>
> ```
> if (p instanceof Point(int x, int y)) { ... }
> ```
>
> assuming the pattern matches, we need to capture somewhere how the bindings  `x`
> and `y` are extracted from the `Point`.  If `Point` is a record, we can do this
> without help from the user, but otherwise, someone is going to need to write
> some code somewhere to capture the applicability test and state extraction
> corresponding to this pattern.  (One can imagine various "tricks" for doing so
> (i.e., extracting fields, calling getters, etc), but these tricks quickly run
> out of gas for classes whose state is not trivially and transparently coupled to
> its API.)  We'll call the various forms of this _member patterns_, which are
> expressed as a new kind of class member.
>
> > Member patterns are executable class members, just like constructors and
> > methods.
>
> Abstractly, a pattern operates on a single target parameter (which for instance
> and deconstruction patterns is just the receiver), and either succeeds or fails
> based on some condition.  If it succeeds, it produces zero or more output values
> (the destructured components, or _bindings_.)
>
> #### Pattern declarations in other languages
>
> We may be able to obtain some inspiration by looking at what other languages do.
> By the late 1990s, pattern matching over structural types (sequences and tuples)
> was a common feature of functional languages; languages that supported nominal
> product types (e.g., Haskell) also supported pattern matching over nominal
> product types.  In the mid-2000s, languages like [F#][fs-patterns] and
> [Scala][scala-patterns] extended the notion to additionally apply to abstract
> data types (those whose concrete representation was not necessary known to the
> client.) However, in both F# and Scala, patterns remain distinctly `static`
> entities; they are not a first-class part of the object hierarchy, and remain at
> the periphery of the object model.
>
> Scala takes the approach of modeling patterns as static methods with the name
> `unapply`.  An `unapply` method takes a single parameter (type checked against
> the matchee) and has some flexibility in its return type; it can return
> `boolean` if there are no bindings, or `Option[T]` (where `T` is often a tuple
> type) to carry the binding parameters.  A matcher for `Point(x,y)` would look
> like:
>
> ```
> object Point {
>     Option[Tuple[int, int]] unapply() { ... }
> }
> ```
>
> This is a pragmatic approach, with pros and cons.  It is nice that it is just an
> ordinary method, which can be called either directly or through a pattern match.
> On the other hand, there is lots of magic -- the language imputes special
> semantics based on method name, enclosing class, argument list, and return type.
> Also, because these methods are effectively static, there is no ability to
> overload, override, have abstract patterns, etc.
>
> C# takes a more principled, but also more limited, approach.  As of C# 9, a
> class can write one or more `Deconstruct` methods (magic name), using `out`
> parameters to receive the bindings (a feature already present in C#).  A
> `Deconstruct` method must be total (the deconstruction cannot fail), and there
> is no way to write instance or abstract patterns (though the equivalent of some
> static patterns can be achieved with extension methods.)
>
> Both of these approaches are adding patterns "around the edges"; we are
> proposing a pattern matching model that integrates much more deeply into the
> object model.
>
> #### Deconstruction patterns in Java
>
> The simplest form of pattern is a _deconstruction pattern_, which is the dual of
> a constructor.  And, just as constructors are an odd mix of instance and static
> behavior -- they are instance members (they have a receiver) but are not
> inherited and can only be accessed through their class name -- their duals have
> the exact same odd mix.  Another oddity of constructors is that they don't  have
> names -- the name supplied at the instantiate site is the name of a class,  not
> a class member.  It makes sense for deconstruction patterns to also share this
> characteristic.
>
> The target of a deconstruction pattern is obvious -- it is the receiver.
> Deconstruction patterns are _total_ on their target type -- as long as the
> target is valid as a receiver, the pattern always succeeds.  We'd like for the
> arity, types, order, and names of the binding variables to be manifest in the
> declaration code, as this is part of the classes API, and for the similarity
> with the corresponding constructor, if there is one, to be obvious both at the
> declaration and use site.  This might look like:
>
> ```{.java}
> class Point {
>     int x, y;
>
>     Point(int x, int y) {
>         // constructor body
>     }
>
>     deconstructor(int x, int y) {
>       // deconstructor body
>     }
> }
> ```
>
> There is a strong symmetry between the declarations of the constructor and
> deconstructor.  The constructor takes as arguments `x` and `y` and produces a
> `Point`; the deconstructor takes a `Point` and produces as bindings `x` and `y`,
> and the declaration of the argument list of the constructor and the binding list
> of the deconstructor are syntactically similar.  The locution
> `deconstructor(BINDINGS)` is meant to evoke the notion of "returning" the
> specified set of bindings.  For classes that model entities more or less
> transparently, as `Point` does, it is entirely reasonable to match constructors
> with deconstructors whose binding lists correspond to the constructor parameter
> list.
>
> There is also a symmetry between the creation of an object with a constructor,
> and the destructuring of an object with a deconstruction pattern.  A client
> matches against a deconstruction pattern as follows:
>
> ```
> case Point(var x, var y):
> ```
>
> Here, we interpret `Point` as a class name, and look in the `Point` class for a
> deconstruction pattern with compatible binding arity and types -- analogously to
> how we interpret the instance creation expression `new Point(x, y)`.
>
> We've left out the declaration of the body for now; we'll return to this after
> our tour of the member pattern model.  But it's important to note that, just as
> the constructor is free to perform defensive copying on the inputs, the
> deconstructor needs a way to do the same with the bindings.
>
> Deconstruction patterns are unusual in that they are always _total_; if the
> target is of the correct type, the deconstruction pattern should always complete
> successfully.  However, not all patterns are total; a pattern like
> `Optional.of(var x)` will only match a subset of instances of `Optional`.  As a
> simplifying approximation, we'll say that all other member pattern are partial;
> the bodies of these patterns will need a way to denote this as well.
>
> There's also no reason that deconstruction patterns cannot appear in interfaces,
> as long the interface supports enough API to implement it.  For example, we
> might want `Map.Entry` to support a deconstruction pattern that yields key and
> value:
>
> ```
> interface Entry<K,V> {
>     K getKey();
>     V getValue();
>
>     deconstructor(K key, V value) {
>         // invoke getKey() and getValue() and bind them appropriately
>     }
> }
> ```
>
> This would allow us to iterate through the elements of a `Map` more directly:
>
> ```
> for (Map.Entry(var k, var v) : map.entrySet()) {
>     // use k and v
> }
> ```
>
> #### Instance patterns
>
> The next simplest form of pattern declarations are _instance patterns_, which
> are duals of instance methods.  As with deconstruction patterns, the target of
> an instance pattern is the receiver, but unlike deconstruction patterns,
> instance patterns can be partial.  An instance pattern declaration looks like:
>
> ```
> class Class<T> {
>     public pattern(Class<?> componentType) arrayClass() {
>         ... body ...
>     }
> }
> ```
>
> A significant difference from deconstruction patterns is that instance patterns
> have names (just as instance methods do); later, we'll see that they can also
> have input arguments.
>
> At the use site, instance patterns are usually unqualified, referring to an
> instance pattern on the static type of the match target.  So if we have a
> `Class` in hand, we can match on those classes that represent arrays as:
>
> ```
> switch (clazz) {
>     case arrayClass(var componentType):
> }
> ```
>
> #### Static patterns
>
> Static patterns are slightly more complicated than instance patterns, because
> they lack a receiver -- and so the match target must be identified some other
> way.  We can do this by indicating the target type and name as a pattern
> parameter.  The duality between the "receiver" and the first parameter of a
> static method is not only well-known to the JVM), but also has precedent in the
> language: the method reference `Foo::bar` could refer to a static method of
> `Foo` or to an instance method (an "unbound method reference"), in which case
> the first argument becomes the receiver.
>
> ```
> class Optional<T> {
>     static<T> Optional<T> of(T t) { ... }
>
>     static<T> pattern(T t) of(Optional<T> target) { ... }
>
>     static<T> Optional<T> empty() { ... }
>
>     static<T> pattern() empty(Optional<T> target) { ... }
> }
> ```
>
> This leads to the familiar symmetry at the use site: we construct `Optional`
> instances with `Optional.empty()` and `Optional.of(t)`, and match against them
> like:
>
> ```
> switch (opt) {
>     case Optional.of(var t): ...
>     case Optional.empty(): ...
> }
> ```
>
> #### Patterns with input arguments
>
> Pattern matching fuse a test with zero or more conditional extractions, which is
> something we already do all the time.  Regular expression parsing is an example
> of this combination -- when we match against a regular expression, we get both a
> boolean result (did it match), and, if it did, we can then extract the "groups"
> bounded by `(...)` in the regular expression.  It would be nice if we could
> represent a regular expression as  an ordinary pattern match, rather than using
> an ad-hoc calling sequence, while still delegating to the same underlying
> regular expression library.  So how do we turn a regular expression into a
> pattern?
>
> We already have almost all the tools needed; a regular expression is like a
> pattern which binds a `String...`, whose target is a `String`.  We could, for
> each regular expression, write a static pattern:
>
> ```
> Pattern asbs = Pattern.compile("(a*)(b*)");
>
> static pattern(String...) asbsMatch(String target) { ...
>     ... use asbs to match target, and extract the groups ...
> }
> ```
>
> This works -- we could ask if a string matches the pattern `asbsMatch()`,  and
> bind the groups if so (further refining with nested patterns if we like.)  But
> it seems a little clunky to have to write this method for every regular
> expression; we'd like to write it once, and parameterize it with the regular
> expression we want.  This means we need to be able to specify an additional
> input argument beyond the match target.   This might be declared:
>
> ```
> static pattern(String... groups) regex(String target, String regex) {
>     ... use regex library to match against regex, extract bindings ...
> }
> ```
>
> Here, the first parameter denotes the target of the match (as before);  the
> remaining parameters must be passed at the use site.  This might be written:
>
> ```
> String AS_AND_BS = "(a*)(b*)";
> case regex(AS_AND_BS, String[] { var aString, var bString }):
> ```
>
> where the first "parameter" is the input parameter, and the second is a nested
> pattern that matches the sole binding variable.  This is easy enough for
> compilers to work out, but not so great for humans; it is not always obvious
> where the data ends and the patterns begin.   (This gets worse if there are
> syntactic constructs that could be either arguments or nested patterns, such as
> denoting constant patterns with literals.)   So it is probably preferable to
> syntactically reflect the two different sorts of arguments, such as:
>
> ```
> case regex(AS_AND_BS)(String[] { var aString, var bString }):
> ```
>
> If a pattern has two argument lists, the first is the input arguments, and the
> second receives the bindings.  If the first list is empty, as it usually is,
> it can be omitted.  Since our regex pattern is a varargs pattern, we could also
> elide the `String[]` pattern scaffolding:
>
> ```
> case regex(AS_AND_BS)(String aString, String bString):
> ```
>
> We would likely want to enforce the requirement that expressions used as input
> arguments be _effectively final_, to eliminate any confusion about the timing
> of when they are evaluated.
>
> #### Alternatives to Map::get
>
> The `Map::get` method takes a key and returns the corresponding value, if the
> mapping is present in the map, or `null` if the mapping is not present.  This is
> a problematic API choice for a number of reasons.  First, if the map supports
> null values, it is impossible to distinguish a mapping to null from
> mapping-not-present; if we try to distinguish between these cases with
> `Map::containsKey`,  we risk a possible race condition.  And, as we integrate
> inline classes and primitives into the generic type system as we are in
> Project Valhalla, for some types, `null` will not be available as a sentinel.
>
> If we had patterns in the language when we designed Collections, we might have
> reached for a pattern, which captures both the "is the mapping present", and
> "if so, what is the key mapped to" in one go:
>
> ```
> interface Map {
>     pattern(V value) withMapping(K key);
> }
> ```
>
> and clients would be required to confront this conditionality directly:
>
> ```
> if (m instanceof Map.withMapping(k)(var v)) { ... use v ... }
> ```
>
> ## Composition
>
> Just as constructors compose (a subclass constructor can delegate to a
> superclass constructor, and constructors/factories can delegate to
> constructors/factories for their components), patterns should compose as well.
> Let's gather some requirements for composition of pattern declarations.
>
> ```
> class A {
>     private int a;
>
>     public A(int a) { this.a = a; }
>     public deconstructor(int a) { ... }
> }
>
> class B extends A {
>     private int b;
>
>     public B(int a, int b) { super(a); this.b = b; }
>
>     public pattern(int a, int b) {
>         // want to delegate to superclass deconstructor
>     }
> }
> ```
>
> The class `A` was designed for subclassing, and subclass constructors will
> delegate to constructors in `A`.  Since deconstruction is the dual of
> construction, it stands to reason that the deconstruction pattern in `B` can
> delegate to the deconstruction pattern in `A`, and then add whatever B-specific
> deconstruction logic is needed.
>
> If we use composition instead of inheritance, we have a similar requirement.  In
> this next example, class `D` has two members of type `C`, and `D`'s constructor
> delegates to the  `B` constructor; we would want its deconstructor to be able
> to do the same thing.
>
> ```
> class C {
>     private int c;
>
>     public C(int c) { this.c = c; }
>     public deconstructor(int c) {  ... }
> }
>
> class D {
>     C c1;
>     C c2;
>
>     public D(int x, y) {
>         this.c1 = new C(x);
>         this.c2 = new C(y);
>     }
>
>     public deconstructor(int x, int y) {
>         // delegate to C pattern to extract x from c1 and y from c2
>     }
> }
> ```
>
> In these examples, we've composed deconstruction (total) patterns into bigger
> patterns, so we didn't have to worry about what happens when the inner pattern
> fails.  But we also want to be able to compose partial patterns into partial
> patterns.  For example, suppose we have:
>
> ```
> interface Shape {
>
>     // if component is visible, provide bounds
>     public pattern(Rect bound) visibleShape();
> }
> ```
>
> And suppose we have a class that represents the composition of two shapes; we
> would like it to implement the `visibleShape()` pattern by delegating to the
> `visibleShape()` pattern for each of the shapes, fail if either of these fail,
> and if both succeed, return the bounding rectangle for both shapes:
>
> ```
> class CompoundShape implements Shape {
>     Shape s1 = ...
>     Shape s2 = ...
>
>     public pattern(Rect r) visibleShape() {
>         ...
>         if (s1 instanceof visibleShape(var r1)
>             && s2 instanceof visibleShape(var r2)) {
>                 // succeed with boundingRect(r1, r2)
>         }
>         else {
>             // fail
>         }
>     }
> }
> ```
>