Member Patterns -- the bikeshed

Victor Nazarov asviraspossible at gmail.com
Sat Mar 30 19:23:11 UTC 2024


I have two points that I think may be good to consider in the list of
options.

1. I'm not sure whether this was considered, but I find explicit lists of
covering patterns rather natural, and more flexible than using `case` as a
pattern modifier.

Explicit lists may look like:

````
    // The declaration "matches (of|empty)" states that
    // "of" and "empty" together cover the full set of Optional<T> values
    class Optional<T> matches of|empty {
    }
````

The important feature of explicit lists is that there may be more than one
covering set of patterns.

````
    // There can be multiple sets of patterns, where each set covers all
    // possibilities
    class List matches headAndTail|empty, initAndLast|empty {
        // ...
    }
    class Glass matches empty|nonEmpty, full|nonFull {
        // ...
    }
````
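
To make "covering" concrete in today's Java, here is a minimal sketch under stated assumptions: the `Glass` record, its `level`/`capacity` components, and the `CoveringSets` helper are all hypothetical names invented for illustration. It only checks coverage over sample values at run time, whereas the proposed `matches` clause would be a declaration the compiler trusts or verifies.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the Glass class above; names are illustrative only.
record Glass(int level, int capacity) {}

final class CoveringSets {
    // Each predicate plays the role of one zero-binding pattern.
    static final Predicate<Glass> EMPTY     = g -> g.level() == 0;
    static final Predicate<Glass> NON_EMPTY = g -> g.level() > 0;
    static final Predicate<Glass> FULL      = g -> g.level() == g.capacity();
    static final Predicate<Glass> NON_FULL  = g -> g.level() < g.capacity();

    // A set covers the samples when every sample is matched by at least one member.
    static boolean covers(List<Predicate<Glass>> set, List<Glass> samples) {
        return samples.stream()
                .allMatch(g -> set.stream().anyMatch(p -> p.test(g)));
    }
}
```

With samples for an empty, a partly filled and a full glass, both `empty|nonEmpty` and `full|nonFull` cover every sample, while a mixed set such as `empty|full` misses the partly filled glass.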

2. I think that there is a middle ground between the functional and
imperative styles of pattern body definition that may look cumbersome at
first, but nevertheless gives you the best of both worlds:

    * deconstructor patterns look dual to constructors
    * names from the list of pattern variables are actually used and
checked by the compiler
    * control flow is still functional, which is more natural

The downside that is retained from the imperative style is the need for
alpha-renaming, but I think we still have to deal with shadowing, and
renaming a local variable seems natural and easy.

The middle ground could be a special form that is usable in the pattern
body. This form works mostly the same way as the `with` expression defined
in the "Derived Record Creation" JEP (JEP 468).

Here is a longer list of examples that illustrates the different
interactions:

````
    class Optional<T> matches (of|empty) {
        public static <T> pattern<Optional<T>> of(T value) {
            if (that.isPresent()) {
                match {
                    value = that.get();
                }
            }
        }

        public static <T> pattern<Optional<T>> empty() {
            if (that.isEmpty())
                match {}
        }
    }

    class Pattern {
        public pattern<String> regexMatch(String... groups) {
            Matcher m = this.matcher(that);
            if (m.matches()) {
                match {
                    // capturing groups are numbered 1..groupCount(), inclusive
                    groups =
                            IntStream.rangeClosed(1, m.groupCount())
                                    .mapToObj(m::group)
                                    .toArray(String[]::new);
                }
            }
        }
    }

    class A {
        private final int a;

        public A(int a) {
            this.a = a;
        }
        public pattern A(int a) {
            match {
                a = that.a;
            }
        }
    }

    class B extends A {
        private final int b;

        public B(int a, int b) {
            super(a);
            this.b = b;
        }

        public pattern B(int a, int b) {
            if (that instanceof super(var aa)) {
                match {
                    a = aa;
                    b = that.b;
                }
            }
        }
    }

    interface Converter<T,U> {
        pattern<T> convert(U u);
    }
    Converter<Integer, Short> c =
        pattern (s) -> {
            if (that >= Short.MIN_VALUE && that <= Short.MAX_VALUE)
                match {
                    s = (short) that;
                }
        };
````
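
For comparison, the `regexMatch` example can be approximated in Java as it exists today. This is only a sketch: the `RegexGroups` class and the choice of `Optional<List<String>>` as the carrier (empty for "no match", present for the bindings) are assumptions made for illustration, not part of any proposal.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class RegexGroups {
    // Emulates the hypothetical regexMatch pattern: an empty Optional means
    // "no match"; a present value carries capturing groups 1..groupCount().
    static Optional<List<String>> regexMatch(Pattern p, String that) {
        Matcher m = p.matcher(that);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(
                IntStream.rangeClosed(1, m.groupCount())
                        .mapToObj(m::group)
                        .collect(Collectors.toList()));
    }
}
```

The caller has to unpack the `Optional` by hand, which is the ceremony a real pattern would remove at the use site.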

--
Victor Nazarov


On Fri, Mar 29, 2024 at 10:59 PM Brian Goetz <brian.goetz at oracle.com> wrote:

> We now come to the long-awaited bikeshed discussion on what member
> patterns should look like.
>
> Bikeshed disclaimer for EG:
>   - This is likely to evoke strong opinions, so please take pains to be
> especially constructive
>   - Long reply-to-reply threads should be avoided even more than usual
>   - Holistic, considered replies preferred
>   - Please change subject line if commenting on a sub-topic or tangential
>     concern
>
> Special reminders for Remi:
>  - Use of words like "should", "must", "shouldn't", "mistake", "wrong",
> "broken"
>    are strictly forbidden.
>  - If in doubt, ask questions first.
>
> Notes for external observers:
>  - This is a working document for the EG; the discussion may continue for a
>    while before there is an official proposal.  Please be patient.
>
>
> # Pattern declaration: the bikeshed
>
> We've largely identified the model for what kinds of patterns we need to
> express, but there are still several degrees of freedom in the syntax.
>
> As the model has simplified during the design process, the space of syntax
> choices has been pruned back, which is a good thing.  However, there are
> still quite a few smaller decisions to be made.  Not all of the
> considerations are orthogonal, so while they are presented individually,
> this is not a "pick one from each column" menu.
>
> Some of these simplifications include:
>
>  - Patterns with "input arguments" have been removed; another way to get
>    to what this gave us may come back in another form.
>  - I have grown increasingly skeptical of the value of the imperative
>    `match` statement.  With better totality analysis, I think it can be
>    eliminated.
>
> We can discuss these separately but I would like to sync first on the broad
> strokes for how patterns are expressed.
>
> ## Object model requirements
>
> As outlined in "Towards Member Patterns", the basic model is that patterns
> are
> the dual of other executable members (constructors, static methods,
> instance
> methods.)  While they are like methods in that they have inputs, outputs,
> names,
> and an imperative body, they have additional degrees of freedom that
> constructors and methods lack:
>
>  - Patterns are, in general, _conditional_ (they can succeed or fail), and
>    only produce bindings (outputs) when they succeed.  This conditionality
>    is understood by the language's flow analysis, and is used for computing
>    scoping and definite assignment.
>  - Methods can return at most one value; when a pattern completes
>    successfully, it may bind multiple values.
>  - All patterns have a _match candidate_, which is a distinguished,
>    possibly-implicit parameter.  Some patterns also have a receiver, which
>    is also a distinguished, possibly-implicit parameter.  In some such
>    cases the receiver and match candidate are aliased, but in others these
>    may refer to different objects.
>
> So a pattern is a named executable member that takes a _match candidate_
> as a
> possibly-implicit parameter, maybe takes a receiver as an implicit
> parameter,
> and has zero or more conditional _bindings_.  Its body can perform
> imperative
> computation, and can terminate either with match failure or success.  In
> the
> success case, it must provide a value for each binding.
>
> Deconstruction patterns are special in many of the same ways constructors
> are:
> they are constrained in their name, inheritance, and probably their
> conditionality (they should probably always succeed).  Just as the syntax
> for
> constructors differs slightly from that of instance methods, the syntax for
> deconstructors may differ slightly from that of instance patterns.  Static
> patterns, like static methods, have no receiver and do not have access to
> the
> type parameters of the enclosing class.
>
> Like constructors and methods, patterns can be overloaded, but in
> accordance
> with their duality to constructors and methods, the overloading happens on
> the
> _bindings_, not the inputs.
>
> ## Use-site syntax
>
> There are several kinds of type-driven patterns built into the language:
> type
> patterns and record patterns.  A type pattern in a `switch` looks like:
>
>     case String s: ...
>
> And a record pattern looks like:
>
>     case MyRecord(P1, P2, ...): ...
>
> where `P1..Pn` are nested patterns that are recursively matched to the
> components of the record.  This use-site syntax for record patterns was
> chosen
> for its similarity to the construction syntax, to highlight that a record
> pattern is the dual of record construction.
>
> **Deconstruction patterns.**  The simplest kind of member pattern, a
> deconstruction pattern, will have the same use-site syntax as a record
> pattern;
> record patterns can be thought of as a deconstruction pattern "acquired for
> free" by records, just as records do with constructors, accessors, object
> methods, etc.  So the use of a deconstruction pattern for `Point` looks
> like:
>
>     case Point(var x, var y): ...
>
> whether `Point` is a record or an ordinary class equipped with a suitable
> deconstruction pattern.
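
For reference, the record half of this already works in current Java. The sketch below assumes Java 21 or later (record patterns in `switch`); the `RecordPatternDemo` class and its `describe` method are hypothetical names used only for illustration.

```java
// A record gets its deconstruction pattern "for free".
record Point(int x, int y) {}

final class RecordPatternDemo {
    static String describe(Object o) {
        return switch (o) {
            // Record pattern: matches any Point and binds its components.
            case Point(var x, var y) -> "Point with x=" + x + ", y=" + y;
            default -> "not a Point";
        };
    }
}
```

Under the proposal, the same `case Point(var x, var y)` would also work if `Point` were an ordinary class that declares a deconstruction pattern.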
>
> **Static patterns.**  Continuing with the idea that the destructuring
> syntax
> should evoke the aggregation syntax, there is an obvious candidate for the
> use-site syntax for static patterns:
>
>     case Optional.of(var e): ...
>     case Optional.empty(): ...
>
> **Instance patterns.**  Uses of instance patterns will likely come in two
> forms,
> analogous to bound and unbound instance method references, depending on
> whether
> the receiver and the match candidate are the same object.  In the unbound
> form,
> used when the receiver is the same object as the match candidate, the
> pattern
> name is qualified by a _type_:
>
> ```
> Class<?> k = ...
> switch (k) {
>     // Qualified by type
>     case Class.arrayClass(var componentType): ...
> }
> ```
>
> This means that we _resolve_ the pattern `arrayClass` starting at `Class`
> and
> _select_ the pattern using the receiver, `k`.  We may also be able to omit
> the
> class qualifier if the static type of the match candidate is sufficient to
> resolve the desired pattern.
>
> In the bound form, used when the receiver is distinct from the match
> candidate,
> the pattern name is qualified with an explicit _receiver expression_.  As
> an
> example, consider an interface that captures primitive widening and
> narrowing
> conversions, such as those between `int` and `long`.  In the widening
> direction,
> conversion is unconditional, so this can be modeled as a method from `int`
> to
> `long`.  In the other direction, conversion is conditional, so this is
> better
> modeled as a _pattern_ whose match candidate is `long` and which binds an
> `int`
> on success.  Since these are instance methods of some class (say,
> `NumericConversion<T,U>`), we need to provide the receiver instance in
> order to
> resolve the pattern:
>
> ```
> NumericConversion<int, long> nc = ...
>
> switch (aLong) {
>     case nc.narrowed(int i):
>     ...
> }
> ```
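
In today's Java the same asymmetry can be approximated as follows. This is a sketch under assumptions: `IntLongConversion` is a hypothetical, non-generic stand-in for `NumericConversion<int, long>`, and `OptionalInt` is used to model conditionality (empty for "no match", present for the bound `int`).

```java
import java.util.OptionalInt;

// Hypothetical int/long instance of the NumericConversion idea above.
final class IntLongConversion {
    // Widening is unconditional, so an ordinary method suffices.
    long widened(int i) {
        return i;
    }

    // Narrowing is conditional: it succeeds only when the value fits in an int.
    OptionalInt narrowed(long that) {
        return (that >= Integer.MIN_VALUE && that <= Integer.MAX_VALUE)
                ? OptionalInt.of((int) that)
                : OptionalInt.empty();
    }
}
```

A caller writes `nc.narrowed(aLong)` and then tests and unwraps the result; the bound instance pattern `case nc.narrowed(int i)` would fold those steps into the match.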
>
> The explicit receiver syntax would also be used if we exposed regular
> expression
> matching as a pattern on the `j.u.r.Pattern` object (the name collision on
> `Pattern` is unfortunate).  Imagine we added a `matching` instance pattern
> to
> `j.u.r.Pattern`; then we could use it in `instanceof` as follows:
>
> ```
> static final java.util.regex.Pattern P = Pattern.compile("(a*)(b*)");
> ...
> if (aString instanceof P.matching(String as, String bs)) { ... }
> ```
>
> Each of these use-site syntaxes is modeled after the use-site syntax for a
> method invocation or method reference.
>
> ## Declaration-site syntax
>
> To avoid being biased by the simpler cases, we're going to work all the
> cases
> concurrently rather than starting with the simpler cases and working up.
> (It
> might seem sensible to start with deconstructors, since they are the "easy"
> case, but if we did that, we would likely be biased by their simplicity
> and then
> find ourselves painted into a corner.)  As our example gallery, we will
> consider:
>
>  - Deconstruction pattern for `Point`;
>  - Static patterns for `Optional::of` and `Optional::empty`;
>  - Static pattern for "power of two" (illustrating a computation where
>    success or failure, and computation of bindings, cannot easily be
>    separated);
>  - Instance pattern for `Class::arrayClass` (used unbound);
>  - Instance pattern for `Pattern::matching` on regular expressions (used
>    bound).
>
> Member patterns, like methods, have _names_.  (We can think of
> constructors as
> being named for their enclosing classes, and the same for
> deconstructors.)  All
> member patterns have a (possibly empty) ordered list of _bindings_, which
> are
> the dual of constructor or method parameters.  Bindings, in turn, have
> names and
> types.  And like constructors and methods, member patterns have a _body_
> which
> is a block statement.  Member patterns also have a _match candidate_,
> which is a
> likely-implicit method parameter.
>
> ### Member patterns as inverse methods and constructors
>
> Regardless of syntax, let us remind ourselves that deconstructors are the
> categorical dual to constructors (coconstructors), and pattern methods are
> the categorical dual to methods (comethods).  They are dual in their
> structure: a constructor or method takes N arguments and produces a result,
> while the corresponding member pattern consumes a match candidate and
> (conditionally) produces N bindings.
>
> Moreover, they are semantically dual: the return value produced by
> construction
> or factory invocation is the match candidate for the corresponding member
> pattern, and the bindings produced by a member pattern are the answers to
> the
> _Pattern Question_ -- "could this object have come from an invocation of my
> dual, and if so, with what arguments."
>
> ### What do we call them?
>
> Given the significant overlap between methods and patterns, the first
> question
> about the declaration we need to settle is how to identify a member pattern
> declaration as distinct from a method or constructor declaration.  _Towards
> Member Patterns_ tried out a syntax that recognized these as _inverse_
> methods
> and constructors:
>
>     public Point(int x, int y) { ... }
>     public inverse Point(int x, int y) { ... }
>
> While this is a principled choice which clearly highlights the duality,
> and one
> that might be good for specification and verbal description, it is
> questionable
> whether this would be a great syntax for reading and writing programs.
>
> A more traditional option is to choose a "noun" (conditional) keyword,
> such as
> `pattern`, `matcher`, `extractor`, `view`, etc:
>
>     public pattern Point(int x, int y) { ... }
>
> If we are using a noun keyword to identify pattern declarations, we could
> use
> the same noun for all of them, or we could choose a different one for
> deconstruction patterns:
>
>     public deconstructor Point(int x, int y) { ... }
>
> Alternately, we could reach for a symbol to indicate that we are talking
> about
> an inverted member.  C++ fans might suggest
>
>     public ~Point(int x, int y) { ... }
>
> but this is too cryptic (it's evocative once you see it, but then it
> becomes
> less evocative as we move away from deconstructors towards instance
> patterns.)
>
> If we wish to offer finer-grained control over conditionality, we might
> additionally need a `total` / `partial` modifier, though I would prefer to
> avoid
> that.
>
> Of the keyword candidates, there is one that stands out (for good and bad)
> because it connects to something that is already in the language:
> `pattern`.  On
> the one hand, using the term `pattern` for the declaration is a slight
> abuse; on
> the other, users will immediately connect it with "ah, so that's how I
> make a
> new pattern" or "so that's what happens when I match against this pattern."
> (Lisps would resolve this tension by calling it `defpattern`.)
>
> The others (`matcher`, `view`, `extractor`, etc) are all made-up terms that
> don't connect to anything else in the language, for better or worse.  If
> we pick
> one of these, we are asking users to sort out _three_ separate new things
> in
> their heads: (use-site) patterns, (declaration-site) matchers, and the
> rules of
> how patterns and matchers are connected.  Calling them both "patterns",
> despite
> the mild abuse of terminology, ties them together in a way that recognizes
> their
> connection.
>
> My personal position: `pattern` is the strongest candidate here, despite
> some
> flaws.
>
> ### Binding lists and match candidates
>
> There are two obvious alternatives for describing the binding list and
> match
> candidate of a pattern declaration, both with their roots in the
> constructor and
> method syntax:
>
>  - Pretend that a pattern declaration is like a method with multiple
>    return, and put the binding list in the "return position", and make the
>    match candidate an ordinary parameter;
>  - Lean into the inverse relationship between constructors and methods (and
>    consistency with the use-site syntax), and put the binding list in the
>    "parameter list position".  For static patterns and some instance
>    patterns, which need to explicitly identify the match candidate type,
>    there are several sub-options:
>    - Lean further into the duality, putting the match candidate type in the
>      "return position";
>    - Put the match candidate type somewhere else, where it is less likely
>      to be confused for a method return.
>
> The "method-like" approach might look like this:
>
> ```
> class Point {
>     // Constructor and deconstructor
>     public Point(int x, int y) { ... }
>     public pattern (int x, int y) Point(Point target) { ... }
>     ...
> }
>
> class Optional<T> {
>     // Static factory and pattern
>     public static<T> Optional<T> of(T t) { ... }
>     public static<T> pattern (T t) of(Optional<T> target) { ... }
>     ...
> }
> ```
>
> The "inverse" approach might look like:
>
> ```
> class Point {
>     // Constructor and deconstructor
>     public Point(int x, int y) { ... }
>     public pattern Point(int x, int y) { ... }
>     ...
> }
>
> class Optional<T> {
>     // Static factory and pattern (using the first sub-option)
>     public static<T> Optional<T> of(T t) { ... }
>     public static<T> pattern Optional<T> of(T t) { ... }
>     ...
> }
> ```
>
> With the "method-like" approach, the match candidate gets an explicit name
> selected by the author; with the inverse approach, we can go with a
> predefined
> name such as `that`.  (Because deconstructors do not have receivers, we
> could by
> abuse of notation arrange for the keyword `this` to refer instead to the
> match
> candidate within the body of a deconstructor.  While this might seem to
> lead to
> a more familiar notation for writing deconstructors, it would create a
> gratuitous asymmetry between the bodies of deconstruction patterns and
> those of
> other patterns.)
>
> Between these choices, nearly all the considerations favor the "inverse"
> approach:
>
>  - The "inverse" approach makes the declaration look like the use site.
>    This highlights that `pattern Point(int x, int y)` is what gets invoked
>    when you match against the pattern use `Point(int x, int y)`.  (This
>    point is so strong that we should probably just stop here.)
>  - The "inverse" members also look like their duals; the only difference
>    is the `pattern` keyword (and possibly the placement of the match
>    candidate type).  This makes matched pairs much more obvious, and such
>    matched pairs will be critical both for future language features and for
>    library idioms.
>  - The method-like approach is suggestive of multiple return or tuples,
>    which is probably helpful for the first few minutes but actually harmful
>    in the long term.  This feature is _not_ (much as some people would like
>    to believe) about multiple return or tuples, and playing into this
>    misperception will only make it harder to truly understand.  So this
>    suggestion ends up propping up the wrong mental model.
>
> The main downside of the "inverse" approach is the one-time speed bump of
> the unfamiliarity of the inverted syntax.  (The "method-like" syntax also
> has its own speed bumps; it is just unfamiliar in different ways.)  But
> unlike the advantages of the inverse approach, which continue to add value
> forever, this speed bump is a one-time hurdle to get over.
>
> To smooth out the speed bumps of the inverse approach, we can consider
> moving
> the position of the match candidate for static and (suitable) instance
> pattern
> declarations, such as:
>
> ```
> class Optional<T> {
>     // the usual static factory
>     public static<T> Optional<T> of(T t) { ... }
>
>     // Various ways of writing the corresponding pattern
>     public static<T> pattern of(T t) for Optional<T> { ... }
>     // or ...
>     public static<T> pattern(Optional<T>) of(T t) { ... }
>     // or ...
>     public static<T> pattern(Optional<T> that) of(T t) { ... }
>     // or ...
>     public static<T> pattern<Optional<T>> of(T t) { ... }
>     ...
> }
> ```
>
> (The deconstructor example looks the same with either variant.)  Of these,
> treating the match candidate like a "parameter" of "pattern" is probably
> the
> most evocative:
>
> ```
> public static<T> pattern(Optional<T> that) of(T t) { ... }
> ```
>
> as it can be read as "pattern taking the parameter `Optional<T> that`,
> called `of`, binding `T`", and is a short departure from the inverse syntax.
>
> The main value of the various rearrangements is that users don't need to
> think about things operating in reverse to parse the syntax.  This trades
> some of the secondary point (patterns looking almost exactly like their
> inverses) for a reduction in cognitive load, while maintaining the most
> important consideration: that the declaration site look like the use site.
>
> For instance pattern declarations, if the match candidate type is the same
> as
> the receiver type, the match candidate type can be elided as it is with
> deconstructors.
>
> My personal position: the "multiple return" version is terrible; all the
> sub-variants of the inverse version are probably workable.
>
> ### Naming the match candidate
>
> We've been assuming so far that the match candidate always has a fixed
> name,
> such as `that`; this is an entirely workable approach.  Some of the
> variants are
> also amenable to allowing authors to explicitly select a name for the match
> candidate.  For example, if we put the match candidate as a "parameter" to
> the `pattern` keyword, there is an obvious place to put the name:
>
> ```
> static<T> pattern(Optional<T> target) of(T t) { ... }
> ```
>
> My personal opinion: I don't think this degree of freedom buys us much,
> and in
> the long run readability probably benefits by picking a fixed name like
> `that`
> and sticking with it.  Even with a fixed name, if there is a sensible
> position
> for the name, allowing users to type `that` for explicitness is fine (as
> we do
> with instance methods, though many people don't know this.)  We may even
> want to
> require it.
>
> ## Body types
>
> Just as there are two obvious approaches for the declaration, there are two
> obvious approaches we could take for the body (though there is some
> coupling
> between them.)  We'll call the two body approaches _imperative_ and
> _functional_.
>
> The imperative approach treats bindings as initially-DU variables that
> must be
> DA on successful completion, getting their value through ordinary
> assignment;
> the functional approach sets all the bindings at once, positionally.
> Either
> way, member patterns (except maybe deconstructors) also need a way to
> differentiate a successful match from a failed match.
>
> Here is the `Point` deconstructor with both imperative and functional
> style. The
> functional style uses a placeholder `match` statement to indicate a
> successful
> match and provision of bindings:
>
> ```
> class Point {
>     int x, y;
>
>     Point(int x, int y) {
>         this.x = x;
>         this.y = y;
>     }
>
>     // Imperative style, deconstructor always succeeds
>     pattern Point(int x, int y) {
>         x = that.x;
>         y = that.y;
>     }
>
>     // Functional style
>     pattern Point(int x, int y) {
>         match(that.x, that.y);
>     }
> }
> ```
>
> There are some obvious differences here, as well as a number of less
> obvious ones.  In the imperative style, the dtor body looks much more like
> the reverse of the ctor body.  The functional style is more concise (and
> amenable to further concision via the "concise method bodies" mechanism in
> the future).  For deconstructors, the imperative approach is likely to feel
> more natural because of the obvious symmetry with constructors.
>
> In reality, it is _premature at this point to have an opinion_, because we
> haven't yet seen the full scope of the problem; deconstructors are a
> special
> case in many ways, which almost surely is distorting our initial opinion.
> As we
> move towards conditional patterns (and pattern lambdas), our opinions may
> flip.
>
> Regardless of which we pick, there are some additional syntactic choices
> to be
> made -- what syntax to use to indicate success (we used `match` in the
> above
> example) or failure.  (We should be especially careful around trying to
> reuse
> words like `return`, `break`, or `yield` because, in the case where there
> are
> zero bindings (which is allowable), it becomes unclear whether they mean
> "fail"
> or "succeed with zero bindings".)
>
> ### Success and failure
>
> Except for possibly deconstructors, which we may require to be total, a
> pattern
> declaration needs a way to indicate success and failure.  In the examples
> above,
> we posited a `match` statement to indicate success in the functional
> approach,
> and in both examples leaned on the "implicit success" of deconstructors
> (under
> the assumption they always succeed).  Now let's look at the more general
> case to
> figure out what else is needed.
>
> For a static pattern like `Optional::of`, success is conditional.  Using
> `match-fail` as a placeholder for "the match failed", this might look like
> (functional version):
>
> ```
> public static<T> pattern(Optional<T> that) of(T t) {
>     if (that.isPresent())
>         match (that.get());
>     else
>         match-fail;
> }
> ```
>
> The imperative version is less pretty, though.  Using `match-success` as a
> placeholder:
>
> ```
> public static<T> pattern(Optional<T> that) of(T t) {
>     if (that.isPresent()) {
>         t = that.get();
>         match-success;
>     }
>     else
>         match-fail;
> }
> ```
>
> Both arms of the `if` feel excessively ceremonial here.  And if we chose
> to not
> make all deconstruction patterns unconditional, deconstructors would
> likely need
> some explicit success as well:
>
> ```
> pattern Point(int x, int y) {
>     x = that.x;
>     y = that.y;
>     match-success;
> }
> ```
>
> It might be tempting to try to eliminate the need for explicit success by
> inferring it from whether the bindings are DA, but this is error-prone, is
> less type-checkable, and falls apart completely for patterns with no
> bindings.
>
> ### Implicit failure in the functional approach
>
> One of the ceremonial-seeming aspects of `Optional::of` above is having to
> say
> `else match-fail`, which doesn't feel like it adds a lot of value.
> Perhaps we
> can be more concise without losing clarity.
>
> Most conditional patterns will have a predicate to determine matching, and
> then
> some conditional code to compute the bindings and claim success.  Having
> to say
> "and if the predicate didn't hold, then I fail" seems like ceremony for the
> author and noise for the reader.  Instead, if a conditional pattern falls
> off
> the end without matching, we could treat that as simply not matching:
>
> ```
> public static<T> pattern(Optional<T> that) of(T t) {
>     if (that.isPresent())
>         match (that.get());
> }
> ```
>
> This says what we mean: if the optional is present, then this pattern
> succeeds and binds the contents of the `Optional`.  As long as our
> "succeed" construct strongly enough connotes that we are terminating
> abruptly and successfully, this code is perfectly clear.  And most
> conditional patterns will look a lot like `Optional::of`: do some sort of
> test and, if it succeeds, extract the state and bind it.
>
> At first glance, this "implicit fail" idiom may seem error-prone or
> sloppy.  But
> after writing a few dozen patterns, one quickly tires of saying "else
> match-fail" -- and the reader doesn't necessarily appreciate reading it
> either.
>
> Implicit failure also simplifies the selection of how we explicitly
> indicate
> failure; using `return` in a pattern for "no match" becomes pretty much a
> forced
> move.  We observe that (in a void method), "return" and "falling off the
> end"
> are equivalent; if "falling off the end" means "no match", then so should
> an
> explicit `return`.  So in those few cases where we need to explicitly
> signal "no
> match", we can just use `return`.  It won't come up that often, but here's
> an
> example where it does:
>
> ```
> static pattern(int that) powerOfTwo(int exp) {
>     int exp = 0;
>
>     if (that < 1)
>         return; // explicit fail
>
>     while (that > 1) {
>         if (that % 2 == 0) {
>             that /= 2;
>             ++exp;
>         }
>         else
>             return; // explicit fail
>     }
>     match (exp);
> }
> ```
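
The control flow of `powerOfTwo` can be tried out in today's Java. In this sketch the `PowerOfTwo` class is a hypothetical name, and `OptionalInt` is an assumed stand-in for the two outcomes: returning `OptionalInt.empty()` plays the role of `return` (no match), and `OptionalInt.of(exp)` plays the role of `match (exp)`.

```java
import java.util.OptionalInt;

final class PowerOfTwo {
    // Emulation of the powerOfTwo pattern: success and the binding are
    // computed together, so failure can be detected at several points.
    static OptionalInt powerOfTwo(int that) {
        if (that < 1) {
            return OptionalInt.empty();      // explicit fail
        }
        int exp = 0;
        while (that > 1) {
            if (that % 2 != 0) {
                return OptionalInt.empty();  // explicit fail
            }
            that /= 2;
            ++exp;
        }
        return OptionalInt.of(exp);          // stands in for `match (exp)`
    }
}
```

For example, `powerOfTwo(8)` yields 3 and `powerOfTwo(1)` yields 0, while 12 and 0 do not match.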
>
> As a bonus, if `return` as match failure is a forced move, we need only
> select a
> term for "successful match" (which obviously can't be `return`).  We could
> use
> `match` as we have in the examples, or a variant like `matched` or
> `matches`.
> But rather than just creating a new control operator, we have an
> opportunity to
> lean into the duality a little harder, by including the pattern syntax in
> the
> match:
>
> ```
> matches of(that.get());
> ```
>
> or the (optionally?) qualified (inferring type arguments, as we do at the
> use
> site):
>
> ```
> matches Optional.of(that.get());
> ```
>
> These "use the name" approaches trade a small amount of verbosity to gain
> a higher degree of fidelity to the pattern use site (and to evoke the
> comethod completion.)
>
> If we don't choose "implicit fail", we would have to invent _two_ new
> control
> flow statements to indicate "success" and "failure".
>
> My personal position: for the functional approach, implicit failure both
> makes the code simpler and clearer, and after you get used to it, you don't
> want to go back.  `match`, `matches`, and `matches <pattern-name>` are all
> workable, though I like some variant that names the pattern.
>
> ### Implicit success in the imperative approach
>
> In the imperative approach, we can be implicit as well, but it feels more
> natural (at least, initially) to choose implicit success rather than
> failure.
> This works great for unconditional patterns:
>
> ```
> pattern Point(int x, int y) {
>     x = that.x;
>     y = that.y;
>     // implicit success
> }
> ```
>
> but not quite as well for conditional patterns:
>
> ```
> static<T> pattern(Optional<T> that) of(T t) {
>     if (that.isPresent()) {
>         t = that.get();
>     }
>     else
>         match-fail;
>     // implicit success
> }
> ```
>
> We can eliminate one of the arms of the if, with the more concise (but
> convoluted) inversion:
>
> ```
> static<T> pattern(Optional<T> that) of(T t) {
>     if (!that.isPresent())
>         match-fail;
>     t = that.get();
>     // implicit success
> }
> ```
>
> Just as with the functional approach, if we choose imperative and "implicit
> success", using `return` to indicate success is pretty much a forced move.
>
>
> ### Imperative is a trap
>
> If we assume that functional implies implicit failure, and imperative
> implies
> implicit success, then our choices become:
>
> ```
> class Optional<T> {
>     public static<T> Optional<T> of(T t) { ... }
>
>     // imperative, implicit success
>     public static<T> pattern(Optional<T> that) of(T t) {
>         if (that.isPresent()) {
>             t = that.get();
>         }
>         else
>             match-fail;
>     }
>
>     // functional, implicit failure
>     public static<T> pattern(Optional<T> that) of(T t) {
>         if (that.isPresent())
>             matches of(that.get());
>     }
> }
> ```
>
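
A rough way to feel the difference in today's Java, which has no pattern declarations: the imperative style can be approximated with an out-parameter plus a separate success flag, and the functional style by returning the bindings in a single expression. This is only a sketch; `OfPatternEmulation`, `Box`, `ofImperative`, and `ofFunctional` are made-up names, and `Optional` here merely stands in for "match with bindings, or fail".

```java
import java.util.Optional;

final class OfPatternEmulation {
    // Hypothetical one-slot holder standing in for a binding variable.
    static final class Box<T> { T value; }

    // Imperative flavour: assign the binding, then signal the outcome separately.
    static <T> boolean ofImperative(Optional<T> that, Box<T> t) {
        if (that.isPresent()) {
            t.value = that.get();
        } else {
            return false; // plays the role of match-fail
        }
        return true;      // plays the role of implicit success
    }

    // Functional flavour: bindings and success travel together;
    // falling through to the end means failure.
    static <T> Optional<T> ofFunctional(Optional<T> that) {
        if (that.isPresent())
            return Optional.of(that.get()); // plays the role of `matches of(that.get())`
        return Optional.empty();            // plays the role of implicit failure
    }
}
```

With more bindings the imperative variant grows by one assignment per binding, while the functional variant stays a single expression.
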
> Once we get past deconstructors, the imperative approach looks worse by
> comparison because we need to assign all the bindings (which is _O(n)_
> assignments) _and also_ indicate success or failure somehow, whereas in the
> functional style all can be done together with a single `matches` statement.
>
> Looking at the alternatives, except maybe for unconditional patterns, the
> functional example above seems a lot more natural.  The imperative approach
> works with deconstructors (assuming they are not conditional), but does not
> scale so well to conditionality -- which is the essence of patterns.
>
> From a theoretical perspective, the method-comethod duality also gives us a
> forceful nudge towards the functional approach.  In a method, the method
> arguments are specified as a positional list of expressions at the use site:
>
>     m(a, b, c)
>
> and these values are invisibly copied into the parameter slots of the method
> prior to frame activation.  The dual of that is for a comethod to similarly
> convey the bindings in a positional list of expressions (as they must either
> all be produced or none), where they are copied into the slots provided at
> the use site, as is indicated by `matches` in the above examples.
>
> My personal position: the imperative style feels like a trap.  It seems
> "obvious" at first if we start with deconstructors, but becomes increasingly
> difficult when we get past this case, and gets in the way of other
> opportunities.  The last gasp before acceptance is the discomfort that dtor
> and ctor bodies are written in different styles, but in the rear-view mirror,
> this feels like a non-issue.
>
> ### Derive imperative from functional?
>
> If we start with "functional with implicit failure", we can possibly rescue
> imperative by deriving a version of imperative from functional, by
> "overloading" the match-success operator.
>
> If we have a pattern whose binding names are `b1..bn` of types `B1..Bn`, then
> the `matches` operator must take a list of expressions `e1..en` whose arity
> and types are compatible with `B1..Bn`.  But we could allow `matches` to also
> have a nilary form, which would have the effect of being shorthand for
>
>     matches <pattern-name>(b1, b2, ..., bn)
>
> where each of `b1..bn` must be definitely assigned (DA) at the point of
> matching.  This means that we could express patterns in either form:
>
> ```
> class Optional<T> {
>     public static<T> Optional<T> of(T t) { ... }
>
>     // imperative, derived from functional with implicit failure
>     public static<T> pattern(Optional<T> that) of(T t) {
>         if (that.isPresent()) {
>             t = that.get();
>             matches of;
>         }
>     }
>
>     public static<T> pattern(Optional<T> that) of(T t) {
>         if (that.isPresent())
>             matches of(that.get());
>     }
> }
> ```
>
> This flexibility allows users to select a more verbose expression in exchange
> for a clearer association of expressions and bindings, though as we'll see,
> it does come with some additional constraints.
>
> ### Wrapping an existing API
>
> Nearly every library has methods (sometimes sets of methods) that are
> patterns in disguise, such as the pair of methods `isArray` and
> `getComponentType` in `Class`, or the `Matcher` helper type in
> `java.util.regex`.  Library maintainers will likely want to wrap (or replace)
> these with real patterns, so these can participate more effectively in
> conditional contexts, and in some cases, highlight their duality with factory
> methods.
>
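
To make the "pattern in disguise" point concrete, the `isArray` / `getComponentType` pair can be folded into one conditional extraction in current Java. A minimal sketch, assuming a hypothetical helper class `ArrayClassSketch`; an empty `Optional` plays the role of match failure:

```java
import java.util.Optional;

final class ArrayClassSketch {
    // Succeeds with the component type when `that` is an array class,
    // and fails (empty) for every other class.
    static Optional<Class<?>> arrayClass(Class<?> that) {
        if (that.isArray())
            return Optional.of(that.getComponentType());
        return Optional.empty();
    }
}
```

A caller tests and extracts in one step, for example `ArrayClassSketch.arrayClass(int[].class)`, instead of pairing an `isArray()` check with a later `getComponentType()` call.
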
> Matching a string against a `j.u.r.Pattern` regular expression has all the
> same elements as a pattern, just with an ad-hoc API (and one that I have to
> look up every time).  But we can fairly easily wrap a true pattern around the
> existing API.  To match against a `Pattern` today, we pass the match
> candidate to `Pattern::matcher`, which returns a `Matcher` with accessors
> `Matcher::matches` (did it match) and `Matcher::group` (conditionally extract
> a particular capture group.)  If we want to wrap this with a pattern called
> `regexMatch`:
>
> ```
> pattern(String that) regexMatch(String... groups) {
>     Matcher m = this.matcher(that);
>     if (m.matches())
>         // capture groups are numbered 1..groupCount(), inclusive
>         matches Pattern.regexMatch(IntStream.rangeClosed(1, m.groupCount())
>                                             .mapToObj(m::group)
>                                             .toArray(String[]::new));
>     // whole lotta matchin' goin' on
> }
> ```
>
> This says that a `j.u.r.Pattern` has an instance pattern called `regexMatch`,
> whose match candidate is `String`, and which binds a varargs of `String`
> corresponding to the capture groups.  The implementation simply delegates to
> the existing `j.u.r.Matcher` API.  This means that `j.u.r.Pattern` becomes a
> sort of "pattern object", and we can use it as a receiver at the use site:
>
> ```
> static Pattern As = Pattern.compile("(a*)");
> static Pattern Bs = Pattern.compile("(b*)");
> ...
> switch (string) {
>     case As.regexMatch(var as): ...
>     case Bs.regexMatch(var bs): ...
>     ...
> }
> ```
>
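
The delegation to `Matcher` can be tried out in current Java by returning the capture groups in an `Optional`. This is a sketch under assumptions: `RegexMatchSketch` and its static `regexMatch` helper are hypothetical names, and the receiver `Pattern` is passed as an ordinary argument. Capture groups are numbered from 1 up to and including `groupCount()`, and the `int`-to-`String` step uses `mapToObj`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.IntStream;

final class RegexMatchSketch {
    // On a full match, yields capture groups 1..groupCount(); otherwise empty.
    static Optional<String[]> regexMatch(Pattern pattern, String that) {
        Matcher m = pattern.matcher(that);
        if (m.matches())
            return Optional.of(IntStream.rangeClosed(1, m.groupCount())
                                        .mapToObj(m::group)
                                        .toArray(String[]::new));
        return Optional.empty();
    }
}
```

For instance, `RegexMatchSketch.regexMatch(Pattern.compile("(a*)(b*)"), "aab")` yields the two groups `aa` and `b`, while a non-matching input yields an empty result.
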
> ### Odds and ends
>
> There are a number of loose ends here.  We could choose other names for the
> match-success and match-fail operations, including trying to reuse `break` or
> `yield`.  But, this reuse is tricky; it must be very clear whether a given
> form of abrupt completion means "success" or "failure", because in the case
> of patterns with no bindings, we will have no other syntactic cues to help
> disambiguate.  (I think having a single `matches`, with implicit failure and
> `return` meaning failure, is the sweet spot here.)
>
> Another question is whether the binding list introduces corresponding
> variables into the scope of the body.  For imperative, the answer is "surely
> yes"; for functional, the answer is "maybe" (unless we want to do the trick
> where we derive imperative from functional, in which case the answer is "yes"
> again.)
>
> If the binding list does not correspond to variables in the body, this may be
> initially discomforting; because the bindings do not declare program elements,
> they may feel as if they are left "dangling".  But even if they are not
> declaring _program_ elements, they are still declaring _API_ elements
> (similar to the return type of a method.)  We will want to provide Javadoc on
> the bindings, just like with parameters; we will want to match up binding
> names in deconstructors with parameter names in constructors; we may even
> someday want to support by-name binding at the use site (e.g.,
> `case Foo(a: var a)`).  The names are needed for all of these, just not for
> the body.  Names still matter.  My take here is that this is a transient
> "different is scary" reaction, one that we would get over quickly.
>
> A final question is whether we should consider unqualified names as
> implicitly qualified by `that` (and also `this`, for instance patterns, with
> some conflict resolution).  Users will probably grow tired of typing `that.`
> all the time, and most of the time, the unqualified use is perfectly readable.
>
> ## Exhaustiveness
>
> There is one last syntax question in front of us: how to indicate that a set
> of patterns is (claimed to be) exhaustive on a given match candidate type.
> We see this with `Optional::of` and `Optional::empty`; it would be sad if the
> compiler did not realize that these two patterns together were exhaustive on
> `Optional`.  This is not a feature that will be used often, but not having it
> at all will be a repeated irritant.
>
> The best I've come up with is to call these `case` patterns, where a set of
> `case` patterns for a given match candidate type in a given class are
> asserted to be an exhaustive set:
>
> ```
> class Optional<T> {
>     static<T> Optional<T> of(T t) { ... }
>     static<T> Optional<T> empty() { ... }
>
>     static<T> case pattern(Optional<T> that) of(T t) { ... }
>     static<T> case pattern(Optional<T> that) empty() { ... }
> }
> ```
>
> Because they may not be truly exhaustive, `switch` constructs will have to
> back up the static assumption of exhaustiveness with a dynamic check, as we
> do for other sets of exhaustive patterns that may have remainder.
>
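
For comparison only (this is not the `case pattern` feature), Java already derives exhaustiveness from sealed hierarchies. The sketch below assumes Java 21 or later and uses hypothetical types `Opt`, `Some`, and `None`; the `switch` compiles without a `default` because the compiler can see that the two record patterns cover `Opt`.

```java
// Hypothetical sealed stand-in for an Optional-like type.
sealed interface Opt<T> permits Some, None {}
record Some<T>(T value) implements Opt<T> {}
record None<T>() implements Opt<T> {}

final class ExhaustiveSketch {
    static <T> String describe(Opt<T> opt) {
        // No default branch: Some and None together are exhaustive on Opt.
        return switch (opt) {
            case Some<T>(var value) -> "of(" + value + ")";
            case None<T>() -> "empty";
        };
    }
}
```

As far as I understand, such a switch still carries a runtime check for inputs that no case matches (for example if the hierarchy changes after compilation), which is the same kind of dynamic backup described above.
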
> I've experimented with variants of `sealed` but it felt more forced, so this
> is the best I've come up with.
>
> ## Example: patterns delegating to other patterns
>
> Pattern implementations must compose.  Just as a subclass constructor
> delegates to a superclass constructor, the same should be true for
> deconstructors.  Here's a typical superclass-subclass pair:
>
> ```
> class A {
>     private final int a;
>
>     public A(int a) { this.a = a; }
>     public pattern A(int a) { matches A(that.a); }
> }
>
> class B extends A {
>     private final int b;
>
>     public B(int a, int b) {
>         super(a);
>         this.b = b;
>     }
>
>     // Imperative style
>     public pattern B(int a, int b) {
>         if (that instanceof super(var aa)) {
>             a = aa;
>             b = that.b;
>             matches B;
>         }
>     }
>
>     // Functional style
>     public pattern B(int a, int b) {
>         if (that instanceof super(var a))
>             matches B(a, b);
>     }
> }
> ```
>
> (Ignore the flow analysis and totality for the time being; we'll come back to
> this in a separate document.)
>
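
The same delegation can be approximated in current Java (16 or later, for records) by having each "deconstructor" return a carrier of its bindings. This is a sketch, not the proposed feature: `ABindings`, `BBindings`, `deconstructA`, and `deconstructB` are hypothetical names. It follows the functional shape, so the superclass result is consumed directly and no intermediate copy into a second variable is needed.

```java
import java.util.Optional;

// Hypothetical carriers for the bindings of each deconstructor.
record ABindings(int a) {}
record BBindings(int a, int b) {}

class A {
    private final int a;

    A(int a) { this.a = a; }

    // Unconditional: always succeeds with A's single binding.
    static Optional<ABindings> deconstructA(A that) {
        return Optional.of(new ABindings(that.a));
    }
}

class B extends A {
    private final int b;

    B(int a, int b) {
        super(a);
        this.b = b;
    }

    // Delegates to A's deconstructor, then adds B's own binding in one expression.
    static Optional<BBindings> deconstructB(B that) {
        return deconstructA(that).map(sup -> new BBindings(sup.a(), that.b));
    }
}
```
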
> The first thing that jumps out at us is that, in the imperative version, we
> had to create a "garbage" variable `aa` to receive the binding, because `a`
> was already in scope, and then we have to copy the garbage variable into the
> real binding variable.  Users will surely balk at this, and rightly so.  In
> the functional version (depending on the choices from "Odds and ends") we are
> free to use the more natural name and avoid the roundabout locution.
>
> We might be tempted to fix the "garbage variable" problem by inventing
> another sub-feature: the ability to use an existing variable as the target of
> a binding, such as:
>
> ```
> pattern B(int a, int b) {
>     if (that instanceof A(__bind a))
>         b = that.b;
> }
> ```
>
> But, I think the language is stronger without this feature, for two reasons.
> First, having to reason about whether a pattern match introduces a new
> binding or assigns to an existing variable is additional cognitive load for
> users, and second, having assignment to locals happening through something
> other than assignment introduces additional complexity in finding where a
> variable is modified.  While we can argue about the general utility of this
> feature, bringing it in just to solve the garbage-variable problem is
> particularly unattractive.
>
> ## Pattern lambdas
>
> One final consideration is that patterns may also have a lambda form.  Given
> a single-abstract-pattern (SAP) interface:
>
> ```
> interface Converter<T,U> {
>     pattern(T t) convert(U u);
> }
> ```
>
> one can implement such a pattern with a lambda.  Such a lambda has one
> parameter (the match candidate), and its body looks like the body of a
> declared pattern:
>
> ```
> Converter<Integer, Short> c =
>     i -> {
>         if (i >= Short.MIN_VALUE && i <= Short.MAX_VALUE)
>             // an Integer cannot be cast directly to short; use shortValue()
>             matches Converter.convert(i.shortValue());
>     };
> ```
>
> Because the bindings of the pattern lambda are defined in the interface, not
> in the lambda, this is one more reason not to like the imperative version: it
> is brittle, and alpha-renaming bindings in the interface would be a
> source-incompatible change.
>
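
The shape of such a lambda can be approximated today with an ordinary functional interface whose single method returns an `Optional`. A sketch under assumptions: this `Converter` is a hypothetical stand-in, not the proposed SAP interface, and `ConverterSketch.TO_SHORT` is a made-up name. Note that the lambda names only the match candidate `i`; the name of the result lives in the interface, so renaming it there cannot break the lambda.

```java
import java.util.Optional;

// Hypothetical stand-in for a single-abstract-pattern interface.
@FunctionalInterface
interface Converter<T, U> {
    Optional<U> convert(T t);
}

final class ConverterSketch {
    // Succeeds only when the Integer fits in a short; otherwise fails (empty).
    static final Converter<Integer, Short> TO_SHORT =
        i -> {
            if (i >= Short.MIN_VALUE && i <= Short.MAX_VALUE)
                return Optional.of(i.shortValue());
            return Optional.empty();
        };
}
```
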
> ## Example gallery
>
> Here's all the pattern examples so far, and a few more, using the suggested
> style (functional, implicit fail, implicit `that`-qualification):
>
> ```
> // Point dtor
> pattern Point(int x, int y) {
>     matches Point(x, y);
> }
>
> // Optional -- static patterns for Optional::of, Optional::empty
> static<T> case pattern(Optional<T> that) of(T t) {
>     if (isPresent())
>         matches of(get());
> }
>
> static<T> case pattern(Optional<T> that) empty() {
>     if (!isPresent())
>         matches empty();
> }
>
> // Class -- instance pattern for arrayClass (match candidate type inferred)
> pattern arrayClass(Class<?> componentType) {
>     if (that.isArray())
>         matches arrayClass(that.getComponentType());
> }
>
> // regular expression -- instance pattern in j.u.r.Pattern
> pattern(String that) regexMatch(String... groups) {
>     Matcher m = matcher(that);
>     if (m.matches())
>         matches Pattern.regexMatch(IntStream.rangeClosed(1, m.groupCount())
>                                             .mapToObj(m::group)
>                                             .toArray(String[]::new));
> }
>
> // power of two (somewhere)
> static pattern(int that) powerOfTwo(int exp) {
>     int exp = 0;
>
>     if (that < 1)
>         return;
>
>     while (that > 1) {
>         if (that % 2 == 0) {
>             that /= 2;
>             exp++;
>         }
>         else
>             return;
>     }
>     matches powerOfTwo(exp);
> }
> ```
>
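
The `powerOfTwo` body translates almost line for line into current Java, which makes it easy to check the control flow. In this sketch (hypothetical class `PowerOfTwoSketch`), returning `OptionalInt.empty()` stands in for `return` meaning failure, and `OptionalInt.of(exp)` stands in for `matches powerOfTwo(exp)`.

```java
import java.util.OptionalInt;

final class PowerOfTwoSketch {
    // Yields the exponent when `that` is a positive power of two; otherwise empty.
    static OptionalInt powerOfTwo(int that) {
        int exp = 0;

        if (that < 1)
            return OptionalInt.empty();

        while (that > 1) {
            if (that % 2 == 0) {
                that /= 2;
                exp++;
            } else {
                return OptionalInt.empty();
            }
        }
        return OptionalInt.of(exp);
    }
}
```
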
> ## Closing thoughts
>
> I came out of this exploration with very different conclusions than I
> expected when going in.  At first, the "inverse" syntax seemed stilted, but
> over time it started to seem more obvious.  Similarly, I went in expecting to
> prefer the imperative approach for the body, but over time, started to warm
> to the functional approach, and eventually concluded it was basically a
> forced move if we want to support more than just deconstructors.  And I
> started out skeptical of "implicit fail", but after writing a few dozen
> patterns with it, going back to fully explicit felt painful.  All of this is
> to say, you should hold your initial opinions at arm's length, and give the
> alternatives a chance to sink in.
>
> For most _conditional_ patterns (and conditionality is at the heart of
> pattern matching), the functional approach cleanly highlights both the match
> predicate and the flow of values, and is considerably less fussy than the
> imperative approach in the same situation; `Optional::of`,
> `Class::arrayClass`, and `regexMatch` look great here, much better than they
> would with imperative.  None of these illustrate delegation, but in the
> presence of delegation, the gap gets even wider.
>
>