Member Patterns -- the bikeshed

Brian Goetz brian.goetz at oracle.com
Sat Mar 30 20:53:14 UTC 2024



On 3/30/2024 3:23 PM, Victor Nazarov wrote:
> I have two points that I think may be good to consider in the list of 
> options.
>
> 1. I'm not sure if this was considered, but I find explicit lists of 
> covering patterns
> rather natural and more flexible than using case as a pattern-modifier.

Agreed (this is how F# does it), and we tried that, but it is so 
contrary to how members are done in Java.   (One might think that one 
could declare a "sealed" pattern, which "permits" a list of other 
patterns, and this sounds perfectly natural, but it looks pretty weird.)

> The important feature of explicit lists is that there may be more than 
> one covering set of patterns.

Yes, been down this road too, but the reality is that this is not likely 
to come up nearly as often as one might imagine.

> 2. I think that there is a middle ground between functional and 
> imperative pattern body definition style that may look cumbersome at 
> first, but nevertheless gives you best of both worlds:

The `match` block is an interesting idea, will consider.


>
>     * deconstructor patterns look dual to constructors
>     * names from the list of pattern variables are actually used and 
> checked by the compiler
>     * control flow is still functional, which is more natural
>
> The downside that is retained from the imperative style is the need 
> for alpha-renaming,
> but I think we still have to deal with shadowing and renaming 
> local-variable seems natural and easy.
>
> Middle ground may be used like a special form that can be used in the 
> pattern body.
> This form works mostly the same way as `with`-clause as defined in the 
> "Derived Record Instances" JEP.
>
> Here is the long list of examples to fully illustrate different 
> interactions:
>
> ````
>     class Optional<T> matches (of|empty) {
>         public static <T> pattern<Optional<T>> of(T value) {
>             if (that.isPresent()) {
>                 match {
>                     value = that.get();
>                 }
>             }
>         }
>
>         public static <T> pattern<Optional<T>> empty() {
>             if (that.isEmpty())
>                 match {}
>         }
>     }
>
>     class Pattern {
>         public pattern<String> regexMatch(String... groups) {
>             Matcher m = this.matcher(that);
>             if (m.matches()) {
>                 match {
>                     groups =
>                             IntStream.range(1, m.groupCount())
>                                     .map(Matcher::group)
>                                     .toArray(String[]::new);
>                 }
>             }
>         }
>     }
>
>     class A {
>         private final int a;
>
>         public A(int a) {
>             this.a = a;
>         }
>         public pattern A(int a) {
>             match {
>                 a = that.a;
>             }
>         }
>     }
>
>     class B extends A {
>         private final int b;
>
>         public B(int a, int b) {
>             super(a);
>             this.b = b;
>         }
>
>         public pattern B(int a, int b) {
>             if (that instanceof super(var aa)) {
>                 match {
>                     a = aa;
>                     b = that.b;
>                 }
>             }
>         }
>     }
>
>     interface Converter<T,U> {
>         pattern<T> convert(U u);
>     }
>     Converter<Integer, Short> c =
>         pattern (s) -> {
>             if (that >= Short.MIN_VALUE && that <= Short.MAX_VALUE)
>                 match {
>                     s = (short) that;
>                 }
>         };
> ````
>
> --
> Victor Nazarov
>
>
> On Fri, Mar 29, 2024 at 10:59 PM Brian Goetz <brian.goetz at oracle.com> 
> wrote:
>
>     We now come to the long-awaited bikeshed discussion on what member
>     patterns should look like.
>
>     Bikeshed disclaimer for EG:
>       - This is likely to evoke strong opinions, so please take pains
>     to be especially constructive
>       - Long reply-to-reply threads should be avoided even more than usual
>       - Holistic, considered replies preferred
>       - Please change subject line if commenting on a sub-topic or
>     tangential
>         concern
>
>     Special reminders for Remi:
>      - Use of words like "should", "must", "shouldn't", "mistake",
>     "wrong", "broken"
>        are strictly forbidden.
>      - If in doubt, ask questions first.
>
>     Notes for external observers:
>      - This is a working document for the EG; the discussion may
>     continue for a
>        while before there is an official proposal.  Please be patient.
>
>
>     # Pattern declaration: the bikeshed
>
>     We've largely identified the model for what kinds of patterns we
>     need to
>     express, but there are still several degrees of freedom in the syntax.
>
>     As the model has simplified during the design process, the space
>     of syntax
>     choices has been pruned back, which is a good thing. However,
>     there are still
>     quite a few smaller decisions to be made.  Not all of the
>     considerations are
>     orthogonal, so while they are presented individually, this is not
>     a "pick one
>     from each column" menu.
>
>     Some of these simplifications include:
>
>      - Patterns with "input arguments" have been removed; another way
>     to get to what
>        this gave us may come back in another form.
>      - I have grown increasingly skeptical of the value of the
>     imperative `match`
>        statement.  With better totality analysis, I think it can be
>     eliminated.
>
>     We can discuss these separately but I would like to sync first on
>     the broad
>     strokes for how patterns are expressed.
>
>     ## Object model requirements
>
>     As outlined in "Towards Member Patterns", the basic model is that
>     patterns are
>     the dual of other executable members (constructors, static
>     methods, instance
>     methods.)  While they are like methods in that they have inputs,
>     outputs, names,
>     and an imperative body, they have additional degrees of freedom that
>     constructors and methods lack:
>
>      - Patterns are, in general, _conditional_ (they can succeed or
>     fail), and only
>        produce bindings (outputs) when they succeed.  This
>     conditionality is
>        understood by the language's flow analysis, and is used for
>     computing scoping
>        and definite assignment.
>      - Methods can return at most one value; when a pattern completes
>     successfully,
>        it may bind multiple values.
>      - All patterns have a _match candidate_, which is a distinguished,
>        possibly-implicit parameter.  Some patterns also have a
>     receiver, which is
>        also a distinguished, possibly-implicit parameter.  In some
>     such cases the
>        receiver and match candidate are aliased, but in others these
>     may refer to
>        different objects.
>
>     So a pattern is a named executable member that takes a _match
>     candidate_ as a
>     possibly-implicit parameter, maybe takes a receiver as an implicit
>     parameter,
>     and has zero or more conditional _bindings_.  Its body can perform
>     imperative
>     computation, and can terminate either with match failure or
>     success.  In the
>     success case, it must provide a value for each binding.
>
>     Deconstruction patterns are special in many of the same ways
>     constructors are:
>     they are constrained in their name, inheritance, and probably their
>     conditionality (they should probably always succeed). Just as the
>     syntax for
>     constructors differs slightly from that of instance methods, the
>     syntax for
>     deconstructors may differ slightly from that of instance
>     patterns.  Static
>     patterns, like static methods, have no receiver and do not have
>     access to the
>     type parameters of the enclosing class.
>
>     Like constructors and methods, patterns can be overloaded, but in
>     accordance
>     with their duality to constructors and methods, the overloading
>     happens on the
>     _bindings_, not the inputs.
>
>     ## Use-site syntax
>
>     There are several kinds of type-driven patterns built into the
>     language: type
>     patterns and record patterns.  A type pattern in a `switch` looks
>     like:
>
>         case String s: ...
>
>     And a record pattern looks like:
>
>         case MyRecord(P1, P2, ...): ...
>
>     where `P1..Pn` are nested patterns that are recursively matched to the
>     components of the record.  This use-site syntax for record
>     patterns was chosen
>     for its similarity to the construction syntax, to highlight that a
>     record
>     pattern is the dual of record construction.
>
>     **Deconstruction patterns.**  The simplest kind of member pattern, a
>     deconstruction pattern, will have the same use-site syntax as a
>     record pattern;
>     record patterns can be thought of as a deconstruction pattern
>     "acquired for
>     free" by records, just as records do with constructors, accessors,
>     object
>     methods, etc.  So the use of a deconstruction pattern for `Point`
>     looks like:
>
>         case Point(var x, var y): ...
>
>     whether `Point` is a record or an ordinary class equipped with a
>     suitable
>     deconstruction pattern.
>
>     **Static patterns.**  Continuing with the idea that the
>     destructuring syntax
>     should evoke the aggregation syntax, there is an obvious candidate
>     for the
>     use-site syntax for static patterns:
>
>         case Optional.of(var e): ...
>         case Optional.empty(): ...
>
>     **Instance patterns.**  Uses of instance patterns will likely come
>     in two forms,
>     analogous to bound and unbound instance method references,
>     depending on whether
>     the receiver and the match candidate are the same object. In the
>     unbound form,
>     used when the receiver is the same object as the match candidate,
>     the pattern
>     name is qualified by a _type_:
>
>     ```
>     Class<?> k = ...
>     switch (k) {
>         // Qualified by type
>         case Class.arrayClass(var componentType): ...
>     }
>     ```
>
>     This means that we _resolve_ the pattern `arrayClass` starting at
>     `Class` and
>     _select_ the pattern using the receiver, `k`.  We may also be able
>     to omit the
>     class qualifier if the static type of the match candidate is
>     sufficient to
>     resolve the desired pattern.
>
>     In the bound form, used when the receiver is distinct from the
>     match candidate,
>     the pattern name is qualified with an explicit _receiver
>     expression_.  As an
>     example, consider an interface that captures primitive widening
>     and narrowing
>     conversions, such as those between `int` and `long`.  In the
>     widening direction,
>     conversion is unconditional, so this can be modeled as a method
>     from `int` to
>     `long`.  In the other direction, conversion is conditional, so
>     this is better
>     modeled as a _pattern_ whose match candidate is `long` and which
>     binds an `int`
>     on success.  Since these are instance methods of some class (say,
>     `NumericConversion<T,U>`), we need to provide the receiver
>     instance in order to
>     resolve the pattern:
>
>     ```
>     NumericConversion<int, long> nc = ...
>
>     switch (aLong) {
>         case nc.narrowed(int i):
>         ...
>     }
>     ```
>
>     The explicit receiver syntax would also be used if we exposed
>     regular expression
>     matching as a pattern on the `j.u.r.Pattern` object (the name
>     collision on
>     `Pattern` is unfortunate).  Imagine we added a `matching` instance
>     pattern to
>     `j.u.r.Pattern`; then we could use it in `instanceof` as follows:
>
>     ```
>     static final java.util.regex.Pattern P = Pattern.compile("(a*)(b*)");
>     ...
>     if (aString instanceof P.matching(String as, String bs)) { ... }
>     ```
>
>     Each of these use-site syntaxes is modeled after the use-site
>     syntax for a
>     method invocation or method reference.
>
>     ## Declaration-site syntax
>
>     To avoid being biased by the simpler cases, we're going to work
>     all the cases
>     concurrently rather than starting with the simpler cases and
>     working up.  (It
>     might seem sensible to start with deconstructors, since they are
>     the "easy"
>     case, but if we did that, we would likely be biased by their
>     simplicity and then
>     find ourselves painted into a corner.)  As our example gallery, we
>     will consider:
>
>      - Deconstruction pattern for `Point`;
>      - Static patterns for `Optional::of` and `Optional::empty`;
>      - Static pattern for "power of two" (illustrating a computations
>     where success
>        or failure, and computation of bindings, cannot easily be
>     separated);
>      - Instance pattern for `Class::arrayClass` (used unbound);
>      - Instance pattern for `Pattern::matching` on regular expressions
>     (used bound).
>
>     Member patterns, like methods, have _names_.  (We can think of
>     constructors as
>     being named for their enclosing classes, and the same for
>     deconstructors.)  All
>     member patterns have a (possibly empty) ordered list of
>     _bindings_, which are
>     the dual of constructor or method parameters.  Bindings, in turn,
>     have names and
>     types.  And like constructors and methods, member patterns have a
>     _body_ which
>     is a block statement.  Member patterns also have a _match
>     candidate_, which is a
>     likely-implicit method parameter.
>
>     ### Member patterns as inverse methods and constructors
>
>     Regardless of syntax, let us remind ourselves that that
>     deconstructors are the
>     categorical dual to constructors (coconstructors), and pattern
>     methods are the
>     categorical dual to methods (comethods).  They are dual in their
>     structure: a
>     constructor or method takes N arguments and produces a result, the
>     corresponding
>     member pattern consumes a match candidate and (conditionally)
>     produces N
>     bindings.
>
>     Moreover, they are semantically dual: the return value produced by
>     construction
>     or factory invocation is the match candidate for the corresponding
>     member
>     pattern, and the bindings produced by a member pattern are the
>     answers to the
>     _Pattern Question_ -- "could this object have come from an
>     invocation of my
>     dual, and if so, with what arguments."
>
>     ### What do we call them?
>
>     Given the significant overlap between methods and patterns, the
>     first question
>     about the declaration we need to settle is how to identify a
>     member pattern
>     declaration as distinct from a method or constructor declaration. 
>     _Towards
>     Member Patterns_ tried out a syntax that recognized these as
>     _inverse_ methods
>     and constructors:
>
>         public Point(int x, int y) { ... }
>         public inverse Point(int x, int y) { ... }
>
>     While this is a principled choice which clearly highlights the
>     duality, and one
>     that might be good for specification and verbal description, it is
>     questionable
>     whether this would be a great syntax for reading and writing
>     programs.
>
>     A more traditional option is to choose a "noun" (conditional)
>     keyword, such as
>     `pattern`, `matcher`, `extractor`, `view`, etc:
>
>         public pattern Point(int x, int y) { ... }
>
>     If we are using a noun keyword to identify pattern declarations,
>     we could use
>     the same noun for all of them, or we could choose a different one for
>     deconstruction patterns:
>
>         public deconstructor Point(int x, int y) { ... }
>
>     Alternately, we could reach for a symbol to indicate that we are
>     talking about
>     an inverted member.  C++ fans might suggest
>
>         public ~Point(int x, int y) { ... }
>
>     but this is too cryptic (it's evocative once you see it, but then
>     it becomes
>     less evocative as we move away from deconstructors towards
>     instance patterns.)
>
>     If we wish to offer finer-grained control over conditionality, we
>     might
>     additionally need a `total` / `partial` modifier, though I would
>     prefer to avoid
>     that.
>
>     Of the keyword candidates, there is one that stands out (for good
>     and bad)
>     because it connects to something that is already in the language:
>     `pattern`.  On
>     the one hand, using the term `pattern` for the declaration is a
>     slight abuse; on
>     the other, users will immediately connect it with "ah, so that's
>     how I make a
>     new pattern" or "so that's what happens when I match against this
>     pattern."
>     (Lisps would resolve this tension by calling it `defpattern`.)
>
>     The others (`matcher`, `view`, `extractor`, etc) are all made-up
>     terms that
>     don't connect to anything else in the language, for better or
>     worse.  If we pick
>     one of these, we are asking users to sort out _three_ separate new
>     things in
>     their heads: (use-site) patterns, (declaration-site) matchers, and
>     the rules of
>     how patterns and matchers are connected.  Calling them both
>     "patterns", despite
>     the mild abuse of terminology, ties them together in a way that
>     recognizes their
>     connection.
>
>     My personal position: `pattern` is the strongest candidate here,
>     despite some
>     flaws.
>
>     ### Binding lists and match candidates
>
>     There are two obvious alternatives for describing the binding list
>     and match
>     candidate of a pattern declaration, both with their roots in the
>     constructor and
>     method syntax:
>
>      - Pretend that a pattern declaration is like a method with
>     multiple return, and
>        put the binding list in the "return position", and make the
>     match candidate
>        an ordinary parameter;
>      - Lean into the inverse relationship between constructors and
>     methods (and
>        consistency with the use-site syntax), and put the binding list
>     in the
>        "parameter list position". For static patterns and some
>     instance patterns,
>        which need to explicitly identify the match candidate type,
>     there are several
>        sub-options:
>        - Lean further into the duality, putting the match candidate
>     type in the
>          "return position";
>        - Put the match candidate type somewhere else, where it is less
>     likely to be
>          confused for a method return.
>
>     The "method-like" approach might look like this:
>
>     ```
>     class Point {
>         // Constructor and deconstructor
>         public Point(int x, int y) { ... }
>         public pattern (int x, int y) Point(Point target) { ... }
>         ...
>     }
>
>     class Optional<T> {
>         // Static factory and pattern
>         public static<T> Optional<T> of(T t) { ... }
>         public static<T> pattern (T t) of(Optional<T> target) { ... }
>         ...
>     }
>     ```
>
>     The "inverse" approach might look like:
>
>     ```
>     class Point {
>         // Constructor and deconstructor
>         public Point(int x, int y) { ... }
>         public pattern Point(int x, int y) { ... }
>         ...
>     }
>
>     class Optional<T> {
>         // Static factory and pattern (using the first sub-option)
>         public static<T> Optional<T> of(T t) { ... }
>         public static<T> pattern Optional<T> of(T t) { ... }
>         ...
>     }
>     ```
>
>     With the "method-like" approach, the match candidate gets an
>     explicit name
>     selected by the author; with the inverse approach, we can go with
>     a predefined
>     name such as `that`.  (Because deconstructors do not have
>     receivers, we could by
>     abuse of notation arrange for the keyword `this` to refer instead
>     to the match
>     candidate within the body of a deconstructor.  While this might
>     seem to lead to
>     a more familiar notation for writing deconstructors, it would create a
>     gratuitous asymmetry between the bodies of deconstruction patterns
>     and those of
>     other patterns.)
>
>     Between these choices, nearly all the considerations favor the
>     "inverse"
>     approach:
>
>      - The "inverse" approach makes the declaration look like the use
>     site.  This
>        highlights that `pattern Point(int x, int y)` is what gets
>     invoked when you
>        match against the pattern use `Point(int x, int y)`. (This
>     point is so
>        strong that we should probably just stop here.)
>      - The "inverse" members also look like their duals; the only
>     difference is the
>        `pattern` keyword (and possibly the placement of the match
>     candidate type).
>        This makes matched pairs much more obvious, and such matched
>     pairs will be
>        critical both for future language features and for library idioms.
>      - The method-like approach is suggestive of multiple return or
>     tuples, which is
>        probably helpful for the first few minutes but actually harmful
>     in the long
>        term. This feature is _not_ (much as some people would like to
>     believe) about
>        multiple return or tuples, and playing into this misperception
>     will only make
>        it harder to truly understand.  So this suggestion ends up
>     propping up the
>        wrong mental model.
>
>     The main downside of the "inverse" approach is the one-time speed
>     bump of the
>     unfamiliarity of the inverted syntax.  (The "method-like" syntax
>     also has its
>     own speed bumps, it is just unfamiliar in different ways.)  But
>     unlike the
>     advantages of the inverse approach, which continue to add value
>     forever, this
>     speed bump is a one-time hurdle to get over.
>
>     To smooth out the speed bumps of the inverse approach, we can
>     consider moving
>     the position of the match candidate for static and (suitable)
>     instance pattern
>     declarations, such as:
>
>     ```
>     class Optional<T> {
>         // the usual static factory
>         public static<T> Optional<T> of(T t) { ... }
>
>         // Various ways of writing the corresponding pattern
>         public static<T> pattern of(T t) for Optional<T> { ... }
>         // or ...
>         public static<T> pattern(Optional<T>) of(T t) { ... }
>         // or ...
>         public static<T> pattern(Optional<T> that) of(T t) { ... }
>         // or ...
>         public static<T> pattern<Optional<T>> of(T t) { ... }
>         ...
>     }
>     ```
>
>     (The deconstructor example looks the same with either variant.) 
>     Of these,
>     treating the match candidate like a "parameter" of "pattern" is
>     probably the
>     most evocative:
>
>     ```
>     public static<T> pattern(Optional<T> that) of(T t) { ... }
>     ```
>
>     as it can be read as "pattern taking the parameter `Optional<T>
>     that` called
>     `of`, binding `T`, and is a short departure from the inverse syntax.
>
>     The main value of the various rearrangements is that users don't
>     need to think
>     about things operating in reverse to parse the syntax. This trades
>     some of the
>     secondary point (patterns looking almost exactly like their
>     inverses) for a
>     certain amount of cognitive load, while maintaining the most important
>     consideration: that the declaration site look like the use site.
>
>     For instance pattern declarations, if the match candidate type is
>     the same as
>     the receiver type, the match candidate type can be elided as it is
>     with
>     deconstructors.
>
>     My personal position: the "multiple return" version is terrible;
>     all the
>     sub-variants of the inverse version are probably workable.
>
>     ### Naming the match candidate
>
>     We've been assuming so far that the match candidate always has a
>     fixed name,
>     such as `that`; this is an entirely workable approach. Some of the
>     variants are
>     also amenable to allowing authors to explicitly select a name for
>     the match
>     candidate.  For example, if we put the match candidate as a
>     "parameter" to the `pattern` keyword, there is an obvious place to
>     put the name:
>
>     ```
>     static<T> pattern(Optional<T> target) of(T t) { ... }
>     ```
>
>     My personal opinion: I don't think this degree of freedom buys us
>     much, and in
>     the long run readability probably benefits by picking a fixed name
>     like `that`
>     and sticking with it.  Even with a fixed name, if there is a
>     sensible position
>     for the name, allowing users to type `that` for explicitness is
>     fine (as we do
>     with instance methods, though many people don't know this.)  We
>     may even want to
>     require it.
>
>     ## Body types
>
>     Just as there are two obvious approaches for the declaration,
>     there are two
>     obvious approaches we could take for the body (though there is
>     some coupling
>     between them.)  We'll call the two body approaches _imperative_ and
>     _functional_.
>
>     The imperative approach treats bindings as initially-DU variables
>     that must be
>     DA on successful completion, getting their value through ordinary
>     assignment;
>     the functional approach sets all the bindings at once,
>     positionally.  Either
>     way, member patterns (except maybe deconstructors) also need a way to
>     differentiate a successful match from a failed match.
>
>     Here is the `Point` deconstructor with both imperative and
>     functional style. The
>     functional style uses a placeholder `match` statement to indicate
>     a successful
>     match and provision of bindings:
>
>     ```
>     class Point {
>         int x, y;
>
>         Point(int x, int y) {
>             this.x = x;
>             this.y = y;
>         }
>
>         // Imperative style, deconstructor always succeeds
>         pattern Point(int x, int y) {
>             x = that.x;
>             y = that.y;
>         }
>
>         // Functional style
>         pattern Point(int x, int y) {
>             match(that.x, that.y);
>         }
>     }
>     ```
>
>     There are some obvious differences here.  In the imperative style,
>     the dtor body
>     looks much more like the reverse of the ctor body. The functional
>     style is more
>     concise (and amenable to further concision via the "concise method
>     bodies"
>     mechanism in the future), as well as a number of less obvious
>     differences.  For
>     deconstructors, the imperative approach is likely to feel more
>     natural because
>     of the obvious symmetry with constructors.
>
>     In reality, it is _premature at this point to have an opinion_,
>     because we
>     haven't yet seen the full scope of the problem; deconstructors are
>     a special
>     case in many ways, which almost surely is distorting our initial
>     opinion.  As we
>     move towards conditional patterns (and pattern lambdas), our
>     opinions may flip.
>
>     Regardless of which we pick, there are some additional syntactic
>     choices to be
>     made -- what syntax to use to indicate success (we used `match` in
>     the above
>     example) or failure.  (We should be especially careful around
>     trying to reuse
>     words like `return`, `break`, or `yield` because, in the case
>     where there are
>     zero bindings (which is allowable), it becomes unclear whether
>     they mean "fail"
>     or "succeed with zero bindings".)
>
>     ### Success and failure
>
>     Except for possibly deconstructors, which we may require to be
>     total, a pattern
>     declaration needs a way to indicate success and failure. In the
>     examples above,
>     we posited a `match` statement to indicate success in the
>     functional approach,
>     and in both examples leaned on the "implicit success" of
>     deconstructors (under
>     the assumption they always succeed).  Now let's look at the more
>     general case to
>     figure out what else is needed.
>
>     For a static pattern like `Optional::of`, success is conditional. 
>     Using
>     `match-fail` as a placeholder for "the match failed", this might
>     look like
>     (functional version):
>
>     ```
>     public static<T> pattern(Optional<T> that) of(T t) {
>         if (that.isPresent())
>             match (that.get());
>         else
>             match-fail;
>     }
>     ```
>
>     The imperative version is less pretty, though.  Using
>     `match-success` as a
>     placeholder:
>
>     ```
>     public static<T> pattern(Optional<T> that) of(T t) {
>         if (that.isPresent()) {
>             t = that.get();
>             match-success;
>         }
>         else
>             match-fail;
>     }
>     ```
>
>     Both arms of the `if` feel excessively ceremonial here. And if we
>     chose to not
>     make all deconstruction patterns unconditional, deconstructors
>     would likely need
>     some explicit success as well:
>
>     ```
>     pattern Point(int x, int y) {
>         x = that.x;
>         y = that.y;
>         match-success;
>     }
>     ```
>
>     It might be tempting to try and eliminate the need for explicit
>     success by
>     inferring it from whether or not the bindings are DA or not, but
>     this is
>     error-prone, is less type-checkable, and falls apart completely
>     for patterns
>     with no bindings.
>
>     ### Implicit failure in the functional approach
>
>     One of the ceremonial-seeming aspects of `Optional::of` above is
>     having to say
>     `else match-fail`, which doesn't feel like it adds a lot of
>     value.  Perhaps we
>     can be more concise without losing clarity.
>
>     Most conditional patterns will have a predicate to determine
>     matching, and then
>     some conditional code to compute the bindings and claim success. 
>     Having to say
>     "and if the predicate didn't hold, then I fail" seems like
>     ceremony for the
>     author and noise for the reader.  Instead, if a conditional
>     pattern falls off
>     the end without matching, we could treat that as simply not matching:
>
>     ```
>     public static<T> pattern(Optional<T> that) of(T t) {
>         if (that.isPresent())
>             match (that.get());
>     }
>     ```
>
>     This says what we mean: if the optional is present, then this
>     pattern succeeds
>     and bind the contents of the `Optional`.  As long as our "succeed"
>     construct
>     strongly enough connotes that we are terminating abruptly and
>     successfully, this
>     code is perfectly clear.  And most conditional patterns will look
>     a lot like
>     `Optional::of`; do some sort of test and if it succeeds, extract
>     the state and
>     bind it.
>
>     At first glance, this "implicit fail" idiom may seem error-prone
>     or sloppy.  But
>     after writing a few dozen patterns, one quickly tires of saying "else
>     match-fail" -- and the reader doesn't necessarily appreciate
>     reading it either.
>
>     Implicit failure also simplifies the selection of how we
>     explicitly indicate
>     failure; using `return` in a pattern for "no match" becomes pretty
>     much a forced
>     move.  We observe that (in a void method), "return" and "falling
>     off the end"
>     are equivalent; if "falling off the end" means "no match", then so
>     should an
>     explicit `return`.  So in those few cases where we need to
>     explicitly signal "no
>     match", we can just use `return`.  It won't come up that often,
>     but here's an
>     example where it does:
>
>     ```
>     static pattern(int that) powerOfTwo(int exp) {
>         int exp = 0;
>
>         if (that < 1)
>             return; // explicit fail
>
>         while (that > 1) {
>             if (that % 2 == 0) {
>                 that /= 2;
>                 ++exp;
>             }
>             else
>                 return; // explicit fail
>         }
>         match (exp);
>     }
>     ```
>
>     As a bonus, if `return` as match failure is a forced move, we need
>     only select a
>     term for "successful match" (which obviously can't be `return`). 
>     We could use
>     `match` as we have in the examples, or a variant like `matched` or
>     `matches`.
>     But rather than just creating a new control operator, we have an
>     opportunity to
>     lean into the duality a little harder, by including the pattern
>     syntax in the
>     match:
>
>     ```
>     matches of(that.get());
>     ```
>
>     or the (optionally?) qualified (inferring type arguments, as we do
>     at the use
>     site):
>
>     ```
>     matches Optional.of(that.get());
>     ```
>
>     These "use the name" approaches trades a small amount of verbosity
>     to gain a
>     higher degree of fidelity to the pattern use site (and to evoke
>     the comethod
>     completion.)
>
>     If we don't choose "implicit fail", we would have to invent _two_
>     new control
>     flow statements to indicate "success" and "failure".
>
>     My personal position: for the functional approach, implicit
>     failure both makes
>     the code simpler and clearer, and after you get used to it, you
>     don't want to go
>     back.  Whether we say `match` or `matches` or `matches
>     <pattern-name>` are all
>     workable, though I like some variant that names the pattern.
>
>     ### Implicit success in the imperative approach
>
>     In the imperative approach, we can be implicit as well, but it
>     feels more
>     natural (at least, initially) to choose implicit success rather
>     than failure.
>     This works great for unconditional patterns:
>
>     ```
>     pattern Point(int x, int y) {
>         x = that.x;
>         y = that.y;
>         // implicit success
>     }
>     ```
>
>     but not quite as well for conditional patterns:
>
>     ```
>     static<T> pattern(Optional<T> that) of(T t) {
>         if (that.isPresent()) {
>             t = that.get();
>         }
>         else
>             match-fail;
>         // implicit success
>     }
>     ```
>
>     We can eliminate one of the arms of the if, with the more concise (but
>     convoluted) inversion:
>
>     ```
>     static<T> pattern(Optional<T> that) of(T t) {
>         if (!that.isPresent())
>             match-fail;
>         t = that.get();
>         // implicit success
>     }
>     ```
>
>     Just as with the functional approach, if we choose imperative and
>     "implicit
>     success", using `return` to indicate success is pretty much a
>     forced move.
>
>     ### Imperative is a trap
>
>     If we assume that functional implies implicit failure, and
>     imperative implies
>     implicit success, then our choices become:
>
>     ```
>     class Optional<T> {
>         public static<T> Optional<T> of(T t) { ... }
>
>         // imperative, implicit success
>         public static<T> pattern(Optional<T> that) of(T t) {
>             if (that.isPresent()) {
>                 t = that.get();
>             }
>             else
>                 match-fail;
>         }
>
>         // functional, implicit failure
>         public static<T> pattern(Optional<T> that) of(T t) {
>             if (that.isPresent())
>                 matches of(that.get());
>         }
>     }
>     ```
>
>     Once we get past deconstructors, the imperative approach looks
>     worse by
>     comparison because we need to assign all the bindings (which is _O(n)_
>     assignments) _and also_ indicate success or failure somehow,
>     whereas in the
>     functional style all can be done together with a single `matches`
>     statement.
>
>     Looking at the alternatives, except maybe for unconditional
>     patterns, the
>     functional example above seems a lot more natural.  The imperative
>     approach
>     works with deconstructors (assuming they are not conditional), but
>     does not
>     scale so well to conditionality -- which is the essence of patterns.
>
>     From a theoretical perspective, the method-comethod duality also
>     gives us a
>     forceful nudge towards the functional approach.  In a method, the
>     method
>     arguments are specified as a positional list of expressions at the
>     use site:
>
>         m(a, b, c)
>
>     and these values are invisibly copied into the parameter slots of
>     the method
>     prior to frame activation.  The dual to that for a comethod to
>     similarly convey
>     the bindings in a positional list of expressions (as they must
>     either all be
>     produced or none), where they are copied into the slots provided
>     at the use
>     site, as is indicated by `matches` in the above examples.
>
>     My personal position: the imperative style feels like a trap.  It
>     seems
>     "obvious" at first if we start with deconstructors, but becomes
>     increasingly
>     difficult when we get past this case, and gets in the way of other
>     opportunities.  The last gasp before acceptance is the discomfort
>     that dtor and
>     ctor bodies are written in different styles, but in the rear-view
>     mirror, this
>     feels like a non-issue.
>
>     ### Derive imperative from functional?
>
>     If we start with "functional with implicit failure", we can
>     possibly rescue
>     imperative by deriving a version of imperative from functional, by
>     "overloading"
>     the match-success operator.
>
>     If we have a pattern whose binding names are `b1..bn` of types
>     `B1..Bn`, then
>     the `matches` operator must take a list of expressions `e1..en`
>     whose arity and
>     types are compatible with `B1..Bn`.  But we could allow `matches`
>     to also have a
>     nilary form, which would have the effect of being shorthand for
>
>         matches <pattern-name>(b1, b2, ..., bn)
>
>     where each of `b1..bn` must be DA at the point of matching.  This
>     means that we
>     could express patterns in either form:
>
>     ```
>     class Optional<T> {
>         public static<T> Optional<T> of(T t) { ... }
>
>         // imperative, derived from functional with implicit failure
>         public static<T> pattern(Optional<T> that) of(T t) {
>             if (that.isPresent()) {
>                 t = that.get();
>                 matches of;
>             }
>         }
>
>         public static<T> pattern(Optional<T> that) of(T t) {
>             if (that.isPresent())
>                 matches of(that.get());
>         }
>     }
>     ```
>
>     This flexibility allows users to select a more verbose expression
>     in exchange
>     for a clearer association of expressions and bindings, though as
>     we'll see, it
>     does come with some additional constraints.
>
>     ### Wrapping an existing API
>
>     Nearly every library has methods (sometimes sets of methods) that
>     are patterns
>     in disguise, such as the pair of methods `isArray` and
>     `getComponentType` in
>     `Class`, or the `Matcher` helper type in `java.util.regex`. 
>     Library maintainers
>     will likely want to wrap (or replace) these with real patterns, so
>     these can
>     participate more effectively in conditional contexts, and in some
>     cases,
>     highlight their duality with factory methods.
>
>     Matching a string against a `j.u.r.Pattern` regular expression has
>     all the same
>     elements as a pattern, just with an ad-hoc API (and one that I
>     have to look up
>     every time).  But we can fairly easily wrap a true pattern around
>     the existing
>     API.  To match against a `Pattern` today, we pass the match
>     candidate to
>     `Pattern::matcher`, which returns a `Matcher` with accessors
>     `Matcher::matches`
>     (did it match) and `Matcher::group` (conditionally extract a
>     particular capture
>     group.)  If we want to wrap this with a pattern called `regexMatch`:
>
>     ```
>     pattern(String that) regexMatch(String... groups) {
>         Matcher m = this.matcher(that);
>         if (m.matches())
>             matches Pattern.regexMatch(IntStream.range(1, m.groupCount())
>     .map(Matcher::group)
>     .toArray(String[]::new));
>         // whole lotta matchin' goin' on
>     }
>     ```
>
>     This says that a `j.u.r.Pattern` has an instance pattern called
>     `regex`, whose
>     match candidate is `String`, and which binds a varargs of `String`
>     corresponding
>     to the capture groups.  The implementation simply delegates to the
>     existing
>     `j.u.r.Matcher` API.  This means that `j.u.r.Pattern` becomes a
>     sort of "pattern
>     object", and we can use it as a receiver at the use site:
>
>     ```
>     static Pattern As = Pattern.compile("(a*)");
>     static Pattern Bs = Pattern.compile("(b*)");
>     ...
>     switch (string) {
>         case As.regexMatch(var as): ...
>         case Bs.regexMatch(var bs): ...
>         ...
>     }
>     ```
>
>     ### Odds and ends
>
>     There are a number of loose ends here.  We could choose other
>     names for the
>     match-success and match-fail operations, including trying to reuse
>     `break` or
>     `yield`.  But, this reuse is tricky; it must be very clear whether
>     a given form
>     of abrupt completion means "success" or "failure", because in the
>     case of
>     patterns with no bindings, we will have no other syntactic cues to
>     help
>     disambiguate.  (I think having a single `matches`, with implicit
>     failure and
>     `return` meaning failure, is the sweet spot here.)
>
>     Another question is whether the binding list introduces
>     corresponding variables
>     into the scope of the body.  For imperative, the answer is "surely
>     yes"; for
>     functional, the answer is "maybe" (unless we want to do the trick
>     where we
>     derive imperative from functional, in which case the answer is
>     "yes" again.)
>
>     If the binding list does not correspond to variables in the body,
>     this may be
>     initially discomforting; because they do not declare program
>     elements, they may
>     feel that they are left "dangling".  But even if they are not
>     declaring
>     _program_ elements, they are still declaring _API_ elements
>     (similar to the
>     return type of a method.)  We will want to provide Javadoc on the
>     bindings, just
>     like with parameters; we will want to match up binding names in
>     deconstructors
>     with parameter names in constructors; we may even someday want to
>     support
>     by-name binding at the use site (e.g., `case Foo(a: var a)`).  The
>     names are
>     needed for all of these, just not for the body. Names still
>     matter.  My take
>     here is that this is a transient "different is scary" reaction,
>     one that we
>     would get over quickly.
>
>     A final question is whether we should consider unqualified names
>     as implicitly
>     qualified by `that` (and also `this`, for instance patterns, with
>     some conflict
>     resolution).  Users will probably grow tired of typing `that.` all
>     the time, and most of the time, the unqualified use is perfectly
>     readable.
>
>     ## Exhaustiveness
>
>     There is one last syntax question in front of us: how to indicate
>     that a set of
>     patterns are (claimed to be) exhaustive on a given match candidate
>     type.  We see
>     this with `Optional::of` and `Optional::empty`; it would be sad if
>     the compiler
>     did not realize that these two patterns together were exhaustive
>     on `Optional`.
>     This is not a feature that will be used often, but not having it
>     at all will be
>     a repeated irritant.
>
>     The best I've come up with is to call these `case` patterns, where
>     a set of
>     `case` patterns for a given match candidate type in a given class
>     are asserted
>     to be an exhaustive set:
>
>     ```
>     class Optional<T> {
>         static<T> Optional<T> of(T t) { ... }
>         static<T> Optional<T> empty() { ... }
>
>         static<T> case pattern of(T t) for Optional<T> { ... }
>         static<T> case pattern empty() for Optional<T> { ... }
>     }
>     ```
>
>     Because they may not be truly exhaustive, `switch` constructs will
>     have to back
>     up the static assumption of exhaustiveness with a dynamic check,
>     as we do for
>     other sets of exhaustive patterns that may have remainder.
>
>     I've experimented with variants of `sealed` but it felt more
>     forced, so this is
>     the best I've come up with.
>
>     ## Example: patterns delegating to other patterns
>
>     Pattern implementations must compose.  Just as a subclass
>     constructor delegates
>     to a superclass constructor, the same should be true for
>     deconstructors.
>     Here's a typical superclass-subclass pair:
>
>     ```
>     class A {
>         private final int a;
>
>         public A(int a) { this.a = a; }
>         public pattern A(int a) { matches A(that.a); }
>     }
>
>     class B extends A {
>         private final int b;
>
>         public B(int a, int b) {
>             super(a);
>             this.b = b;
>         }
>
>         // Imperative style
>         public pattern B(int a, int b) {
>             if (that instanceof super(var aa)) {
>                 a = aa;
>                 b = that.b;
>                 matches B;
>             }
>         }
>
>         // Functional style
>         public pattern B(int a, int b) {
>             if (that instanceof super(var a))
>                 matches B(a, b);
>         }
>     }
>     ```
>
>     (Ignore the flow analysis and totality for the time being; we'll
>     come back to
>     this in a separate document.)
>
>     The first thing that jumps out at us is that, in the imperative
>     version, we had
>     to create a "garbage" variable `aa` to receive the binding,
>     because `a` was
>     already in scope, and then we have to copy the garbage variable
>     into the real
>     binding variable. Users will surely balk at this, and rightly so. 
>     In the
>     functional version (depending on the choices from "Odds and Ends")
>     we are free
>     to use the more natural name and avoid the roundabout locution.
>
>     We might be tempted to fix the "garbage variable" problem by
>     inventing another
>     sub-feature: the ability to use an existing variable as the target
>     of a binding,
>     such as:
>
>     ```
>     pattern Point(int a, int b) {
>         if (this instanceof A(__bind a))
>             b = this.b;
>     }
>     ```
>
>     But, I think the language is stronger without this feature, for
>     two reasons.
>     First, having to reason about whether a pattern match introduces a
>     new binding
>     or assigns to an existing variables is additional cognitive load
>     for users to
>     reason about, and second, having assignment to locals happening
>     through
>     something other than assignment introduces additional complexity
>     in finding
>     where a variable is modified.  While we can argue about the
>     general utility of
>     this feature, bringing it in just to solve the garbage-variable
>     problem is
>     particularly unattractive.
>
>     ## Pattern lambdas
>
>     One final consideration is is that patterns may also have a lambda
>     form.  Given
>     a single-abstract-pattern (SAP) interface:
>
>     ```
>     interface Converter<T,U> {
>         pattern(T t) convert(U u);
>     }
>     ```
>
>     one can implement such a pattern with a lambda. Such a lambda has
>     one parameter
>     (the match candidate), and its body looks like the body of a
>     declared pattern:
>
>     ```
>     Converter<Integer, Short> c =
>         i -> {
>             if (i >= Short.MIN_VALUE && i <= Short.MAX_VALUE)
>                 matches Converter.convert((short) i);
>         };
>     ```
>
>     Because the bindings of the pattern lambda are defined in the
>     interface, not in
>     the lambda, this is one more reason not to like the imperative
>     version: it is
>     brittle, and alpha-renaming bindings in the interface would be a
>     source-incompatible change.
>
>     ## Example gallery
>
>     Here's all the pattern examples so far, and a few more, using the
>     suggested
>     style (functional, implicit fail, implicit `that`-qualification):
>
>     ```
>     // Point dtor
>     pattern Point(int x, int y) {
>         matches Point(x, y);
>     }
>
>     // Optional -- static patterns for Optional::of, Optional::empty
>     static<T> case pattern(Optional<T> that) of(T t) {
>         if (isPresent())
>             matches of(t);
>     }
>
>     static<T> case pattern(Optional<T> that) empty() {
>         if (!isPresent())
>             matches empty();
>     }
>
>     // Class -- instance pattern for arrayClass (match candidate type
>     inferred)
>     pattern arrayClass(Class<?> componentType) {
>         if (that.isArray())
>             matches arrayClass(that.getComponentType());
>     }
>
>     // regular expression -- instance pattern in j.u.r.Pattern
>     pattern(String that) regexMatch(String... groups) {
>         Matcher m = matcher(that);
>         if (m.matches())
>             matches Pattern.regexMatch(IntStream.range(1, m.groupCount())
>     .map(Matcher::group)
>     .toArray(String[]::new));
>     }
>
>     // power of two (somewhere)
>     static pattern(int that) powerOfTwo(int exp) {
>         int exp = 0;
>
>         if (that < 1)
>             return;
>
>         while (that > 1) {
>             if (that % 2 == 0) {
>                 that /= 2;
>                 exp++;
>             }
>             else
>                 return;
>         }
>         matches powerOfTwo(exp);
>     }
>     ```
>
>     ## Closing thoughts
>
>     I came out of this exploration with very different conclusions
>     than I expected
>     when going in.  At first, the "inverse" syntax seemed stilted, but
>     over time it
>     started to seem more obvious.  Similarly, I went in expecting to
>     prefer the
>     imperative approach for the body, but over time, started to warm
>     to the
>     functional approach, and eventually concluded it was basically a
>     forced move if
>     we want to support more than just deconstructors.  And I started
>     out skeptical
>     of "implicit fail", but after writing a few dozen patterns with
>     it, going back
>     to fully explicit felt painful.  All of this is to say, you should
>     hold your
>     initial opinions at arm's length, and give the alternatives a
>     chance to sink in.
>
>     For most _conditional_ patterns (and conditionality is at the
>     heart of pattern
>     matching), the functional approach cleanly highlights both the
>     match predicate
>     and the flow of values, and is considerably less fussy than the
>     imperative
>     approach in the same situation; `Optional::of`,
>     `Class::arrayClass`, and `regex`
>     look great here, much better than the would with imperative.  None
>     of these
>     illustrate delegation, but in the presence of delegation, the gap
>     gets even
>     wider.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-spec-observers/attachments/20240330/251383c3/attachment-0001.htm>


More information about the amber-spec-observers mailing list