Member Patterns -- the bikeshed

Clement Cherlin ccherlin at gmail.com
Tue Apr 2 22:16:19 UTC 2024


# Pattern input (candidate) / output (binding) syntax

Java already has a familiar syntax for separating a variable binding from
an input value: It's the enhanced for loop header.

for (<binding> : <candidate>)

becomes

pattern <name> (<binding(s)> : <candidate>)

Examples:

## Optional
public static <T> pattern of(T t : Optional<T> that)

case Optional.of(T t)

## Map.Entry
public static<K,V> pattern of(K key, V value : Map.Entry<K, V> that)

case Map.Entry.of(String key, Integer value)

# Covering pattern groups

Java's already got labels, it could also have "covering pattern group
labels".

## Integer
case PARITY pattern even(int even : Integer that)
case PARITY pattern odd(int odd : Integer that)

case (SIGNUM, FACTORING) pattern negative(int n : Integer that)
case (SIGNUM, FACTORING) pattern zero(int z : Integer that)
case SIGNUM pattern positive(int p : Integer that)

case FACTORING pattern unit(int u : Integer that)
case FACTORING pattern prime(int p : Integer that)
case FACTORING pattern composite(int c : Integer that)

The case sets {even, odd}, {positive, zero, negative} and {negative, zero,
unit, prime, composite} are all total, but the case set {even, negative,
composite} is not.

Cheers,
Clement Cherlin

On Sat, Mar 30, 2024 at 3:53 PM Brian Goetz <brian.goetz at oracle.com> wrote:

>
>
> On 3/30/2024 3:23 PM, Victor Nazarov wrote:
>
> I have two points that I think may be good to consider in the list of
> options.
>
> 1. I'm not sure if this was considered, but I find explicit lists of
> covering patterns
> rather natural and more flexible than using case as a pattern-modifier.
>
>
> Agreed (this is how F# does it), and we tried that, but it is so contrary
> to how members are done in Java.   (One might think that one could declare
> a "sealed" pattern, which "permits" a list of other patterns, and this
> sounds perfectly natural, but it looks pretty weird.)
>
> The important feature of explicit lists is that there may be more than one
> covering set of patterns.
>
>
> Yes, been down this road too, but the reality is that this is not likely
> to come up nearly as often as one might imagine.
>
> 2. I think that there is a middle ground between functional and imperative
> pattern body definition style that may look cumbersome at first, but
> nevertheless gives you best of both worlds:
>
>
> The `match` block is an interesting idea, will consider.
>
>
>
>     * deconstructor patterns look dual to constructors
>     * names from the list of pattern variables are actually used and
> checked by the compiler
>     * control flow is still functional, which is more natural
>
> The downside that is retained from the imperative style is the need for
> alpha-renaming,
> but I think we still have to deal with shadowing and renaming
> local-variable seems natural and easy.
>
> Middle ground may be used like a special form that can be used in the
> pattern body.
> This form works mostly the same way as `with`-clause as defined in the
> "Derived Record Instances" JEP.
>
> Here is the long list of examples to fully illustrate different
> interactions:
>
> ````
>     class Optional<T> matches (of|empty) {
>         public static <T> pattern<Optional<T>> of(T value) {
>             if (that.isPresent()) {
>                 match {
>                     value = that.get();
>                 }
>             }
>         }
>
>         public static <T> pattern<Optional<T>> empty() {
>             if (that.isEmpty())
>                 match {}
>         }
>     }
>
>     class Pattern {
>         public pattern<String> regexMatch(String... groups) {
>             Matcher m = this.matcher(that);
>             if (m.matches()) {
>                 match {
>                     groups =
>                             IntStream.range(1, m.groupCount())
>                                     .map(Matcher::group)
>                                     .toArray(String[]::new);
>                 }
>             }
>         }
>     }
>
>     class A {
>         private final int a;
>
>         public A(int a) {
>             this.a = a;
>         }
>         public pattern A(int a) {
>             match {
>                 a = that.a;
>             }
>         }
>     }
>
>     class B extends A {
>         private final int b;
>
>         public B(int a, int b) {
>             super(a);
>             this.b = b;
>         }
>
>         public pattern B(int a, int b) {
>             if (that instanceof super(var aa)) {
>                 match {
>                     a = aa;
>                     b = that.b;
>                 }
>             }
>         }
>     }
>
>     interface Converter<T,U> {
>         pattern<T> convert(U u);
>     }
>     Converter<Integer, Short> c =
>         pattern (s) -> {
>             if (that >= Short.MIN_VALUE && that <= Short.MAX_VALUE)
>                 match {
>                     s = (short) that;
>                 }
>         };
> ````
>
> --
> Victor Nazarov
>
>
> On Fri, Mar 29, 2024 at 10:59 PM Brian Goetz <brian.goetz at oracle.com>
> wrote:
>
>> We now come to the long-awaited bikeshed discussion on what member
>> patterns should look like.
>>
>> Bikeshed disclaimer for EG:
>>   - This is likely to evoke strong opinions, so please take pains to be
>> especially constructive
>>   - Long reply-to-reply threads should be avoided even more than usual
>>   - Holistic, considered replies preferred
>>   - Please change subject line if commenting on a sub-topic or tangential
>>     concern
>>
>> Special reminders for Remi:
>>  - Use of words like "should", "must", "shouldn't", "mistake", "wrong",
>> "broken"
>>    are strictly forbidden.
>>  - If in doubt, ask questions first.
>>
>> Notes for external observers:
>>  - This is a working document for the EG; the discussion may continue for
>> a
>>    while before there is an official proposal.  Please be patient.
>>
>>
>> # Pattern declaration: the bikeshed
>>
>> We've largely identified the model for what kinds of patterns we need to
>> express, but there are still several degrees of freedom in the syntax.
>>
>> As the model has simplified during the design process, the space of syntax
>> choices has been pruned back, which is a good thing.  However, there are
>> still
>> quite a few smaller decisions to be made.  Not all of the considerations
>> are
>> orthogonal, so while they are presented individually, this is not a "pick
>> one
>> from each column" menu.
>>
>> Some of these simplifications include:
>>
>>  - Patterns with "input arguments" have been removed; another way to get
>> to what
>>    this gave us may come back in another form.
>>  - I have grown increasingly skeptical of the value of the imperative
>> `match`
>>    statement.  With better totality analysis, I think it can be
>> eliminated.
>>
>> We can discuss these separately but I would like to sync first on the
>> broad
>> strokes for how patterns are expressed.
>>
>> ## Object model requirements
>>
>> As outlined in "Towards Member Patterns", the basic model is that
>> patterns are
>> the dual of other executable members (constructors, static methods,
>> instance
>> methods.)  While they are like methods in that they have inputs, outputs,
>> names,
>> and an imperative body, they have additional degrees of freedom that
>> constructors and methods lack:
>>
>>  - Patterns are, in general, _conditional_ (they can succeed or fail),
>> and only
>>    produce bindings (outputs) when they succeed.  This conditionality is
>>    understood by the language's flow analysis, and is used for computing
>> scoping
>>    and definite assignment.
>>  - Methods can return at most one value; when a pattern completes
>> successfully,
>>    it may bind multiple values.
>>  - All patterns have a _match candidate_, which is a distinguished,
>>    possibly-implicit parameter.  Some patterns also have a receiver,
>> which is
>>    also a distinguished, possibly-implicit parameter.  In some such cases
>> the
>>    receiver and match candidate are aliased, but in others these may
>> refer to
>>    different objects.
>>
>> So a pattern is a named executable member that takes a _match candidate_
>> as a
>> possibly-implicit parameter, maybe takes a receiver as an implicit
>> parameter,
>> and has zero or more conditional _bindings_.  Its body can perform
>> imperative
>> computation, and can terminate either with match failure or success.  In
>> the
>> success case, it must provide a value for each binding.
>>
>> Deconstruction patterns are special in many of the same ways constructors
>> are:
>> they are constrained in their name, inheritance, and probably their
>> conditionality (they should probably always succeed).  Just as the syntax
>> for
>> constructors differs slightly from that of instance methods, the syntax
>> for
>> deconstructors may differ slightly from that of instance patterns.  Static
>> patterns, like static methods, have no receiver and do not have access to
>> the
>> type parameters of the enclosing class.
>>
>> Like constructors and methods, patterns can be overloaded, but in
>> accordance
>> with their duality to constructors and methods, the overloading happens
>> on the
>> _bindings_, not the inputs.
>>
>> ## Use-site syntax
>>
>> There are several kinds of type-driven patterns built into the language:
>> type
>> patterns and record patterns.  A type pattern in a `switch` looks like:
>>
>>     case String s: ...
>>
>> And a record pattern looks like:
>>
>>     case MyRecord(P1, P2, ...): ...
>>
>> where `P1..Pn` are nested patterns that are recursively matched to the
>> components of the record.  This use-site syntax for record patterns was
>> chosen
>> for its similarity to the construction syntax, to highlight that a record
>> pattern is the dual of record construction.
>>
>> **Deconstruction patterns.**  The simplest kind of member pattern, a
>> deconstruction pattern, will have the same use-site syntax as a record
>> pattern;
>> record patterns can be thought of as a deconstruction pattern "acquired
>> for
>> free" by records, just as records do with constructors, accessors, object
>> methods, etc.  So the use of a deconstruction pattern for `Point` looks
>> like:
>>
>>     case Point(var x, var y): ...
>>
>> whether `Point` is a record or an ordinary class equipped with a suitable
>> deconstruction pattern.
>>
>> **Static patterns.**  Continuing with the idea that the destructuring
>> syntax
>> should evoke the aggregation syntax, there is an obvious candidate for the
>> use-site syntax for static patterns:
>>
>>     case Optional.of(var e): ...
>>     case Optional.empty(): ...
>>
>> **Instance patterns.**  Uses of instance patterns will likely come in two
>> forms,
>> analogous to bound and unbound instance method references, depending on
>> whether
>> the receiver and the match candidate are the same object.  In the unbound
>> form,
>> used when the receiver is the same object as the match candidate, the
>> pattern
>> name is qualified by a _type_:
>>
>> ```
>> Class<?> k = ...
>> switch (k) {
>>     // Qualified by type
>>     case Class.arrayClass(var componentType): ...
>> }
>> ```
>>
>> This means that we _resolve_ the pattern `arrayClass` starting at `Class`
>> and
>> _select_ the pattern using the receiver, `k`.  We may also be able to
>> omit the
>> class qualifier if the static type of the match candidate is sufficient to
>> resolve the desired pattern.
>>
>> In the bound form, used when the receiver is distinct from the match
>> candidate,
>> the pattern name is qualified with an explicit _receiver expression_.  As
>> an
>> example, consider an interface that captures primitive widening and
>> narrowing
>> conversions, such as those between `int` and `long`.  In the widening
>> direction,
>> conversion is unconditional, so this can be modeled as a method from
>> `int` to
>> `long`.  In the other direction, conversion is conditional, so this is
>> better
>> modeled as a _pattern_ whose match candidate is `long` and which binds an
>> `int`
>> on success.  Since these are instance methods of some class (say,
>> `NumericConversion<T,U>`), we need to provide the receiver instance in
>> order to
>> resolve the pattern:
>>
>> ```
>> NumericConversion<int, long> nc = ...
>>
>> switch (aLong) {
>>     case nc.narrowed(int i):
>>     ...
>> }
>> ```
>>
>> The explicit receiver syntax would also be used if we exposed regular
>> expression
>> matching as a pattern on the `j.u.r.Pattern` object (the name collision on
>> `Pattern` is unfortunate).  Imagine we added a `matching` instance
>> pattern to
>> `j.u.r.Pattern`; then we could use it in `instanceof` as follows:
>>
>> ```
>> static final java.util.regex.Pattern P = Pattern.compile("(a*)(b*)");
>> ...
>> if (aString instanceof P.matching(String as, String bs)) { ... }
>> ```
>>
>> Each of these use-site syntaxes is modeled after the use-site syntax for a
>> method invocation or method reference.
>>
>> ## Declaration-site syntax
>>
>> To avoid being biased by the simpler cases, we're going to work all the
>> cases
>> concurrently rather than starting with the simpler cases and working up.
>> (It
>> might seem sensible to start with deconstructors, since they are the
>> "easy"
>> case, but if we did that, we would likely be biased by their simplicity
>> and then
>> find ourselves painted into a corner.)  As our example gallery, we will
>> consider:
>>
>>  - Deconstruction pattern for `Point`;
>>  - Static patterns for `Optional::of` and `Optional::empty`;
>>  - Static pattern for "power of two" (illustrating a computations where
>> success
>>    or failure, and computation of bindings, cannot easily be separated);
>>  - Instance pattern for `Class::arrayClass` (used unbound);
>>  - Instance pattern for `Pattern::matching` on regular expressions (used
>> bound).
>>
>> Member patterns, like methods, have _names_.  (We can think of
>> constructors as
>> being named for their enclosing classes, and the same for
>> deconstructors.)  All
>> member patterns have a (possibly empty) ordered list of _bindings_, which
>> are
>> the dual of constructor or method parameters.  Bindings, in turn, have
>> names and
>> types.  And like constructors and methods, member patterns have a _body_
>> which
>> is a block statement.  Member patterns also have a _match candidate_,
>> which is a
>> likely-implicit method parameter.
>>
>> ### Member patterns as inverse methods and constructors
>>
>> Regardless of syntax, let us remind ourselves that that deconstructors
>> are the
>> categorical dual to constructors (coconstructors), and pattern methods
>> are the
>> categorical dual to methods (comethods).  They are dual in their
>> structure: a
>> constructor or method takes N arguments and produces a result, the
>> corresponding
>> member pattern consumes a match candidate and (conditionally) produces N
>> bindings.
>>
>> Moreover, they are semantically dual: the return value produced by
>> construction
>> or factory invocation is the match candidate for the corresponding member
>> pattern, and the bindings produced by a member pattern are the answers to
>> the
>> _Pattern Question_ -- "could this object have come from an invocation of
>> my
>> dual, and if so, with what arguments."
>>
>> ### What do we call them?
>>
>> Given the significant overlap between methods and patterns, the first
>> question
>> about the declaration we need to settle is how to identify a member
>> pattern
>> declaration as distinct from a method or constructor declaration.
>> _Towards
>> Member Patterns_ tried out a syntax that recognized these as _inverse_
>> methods
>> and constructors:
>>
>>     public Point(int x, int y) { ... }
>>     public inverse Point(int x, int y) { ... }
>>
>> While this is a principled choice which clearly highlights the duality,
>> and one
>> that might be good for specification and verbal description, it is
>> questionable
>> whether this would be a great syntax for reading and writing programs.
>>
>> A more traditional option is to choose a "noun" (conditional) keyword,
>> such as
>> `pattern`, `matcher`, `extractor`, `view`, etc:
>>
>>     public pattern Point(int x, int y) { ... }
>>
>> If we are using a noun keyword to identify pattern declarations, we could
>> use
>> the same noun for all of them, or we could choose a different one for
>> deconstruction patterns:
>>
>>     public deconstructor Point(int x, int y) { ... }
>>
>> Alternately, we could reach for a symbol to indicate that we are talking
>> about
>> an inverted member.  C++ fans might suggest
>>
>>     public ~Point(int x, int y) { ... }
>>
>> but this is too cryptic (it's evocative once you see it, but then it
>> becomes
>> less evocative as we move away from deconstructors towards instance
>> patterns.)
>>
>> If we wish to offer finer-grained control over conditionality, we might
>> additionally need a `total` / `partial` modifier, though I would prefer
>> to avoid
>> that.
>>
>> Of the keyword candidates, there is one that stands out (for good and bad)
>> because it connects to something that is already in the language:
>> `pattern`.  On
>> the one hand, using the term `pattern` for the declaration is a slight
>> abuse; on
>> the other, users will immediately connect it with "ah, so that's how I
>> make a
>> new pattern" or "so that's what happens when I match against this
>> pattern."
>> (Lisps would resolve this tension by calling it `defpattern`.)
>>
>> The others (`matcher`, `view`, `extractor`, etc) are all made-up terms
>> that
>> don't connect to anything else in the language, for better or worse.  If
>> we pick
>> one of these, we are asking users to sort out _three_ separate new things
>> in
>> their heads: (use-site) patterns, (declaration-site) matchers, and the
>> rules of
>> how patterns and matchers are connected.  Calling them both "patterns",
>> despite
>> the mild abuse of terminology, ties them together in a way that
>> recognizes their
>> connection.
>>
>> My personal position: `pattern` is the strongest candidate here, despite
>> some
>> flaws.
>>
>> ### Binding lists and match candidates
>>
>> There are two obvious alternatives for describing the binding list and
>> match
>> candidate of a pattern declaration, both with their roots in the
>> constructor and
>> method syntax:
>>
>>  - Pretend that a pattern declaration is like a method with multiple
>> return, and
>>    put the binding list in the "return position", and make the match
>> candidate
>>    an ordinary parameter;
>>  - Lean into the inverse relationship between constructors and methods
>> (and
>>    consistency with the use-site syntax), and put the binding list in the
>>    "parameter list position". For static patterns and some instance
>> patterns,
>>    which need to explicitly identify the match candidate type, there are
>> several
>>    sub-options:
>>    - Lean further into the duality, putting the match candidate type in
>> the
>>      "return position";
>>    - Put the match candidate type somewhere else, where it is less likely
>> to be
>>      confused for a method return.
>>
>> The "method-like" approach might look like this:
>>
>> ```
>> class Point {
>>     // Constructor and deconstructor
>>     public Point(int x, int y) { ... }
>>     public pattern (int x, int y) Point(Point target) { ... }
>>     ...
>> }
>>
>> class Optional<T> {
>>     // Static factory and pattern
>>     public static<T> Optional<T> of(T t) { ... }
>>     public static<T> pattern (T t) of(Optional<T> target) { ... }
>>     ...
>> }
>> ```
>>
>> The "inverse" approach might look like:
>>
>> ```
>> class Point {
>>     // Constructor and deconstructor
>>     public Point(int x, int y) { ... }
>>     public pattern Point(int x, int y) { ... }
>>     ...
>> }
>>
>> class Optional<T> {
>>     // Static factory and pattern (using the first sub-option)
>>     public static<T> Optional<T> of(T t) { ... }
>>     public static<T> pattern Optional<T> of(T t) { ... }
>>     ...
>> }
>> ```
>>
>> With the "method-like" approach, the match candidate gets an explicit name
>> selected by the author; with the inverse approach, we can go with a
>> predefined
>> name such as `that`.  (Because deconstructors do not have receivers, we
>> could by
>> abuse of notation arrange for the keyword `this` to refer instead to the
>> match
>> candidate within the body of a deconstructor.  While this might seem to
>> lead to
>> a more familiar notation for writing deconstructors, it would create a
>> gratuitous asymmetry between the bodies of deconstruction patterns and
>> those of
>> other patterns.)
>>
>> Between these choices, nearly all the considerations favor the "inverse"
>> approach:
>>
>>  - The "inverse" approach makes the declaration look like the use site.
>> This
>>    highlights that `pattern Point(int x, int y)` is what gets invoked
>> when you
>>    match against the pattern use `Point(int x, int y)`.  (This point is so
>>    strong that we should probably just stop here.)
>>  - The "inverse" members also look like their duals; the only difference
>> is the
>>    `pattern` keyword (and possibly the placement of the match candidate
>> type).
>>    This makes matched pairs much more obvious, and such matched pairs
>> will be
>>    critical both for future language features and for library idioms.
>>  - The method-like approach is suggestive of multiple return or tuples,
>> which is
>>    probably helpful for the first few minutes but actually harmful in the
>> long
>>    term. This feature is _not_ (much as some people would like to
>> believe) about
>>    multiple return or tuples, and playing into this misperception will
>> only make
>>    it harder to truly understand.  So this suggestion ends up propping up
>> the
>>    wrong mental model.
>>
>> The main downside of the "inverse" approach is the one-time speed bump of
>> the
>> unfamiliarity of the inverted syntax.  (The "method-like" syntax also has
>> its
>> own speed bumps, it is just unfamiliar in different ways.)  But unlike the
>> advantages of the inverse approach, which continue to add value forever,
>> this
>> speed bump is a one-time hurdle to get over.
>>
>> To smooth out the speed bumps of the inverse approach, we can consider
>> moving
>> the position of the match candidate for static and (suitable) instance
>> pattern
>> declarations, such as:
>>
>> ```
>> class Optional<T> {
>>     // the usual static factory
>>     public static<T> Optional<T> of(T t) { ... }
>>
>>     // Various ways of writing the corresponding pattern
>>     public static<T> pattern of(T t) for Optional<T> { ... }
>>     // or ...
>>     public static<T> pattern(Optional<T>) of(T t) { ... }
>>     // or ...
>>     public static<T> pattern(Optional<T> that) of(T t) { ... }
>>     // or ...
>>     public static<T> pattern<Optional<T>> of(T t) { ... }
>>     ...
>> }
>> ```
>>
>> (The deconstructor example looks the same with either variant.)  Of these,
>> treating the match candidate like a "parameter" of "pattern" is probably
>> the
>> most evocative:
>>
>> ```
>> public static<T> pattern(Optional<T> that) of(T t) { ... }
>> ```
>>
>> as it can be read as "pattern taking the parameter `Optional<T> that`
>> called
>> `of`, binding `T`, and is a short departure from the inverse syntax.
>>
>> The main value of the various rearrangements is that users don't need to
>> think
>> about things operating in reverse to parse the syntax.  This trades some
>> of the
>> secondary point (patterns looking almost exactly like their inverses) for
>> a
>> certain amount of cognitive load, while maintaining the most important
>> consideration: that the declaration site look like the use site.
>>
>> For instance pattern declarations, if the match candidate type is the
>> same as
>> the receiver type, the match candidate type can be elided as it is with
>> deconstructors.
>>
>> My personal position: the "multiple return" version is terrible; all the
>> sub-variants of the inverse version are probably workable.
>>
>> ### Naming the match candidate
>>
>> We've been assuming so far that the match candidate always has a fixed
>> name,
>> such as `that`; this is an entirely workable approach.  Some of the
>> variants are
>> also amenable to allowing authors to explicitly select a name for the
>> match
>> candidate.  For example, if we put the match candidate as a "parameter"
>> to the `pattern` keyword, there is an obvious place to put the name:
>>
>> ```
>> static<T> pattern(Optional<T> target) of(T t) { ... }
>> ```
>>
>> My personal opinion: I don't think this degree of freedom buys us much,
>> and in
>> the long run readability probably benefits by picking a fixed name like
>> `that`
>> and sticking with it.  Even with a fixed name, if there is a sensible
>> position
>> for the name, allowing users to type `that` for explicitness is fine (as
>> we do
>> with instance methods, though many people don't know this.)  We may even
>> want to
>> require it.
>>
>> ## Body types
>>
>> Just as there are two obvious approaches for the declaration, there are
>> two
>> obvious approaches we could take for the body (though there is some
>> coupling
>> between them.)  We'll call the two body approaches _imperative_ and
>> _functional_.
>>
>> The imperative approach treats bindings as initially-DU variables that
>> must be
>> DA on successful completion, getting their value through ordinary
>> assignment;
>> the functional approach sets all the bindings at once, positionally.
>> Either
>> way, member patterns (except maybe deconstructors) also need a way to
>> differentiate a successful match from a failed match.
>>
>> Here is the `Point` deconstructor with both imperative and functional
>> style. The
>> functional style uses a placeholder `match` statement to indicate a
>> successful
>> match and provision of bindings:
>>
>> ```
>> class Point {
>>     int x, y;
>>
>>     Point(int x, int y) {
>>         this.x = x;
>>         this.y = y;
>>     }
>>
>>     // Imperative style, deconstructor always succeeds
>>     pattern Point(int x, int y) {
>>         x = that.x;
>>         y = that.y;
>>     }
>>
>>     // Functional style
>>     pattern Point(int x, int y) {
>>         match(that.x, that.y);
>>     }
>> }
>> ```
>>
>> There are some obvious differences here.  In the imperative style, the
>> dtor body
>> looks much more like the reverse of the ctor body. The functional style
>> is more
>> concise (and amenable to further concision via the "concise method bodies"
>> mechanism in the future), as well as a number of less obvious
>> differences.  For
>> deconstructors, the imperative approach is likely to feel more natural
>> because
>> of the obvious symmetry with constructors.
>>
>> In reality, it is _premature at this point to have an opinion_, because we
>> haven't yet seen the full scope of the problem; deconstructors are a
>> special
>> case in many ways, which almost surely is distorting our initial
>> opinion.  As we
>> move towards conditional patterns (and pattern lambdas), our opinions may
>> flip.
>>
>> Regardless of which we pick, there are some additional syntactic choices
>> to be
>> made -- what syntax to use to indicate success (we used `match` in the
>> above
>> example) or failure.  (We should be especially careful around trying to
>> reuse
>> words like `return`, `break`, or `yield` because, in the case where there
>> are
>> zero bindings (which is allowable), it becomes unclear whether they mean
>> "fail"
>> or "succeed with zero bindings".)
>>
>> ### Success and failure
>>
>> Except for possibly deconstructors, which we may require to be total, a
>> pattern
>> declaration needs a way to indicate success and failure.  In the examples
>> above,
>> we posited a `match` statement to indicate success in the functional
>> approach,
>> and in both examples leaned on the "implicit success" of deconstructors
>> (under
>> the assumption they always succeed).  Now let's look at the more general
>> case to
>> figure out what else is needed.
>>
>> For a static pattern like `Optional::of`, success is conditional.  Using
>> `match-fail` as a placeholder for "the match failed", this might look like
>> (functional version):
>>
>> ```
>> public static<T> pattern(Optional<T> that) of(T t) {
>>     if (that.isPresent())
>>         match (that.get());
>>     else
>>         match-fail;
>> }
>> ```
>>
>> The imperative version is less pretty, though.  Using `match-success` as a
>> placeholder:
>>
>> ```
>> public static<T> pattern(Optional<T> that) of(T t) {
>>     if (that.isPresent()) {
>>         t = that.get();
>>         match-success;
>>     }
>>     else
>>         match-fail;
>> }
>> ```
>>
>> Both arms of the `if` feel excessively ceremonial here.  And if we chose
>> to not
>> make all deconstruction patterns unconditional, deconstructors would
>> likely need
>> some explicit success as well:
>>
>> ```
>> pattern Point(int x, int y) {
>>     x = that.x;
>>     y = that.y;
>>     match-success;
>> }
>> ```
>>
>> It might be tempting to try and eliminate the need for explicit success by
>> inferring it from whether or not the bindings are DA or not, but this is
>> error-prone, is less type-checkable, and falls apart completely for
>> patterns
>> with no bindings.
>>
>> ### Implicit failure in the functional approach
>>
>> One of the ceremonial-seeming aspects of `Optional::of` above is having
>> to say
>> `else match-fail`, which doesn't feel like it adds a lot of value.
>> Perhaps we
>> can be more concise without losing clarity.
>>
>> Most conditional patterns will have a predicate to determine matching,
>> and then
>> some conditional code to compute the bindings and claim success.  Having
>> to say
>> "and if the predicate didn't hold, then I fail" seems like ceremony for
>> the
>> author and noise for the reader.  Instead, if a conditional pattern falls
>> off
>> the end without matching, we could treat that as simply not matching:
>>
>> ```
>> public static<T> pattern(Optional<T> that) of(T t) {
>>     if (that.isPresent())
>>         match (that.get());
>> }
>> ```
>>
>> This says what we mean: if the optional is present, then this pattern
>> succeeds
>> and bind the contents of the `Optional`.  As long as our "succeed"
>> construct
>> strongly enough connotes that we are terminating abruptly and
>> successfully, this
>> code is perfectly clear.  And most conditional patterns will look a lot
>> like
>> `Optional::of`; do some sort of test and if it succeeds, extract the
>> state and
>> bind it.
>>
>> At first glance, this "implicit fail" idiom may seem error-prone or
>> sloppy.  But
>> after writing a few dozen patterns, one quickly tires of saying "else
>> match-fail" -- and the reader doesn't necessarily appreciate reading it
>> either.
>>
>> Implicit failure also simplifies the selection of how we explicitly
>> indicate
>> failure; using `return` in a pattern for "no match" becomes pretty much a
>> forced
>> move.  We observe that (in a void method), "return" and "falling off the
>> end"
>> are equivalent; if "falling off the end" means "no match", then so should
>> an
>> explicit `return`.  So in those few cases where we need to explicitly
>> signal "no
>> match", we can just use `return`.  It won't come up that often, but
>> here's an
>> example where it does:
>>
>> ```
>> static pattern(int that) powerOfTwo(int exp) {
>>     int exp = 0;
>>
>>     if (that < 1)
>>         return; // explicit fail
>>
>>     while (that > 1) {
>>         if (that % 2 == 0) {
>>             that /= 2;
>>             ++exp;
>>         }
>>         else
>>             return; // explicit fail
>>     }
>>     match (exp);
>> }
>> ```
>>
>> As a bonus, if `return` as match failure is a forced move, we need only
>> select a
>> term for "successful match" (which obviously can't be `return`).  We
>> could use
>> `match` as we have in the examples, or a variant like `matched` or
>> `matches`.
>> But rather than just creating a new control operator, we have an
>> opportunity to
>> lean into the duality a little harder, by including the pattern syntax in
>> the
>> match:
>>
>> ```
>> matches of(that.get());
>> ```
>>
>> or the (optionally?) qualified (inferring type arguments, as we do at the
>> use
>> site):
>>
>> ```
>> matches Optional.of(that.get());
>> ```
>>
>> These "use the name" approaches trades a small amount of verbosity to
>> gain a
>> higher degree of fidelity to the pattern use site (and to evoke the
>> comethod
>> completion.)
>>
>> If we don't choose "implicit fail", we would have to invent _two_ new
>> control
>> flow statements to indicate "success" and "failure".
>>
>> My personal position: for the functional approach, implicit failure both
>> makes
>> the code simpler and clearer, and after you get used to it, you don't
>> want to go
>> back.  Whether we say `match` or `matches` or `matches <pattern-name>`
>> are all
>> workable, though I like some variant that names the pattern.
>>
>> ### Implicit success in the imperative approach
>>
>> In the imperative approach, we can be implicit as well, but it feels more
>> natural (at least, initially) to choose implicit success rather than
>> failure.
>> This works great for unconditional patterns:
>>
>> ```
>> pattern Point(int x, int y) {
>>     x = that.x;
>>     y = that.y;
>>     // implicit success
>> }
>> ```
>>
>> but not quite as well for conditional patterns:
>>
>> ```
>> static<T> pattern(Optional<T> that) of(T t) {
>>     if (that.isPresent()) {
>>         t = that.get();
>>     }
>>     else
>>         match-fail;
>>     // implicit success
>> }
>> ```
>>
>> We can eliminate one of the arms of the if, with the more concise (but
>> convoluted) inversion:
>>
>> ```
>> static<T> pattern(Optional<T> that) of(T t) {
>>     if (!that.isPresent())
>>         match-fail;
>>     t = that.get();
>>     // implicit success
>> }
>> ```
>>
>> Just as with the functional approach, if we choose imperative and
>> "implicit
>> success", using `return` to indicate success is pretty much a forced
>> move.
>>
>> ### Imperative is a trap
>>
>> If we assume that functional implies implicit failure, and imperative
>> implies
>> implicit success, then our choices become:
>>
>> ```
>> class Optional<T> {
>>     public static<T> Optional<T> of(T t) { ... }
>>
>>     // imperative, implicit success
>>     public static<T> pattern(Optional<T> that) of(T t) {
>>         if (that.isPresent()) {
>>             t = that.get();
>>         }
>>         else
>>             match-fail;
>>     }
>>
>>     // functional, implicit failure
>>     public static<T> pattern(Optional<T> that) of(T t) {
>>         if (that.isPresent())
>>             matches of(that.get());
>>     }
>> }
>> ```
>>
>> Once we get past deconstructors, the imperative approach looks worse by
>> comparison because we need to assign all the bindings (which is _O(n)_
>> assignments) _and also_ indicate success or failure somehow, whereas in
>> the
>> functional style all can be done together with a single `matches`
>> statement.
>>
>> Looking at the alternatives, except maybe for unconditional patterns, the
>> functional example above seems a lot more natural.  The imperative
>> approach
>> works with deconstructors (assuming they are not conditional), but does
>> not
>> scale so well to conditionality -- which is the essence of patterns.
>>
>> From a theoretical perspective, the method-comethod duality also gives us
>> a
>> forceful nudge towards the functional approach.  In a method, the method
>> arguments are specified as a positional list of expressions at the use
>> site:
>>
>>     m(a, b, c)
>>
>> and these values are invisibly copied into the parameter slots of the
>> method
>> prior to frame activation.  The dual to that for a comethod to similarly
>> convey
>> the bindings in a positional list of expressions (as they must either all
>> be
>> produced or none), where they are copied into the slots provided at the
>> use
>> site, as is indicated by `matches` in the above examples.
>>
>> My personal position: the imperative style feels like a trap.  It seems
>> "obvious" at first if we start with deconstructors, but becomes
>> increasingly
>> difficult when we get past this case, and gets in the way of other
>> opportunities.  The last gasp before acceptance is the discomfort that
>> dtor and
>> ctor bodies are written in different styles, but in the rear-view mirror,
>> this
>> feels like a non-issue.
>>
>> ### Derive imperative from functional?
>>
>> If we start with "functional with implicit failure", we can possibly
>> rescue
>> imperative by deriving a version of imperative from functional, by
>> "overloading"
>> the match-success operator.
>>
>> If we have a pattern whose binding names are `b1..bn` of types `B1..Bn`,
>> then
>> the `matches` operator must take a list of expressions `e1..en` whose
>> arity and
>> types are compatible with `B1..Bn`.  But we could allow `matches` to also
>> have a
>> nilary form, which would have the effect of being shorthand for
>>
>>     matches <pattern-name>(b1, b2, ..., bn)
>>
>> where each of `b1..bn` must be DA at the point of matching.  This means
>> that we
>> could express patterns in either form:
>>
>> ```
>> class Optional<T> {
>>     public static<T> Optional<T> of(T t) { ... }
>>
>>     // imperative, derived from functional with implicit failure
>>     public static<T> pattern(Optional<T> that) of(T t) {
>>         if (that.isPresent()) {
>>             t = that.get();
>>             matches of;
>>         }
>>     }
>>
>>     public static<T> pattern(Optional<T> that) of(T t) {
>>         if (that.isPresent())
>>             matches of(that.get());
>>     }
>> }
>> ```
>>
>> This flexibility allows users to select a more verbose expression in
>> exchange
>> for a clearer association of expressions and bindings, though as we'll
>> see, it
>> does come with some additional constraints.
>>
>> ### Wrapping an existing API
>>
>> Nearly every library has methods (sometimes sets of methods) that are
>> patterns
>> in disguise, such as the pair of methods `isArray` and `getComponentType`
>> in
>> `Class`, or the `Matcher` helper type in `java.util.regex`.  Library
>> maintainers
>> will likely want to wrap (or replace) these with real patterns, so these
>> can
>> participate more effectively in conditional contexts, and in some cases,
>> highlight their duality with factory methods.
>>
>> Matching a string against a `j.u.r.Pattern` regular expression has all
>> the same
>> elements as a pattern, just with an ad-hoc API (and one that I have to
>> look up
>> every time).  But we can fairly easily wrap a true pattern around the
>> existing
>> API.  To match against a `Pattern` today, we pass the match candidate to
>> `Pattern::matcher`, which returns a `Matcher` with accessors
>> `Matcher::matches`
>> (did it match) and `Matcher::group` (conditionally extract a particular
>> capture
>> group.)  If we want to wrap this with a pattern called `regexMatch`:
>>
>> ```
>> pattern(String that) regexMatch(String... groups) {
>>     Matcher m = this.matcher(that);
>>     if (m.matches())
>>         matches Pattern.regexMatch(IntStream.range(1, m.groupCount())
>>                                             .map(Matcher::group)
>>                                             .toArray(String[]::new));
>>     // whole lotta matchin' goin' on
>> }
>> ```
>>
>> This says that a `j.u.r.Pattern` has an instance pattern called `regex`,
>> whose
>> match candidate is `String`, and which binds a varargs of `String`
>> corresponding
>> to the capture groups.  The implementation simply delegates to the
>> existing
>> `j.u.r.Matcher` API.  This means that `j.u.r.Pattern` becomes a sort of
>> "pattern
>> object", and we can use it as a receiver at the use site:
>>
>> ```
>> static Pattern As = Pattern.compile("(a*)");
>> static Pattern Bs = Pattern.compile("(b*)");
>> ...
>> switch (string) {
>>     case As.regexMatch(var as): ...
>>     case Bs.regexMatch(var bs): ...
>>     ...
>> }
>> ```
>>
>> ### Odds and ends
>>
>> There are a number of loose ends here.  We could choose other names for
>> the
>> match-success and match-fail operations, including trying to reuse
>> `break` or
>> `yield`.  But, this reuse is tricky; it must be very clear whether a
>> given form
>> of abrupt completion means "success" or "failure", because in the case of
>> patterns with no bindings, we will have no other syntactic cues to help
>> disambiguate.  (I think having a single `matches`, with implicit failure
>> and
>> `return` meaning failure, is the sweet spot here.)
>>
>> Another question is whether the binding list introduces corresponding
>> variables
>> into the scope of the body.  For imperative, the answer is "surely yes";
>> for
>> functional, the answer is "maybe" (unless we want to do the trick where we
>> derive imperative from functional, in which case the answer is "yes"
>> again.)
>>
>> If the binding list does not correspond to variables in the body, this
>> may be
>> initially discomforting; because they do not declare program elements,
>> they may
>> feel that they are left "dangling".  But even if they are not declaring
>> _program_ elements, they are still declaring _API_ elements (similar to
>> the
>> return type of a method.)  We will want to provide Javadoc on the
>> bindings, just
>> like with parameters; we will want to match up binding names in
>> deconstructors
>> with parameter names in constructors; we may even someday want to support
>> by-name binding at the use site (e.g., `case Foo(a: var a)`).  The names
>> are
>> needed for all of these, just not for the body. Names still matter.  My
>> take
>> here is that this is a transient "different is scary" reaction, one that
>> we
>> would get over quickly.
>>
>> A final question is whether we should consider unqualified names as
>> implicitly
>> qualified by `that` (and also `this`, for instance patterns, with some
>> conflict
>> resolution).  Users will probably grow tired of typing `that.` all the
>> time, and most of the time, the unqualified use is perfectly readable.
>>
>> ## Exhaustiveness
>>
>> There is one last syntax question in front of us: how to indicate that a
>> set of
>> patterns are (claimed to be) exhaustive on a given match candidate type.
>> We see
>> this with `Optional::of` and `Optional::empty`; it would be sad if the
>> compiler
>> did not realize that these two patterns together were exhaustive on
>> `Optional`.
>> This is not a feature that will be used often, but not having it at all
>> will be
>> a repeated irritant.
>>
>> The best I've come up with is to call these `case` patterns, where a set
>> of
>> `case` patterns for a given match candidate type in a given class are
>> asserted
>> to be an exhaustive set:
>>
>> ```
>> class Optional<T> {
>>     static<T> Optional<T> of(T t) { ... }
>>     static<T> Optional<T> empty() { ... }
>>
>>     static<T> case pattern of(T t) for Optional<T> { ... }
>>     static<T> case pattern empty() for Optional<T> { ... }
>> }
>> ```
>>
>> Because they may not be truly exhaustive, `switch` constructs will have
>> to back
>> up the static assumption of exhaustiveness with a dynamic check, as we do
>> for
>> other sets of exhaustive patterns that may have remainder.
>>
>> I've experimented with variants of `sealed` but it felt more forced, so
>> this is
>> the best I've come up with.
>>
>> ## Example: patterns delegating to other patterns
>>
>> Pattern implementations must compose.  Just as a subclass constructor
>> delegates
>> to a superclass constructor, the same should be true for deconstructors.
>> Here's a typical superclass-subclass pair:
>>
>> ```
>> class A {
>>     private final int a;
>>
>>     public A(int a) { this.a = a; }
>>     public pattern A(int a) { matches A(that.a); }
>> }
>>
>> class B extends A {
>>     private final int b;
>>
>>     public B(int a, int b) {
>>         super(a);
>>         this.b = b;
>>     }
>>
>>     // Imperative style
>>     public pattern B(int a, int b) {
>>         if (that instanceof super(var aa)) {
>>             a = aa;
>>             b = that.b;
>>             matches B;
>>         }
>>     }
>>
>>     // Functional style
>>     public pattern B(int a, int b) {
>>         if (that instanceof super(var a))
>>             matches B(a, b);
>>     }
>> }
>> ```
>>
>> (Ignore the flow analysis and totality for the time being; we'll come
>> back to
>> this in a separate document.)
>>
>> The first thing that jumps out at us is that, in the imperative version,
>> we had
>> to create a "garbage" variable `aa` to receive the binding, because `a`
>> was
>> already in scope, and then we have to copy the garbage variable into the
>> real
>> binding variable. Users will surely balk at this, and rightly so.  In the
>> functional version (depending on the choices from "Odds and Ends") we are
>> free
>> to use the more natural name and avoid the roundabout locution.
>>
>> We might be tempted to fix the "garbage variable" problem by inventing
>> another
>> sub-feature: the ability to use an existing variable as the target of a
>> binding,
>> such as:
>>
>> ```
>> pattern Point(int a, int b) {
>>     if (this instanceof A(__bind a))
>>         b = this.b;
>> }
>> ```
>>
>> But, I think the language is stronger without this feature, for two
>> reasons.
>> First, having to reason about whether a pattern match introduces a new
>> binding
>> or assigns to an existing variables is additional cognitive load for
>> users to
>> reason about, and second, having assignment to locals happening through
>> something other than assignment introduces additional complexity in
>> finding
>> where a variable is modified.  While we can argue about the general
>> utility of
>> this feature, bringing it in just to solve the garbage-variable problem is
>> particularly unattractive.
>>
>> ## Pattern lambdas
>>
>> One final consideration is is that patterns may also have a lambda form.
>> Given
>> a single-abstract-pattern (SAP) interface:
>>
>> ```
>> interface Converter<T,U> {
>>     pattern(T t) convert(U u);
>> }
>> ```
>>
>> one can implement such a pattern with a lambda. Such a lambda has one
>> parameter
>> (the match candidate), and its body looks like the body of a declared
>> pattern:
>>
>> ```
>> Converter<Integer, Short> c =
>>     i -> {
>>         if (i >= Short.MIN_VALUE && i <= Short.MAX_VALUE)
>>             matches Converter.convert((short) i);
>>     };
>> ```
>>
>> Because the bindings of the pattern lambda are defined in the interface,
>> not in
>> the lambda, this is one more reason not to like the imperative version:
>> it is
>> brittle, and alpha-renaming bindings in the interface would be a
>> source-incompatible change.
>>
>> ## Example gallery
>>
>> Here's all the pattern examples so far, and a few more, using the
>> suggested
>> style (functional, implicit fail, implicit `that`-qualification):
>>
>> ```
>> // Point dtor
>> pattern Point(int x, int y) {
>>     matches Point(x, y);
>> }
>>
>> // Optional -- static patterns for Optional::of, Optional::empty
>> static<T> case pattern(Optional<T> that) of(T t) {
>>     if (isPresent())
>>         matches of(t);
>> }
>>
>> static<T> case pattern(Optional<T> that) empty() {
>>     if (!isPresent())
>>         matches empty();
>> }
>>
>> // Class -- instance pattern for arrayClass (match candidate type
>> inferred)
>> pattern arrayClass(Class<?> componentType) {
>>     if (that.isArray())
>>         matches arrayClass(that.getComponentType());
>> }
>>
>> // regular expression -- instance pattern in j.u.r.Pattern
>> pattern(String that) regexMatch(String... groups) {
>>     Matcher m = matcher(that);
>>     if (m.matches())
>>         matches Pattern.regexMatch(IntStream.range(1, m.groupCount())
>>                                             .map(Matcher::group)
>>                                             .toArray(String[]::new));
>> }
>>
>> // power of two (somewhere)
>> static pattern(int that) powerOfTwo(int exp) {
>>     int exp = 0;
>>
>>     if (that < 1)
>>         return;
>>
>>     while (that > 1) {
>>         if (that % 2 == 0) {
>>             that /= 2;
>>             exp++;
>>         }
>>         else
>>             return;
>>     }
>>     matches powerOfTwo(exp);
>> }
>> ```
>>
>> ## Closing thoughts
>>
>> I came out of this exploration with very different conclusions than I
>> expected
>> when going in.  At first, the "inverse" syntax seemed stilted, but over
>> time it
>> started to seem more obvious.  Similarly, I went in expecting to prefer
>> the
>> imperative approach for the body, but over time, started to warm to the
>> functional approach, and eventually concluded it was basically a forced
>> move if
>> we want to support more than just deconstructors.  And I started out
>> skeptical
>> of "implicit fail", but after writing a few dozen patterns with it, going
>> back
>> to fully explicit felt painful.  All of this is to say, you should hold
>> your
>> initial opinions at arm's length, and give the alternatives a chance to
>> sink in.
>>
>> For most _conditional_ patterns (and conditionality is at the heart of
>> pattern
>> matching), the functional approach cleanly highlights both the match
>> predicate
>> and the flow of values, and is considerably less fussy than the imperative
>> approach in the same situation; `Optional::of`, `Class::arrayClass`, and
>> `regex`
>> look great here, much better than the would with imperative.  None of
>> these
>> illustrate delegation, but in the presence of delegation, the gap gets
>> even
>> wider.
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-spec-observers/attachments/20240402/4f20be9e/attachment-0001.htm>


More information about the amber-spec-observers mailing list