Member Patterns -- the bikeshed
Victor Nazarov
asviraspossible at gmail.com
Sat Mar 30 19:23:11 UTC 2024
I have two points that I think may be good to consider in the list of
options.
1. I'm not sure if this was considered, but I find explicit lists of
covering patterns
rather natural and more flexible than using case as a pattern-modifier.
Explicit lists may look like:
````
// Matches declaration "matches (of|empty)" states that
// "of" and "empty" covers full set of Optional<T> values
class Optional<T> matches of|empty {
}
````
The important feature of explicit lists is that there may be more than one
covering set of patterns.
````
// There can be multiple sets of patterns, were each set covers all
possibilities
class List matches headAndTail|empty, initAndLast|empty {
// ...
}
class Glass matches empty|nonEmpty, full|nonFull {
// ...
}
````
2. I think that there is a middle ground between functional and imperative
pattern body definition style that may look cumbersome at first, but
nevertheless gives you best of both worlds:
* deconstructor patterns look dual to constructors
* names from the list of pattern variables are actually used and
checked by the compiler
* control flow is still functional, which is more natural
The downside that is retained from the imperative style is the need for
alpha-renaming,
but I think we still have to deal with shadowing and renaming
local-variable seems natural and easy.
Middle ground may be used like a special form that can be used in the
pattern body.
This form works mostly the same way as `with`-clause as defined in the
"Derived Record Instances" JEP.
Here is the long list of examples to fully illustrate different
interactions:
````
class Optional<T> matches (of|empty) {
public static <T> pattern<Optional<T>> of(T value) {
if (that.isPresent()) {
match {
value = that.get();
}
}
}
public static <T> pattern<Optional<T>> empty() {
if (that.isEmpty())
match {}
}
}
class Pattern {
public pattern<String> regexMatch(String... groups) {
Matcher m = this.matcher(that);
if (m.matches()) {
match {
groups =
IntStream.range(1, m.groupCount())
.map(Matcher::group)
.toArray(String[]::new);
}
}
}
}
class A {
private final int a;
public A(int a) {
this.a = a;
}
public pattern A(int a) {
match {
a = that.a;
}
}
}
class B extends A {
private final int b;
public B(int a, int b) {
super(a);
this.b = b;
}
public pattern B(int a, int b) {
if (that instanceof super(var aa)) {
match {
a = aa;
b = that.b;
}
}
}
}
interface Converter<T,U> {
pattern<T> convert(U u);
}
Converter<Integer, Short> c =
pattern (s) -> {
if (that >= Short.MIN_VALUE && that <= Short.MAX_VALUE)
match {
s = (short) that;
}
};
````
--
Victor Nazarov
On Fri, Mar 29, 2024 at 10:59 PM Brian Goetz <brian.goetz at oracle.com> wrote:
> We now come to the long-awaited bikeshed discussion on what member
> patterns should look like.
>
> Bikeshed disclaimer for EG:
> - This is likely to evoke strong opinions, so please take pains to be
> especially constructive
> - Long reply-to-reply threads should be avoided even more than usual
> - Holistic, considered replies preferred
> - Please change subject line if commenting on a sub-topic or tangential
> concern
>
> Special reminders for Remi:
> - Use of words like "should", "must", "shouldn't", "mistake", "wrong",
> "broken"
> are strictly forbidden.
> - If in doubt, ask questions first.
>
> Notes for external observers:
> - This is a working document for the EG; the discussion may continue for a
> while before there is an official proposal. Please be patient.
>
>
> # Pattern declaration: the bikeshed
>
> We've largely identified the model for what kinds of patterns we need to
> express, but there are still several degrees of freedom in the syntax.
>
> As the model has simplified during the design process, the space of syntax
> choices has been pruned back, which is a good thing. However, there are
> still
> quite a few smaller decisions to be made. Not all of the considerations
> are
> orthogonal, so while they are presented individually, this is not a "pick
> one
> from each column" menu.
>
> Some of these simplifications include:
>
> - Patterns with "input arguments" have been removed; another way to get
> to what
> this gave us may come back in another form.
> - I have grown increasingly skeptical of the value of the imperative
> `match`
> statement. With better totality analysis, I think it can be eliminated.
>
> We can discuss these separately but I would like to sync first on the broad
> strokes for how patterns are expressed.
>
> ## Object model requirements
>
> As outlined in "Towards Member Patterns", the basic model is that patterns
> are
> the dual of other executable members (constructors, static methods,
> instance
> methods.) While they are like methods in that they have inputs, outputs,
> names,
> and an imperative body, they have additional degrees of freedom that
> constructors and methods lack:
>
> - Patterns are, in general, _conditional_ (they can succeed or fail), and
> only
> produce bindings (outputs) when they succeed. This conditionality is
> understood by the language's flow analysis, and is used for computing
> scoping
> and definite assignment.
> - Methods can return at most one value; when a pattern completes
> successfully,
> it may bind multiple values.
> - All patterns have a _match candidate_, which is a distinguished,
> possibly-implicit parameter. Some patterns also have a receiver, which
> is
> also a distinguished, possibly-implicit parameter. In some such cases
> the
> receiver and match candidate are aliased, but in others these may refer
> to
> different objects.
>
> So a pattern is a named executable member that takes a _match candidate_
> as a
> possibly-implicit parameter, maybe takes a receiver as an implicit
> parameter,
> and has zero or more conditional _bindings_. Its body can perform
> imperative
> computation, and can terminate either with match failure or success. In
> the
> success case, it must provide a value for each binding.
>
> Deconstruction patterns are special in many of the same ways constructors
> are:
> they are constrained in their name, inheritance, and probably their
> conditionality (they should probably always succeed). Just as the syntax
> for
> constructors differs slightly from that of instance methods, the syntax for
> deconstructors may differ slightly from that of instance patterns. Static
> patterns, like static methods, have no receiver and do not have access to
> the
> type parameters of the enclosing class.
>
> Like constructors and methods, patterns can be overloaded, but in
> accordance
> with their duality to constructors and methods, the overloading happens on
> the
> _bindings_, not the inputs.
>
> ## Use-site syntax
>
> There are several kinds of type-driven patterns built into the language:
> type
> patterns and record patterns. A type pattern in a `switch` looks like:
>
> case String s: ...
>
> And a record pattern looks like:
>
> case MyRecord(P1, P2, ...): ...
>
> where `P1..Pn` are nested patterns that are recursively matched to the
> components of the record. This use-site syntax for record patterns was
> chosen
> for its similarity to the construction syntax, to highlight that a record
> pattern is the dual of record construction.
>
> **Deconstruction patterns.** The simplest kind of member pattern, a
> deconstruction pattern, will have the same use-site syntax as a record
> pattern;
> record patterns can be thought of as a deconstruction pattern "acquired for
> free" by records, just as records do with constructors, accessors, object
> methods, etc. So the use of a deconstruction pattern for `Point` looks
> like:
>
> case Point(var x, var y): ...
>
> whether `Point` is a record or an ordinary class equipped with a suitable
> deconstruction pattern.
>
> **Static patterns.** Continuing with the idea that the destructuring
> syntax
> should evoke the aggregation syntax, there is an obvious candidate for the
> use-site syntax for static patterns:
>
> case Optional.of(var e): ...
> case Optional.empty(): ...
>
> **Instance patterns.** Uses of instance patterns will likely come in two
> forms,
> analogous to bound and unbound instance method references, depending on
> whether
> the receiver and the match candidate are the same object. In the unbound
> form,
> used when the receiver is the same object as the match candidate, the
> pattern
> name is qualified by a _type_:
>
> ```
> Class<?> k = ...
> switch (k) {
> // Qualified by type
> case Class.arrayClass(var componentType): ...
> }
> ```
>
> This means that we _resolve_ the pattern `arrayClass` starting at `Class`
> and
> _select_ the pattern using the receiver, `k`. We may also be able to omit
> the
> class qualifier if the static type of the match candidate is sufficient to
> resolve the desired pattern.
>
> In the bound form, used when the receiver is distinct from the match
> candidate,
> the pattern name is qualified with an explicit _receiver expression_. As
> an
> example, consider an interface that captures primitive widening and
> narrowing
> conversions, such as those between `int` and `long`. In the widening
> direction,
> conversion is unconditional, so this can be modeled as a method from `int`
> to
> `long`. In the other direction, conversion is conditional, so this is
> better
> modeled as a _pattern_ whose match candidate is `long` and which binds an
> `int`
> on success. Since these are instance methods of some class (say,
> `NumericConversion<T,U>`), we need to provide the receiver instance in
> order to
> resolve the pattern:
>
> ```
> NumericConversion<int, long> nc = ...
>
> switch (aLong) {
> case nc.narrowed(int i):
> ...
> }
> ```
>
> The explicit receiver syntax would also be used if we exposed regular
> expression
> matching as a pattern on the `j.u.r.Pattern` object (the name collision on
> `Pattern` is unfortunate). Imagine we added a `matching` instance pattern
> to
> `j.u.r.Pattern`; then we could use it in `instanceof` as follows:
>
> ```
> static final java.util.regex.Pattern P = Pattern.compile("(a*)(b*)");
> ...
> if (aString instanceof P.matching(String as, String bs)) { ... }
> ```
>
> Each of these use-site syntaxes is modeled after the use-site syntax for a
> method invocation or method reference.
>
> ## Declaration-site syntax
>
> To avoid being biased by the simpler cases, we're going to work all the
> cases
> concurrently rather than starting with the simpler cases and working up.
> (It
> might seem sensible to start with deconstructors, since they are the "easy"
> case, but if we did that, we would likely be biased by their simplicity
> and then
> find ourselves painted into a corner.) As our example gallery, we will
> consider:
>
> - Deconstruction pattern for `Point`;
> - Static patterns for `Optional::of` and `Optional::empty`;
> - Static pattern for "power of two" (illustrating a computations where
> success
> or failure, and computation of bindings, cannot easily be separated);
> - Instance pattern for `Class::arrayClass` (used unbound);
> - Instance pattern for `Pattern::matching` on regular expressions (used
> bound).
>
> Member patterns, like methods, have _names_. (We can think of
> constructors as
> being named for their enclosing classes, and the same for
> deconstructors.) All
> member patterns have a (possibly empty) ordered list of _bindings_, which
> are
> the dual of constructor or method parameters. Bindings, in turn, have
> names and
> types. And like constructors and methods, member patterns have a _body_
> which
> is a block statement. Member patterns also have a _match candidate_,
> which is a
> likely-implicit method parameter.
>
> ### Member patterns as inverse methods and constructors
>
> Regardless of syntax, let us remind ourselves that that deconstructors are
> the
> categorical dual to constructors (coconstructors), and pattern methods are
> the
> categorical dual to methods (comethods). They are dual in their
> structure: a
> constructor or method takes N arguments and produces a result, the
> corresponding
> member pattern consumes a match candidate and (conditionally) produces N
> bindings.
>
> Moreover, they are semantically dual: the return value produced by
> construction
> or factory invocation is the match candidate for the corresponding member
> pattern, and the bindings produced by a member pattern are the answers to
> the
> _Pattern Question_ -- "could this object have come from an invocation of my
> dual, and if so, with what arguments."
>
> ### What do we call them?
>
> Given the significant overlap between methods and patterns, the first
> question
> about the declaration we need to settle is how to identify a member pattern
> declaration as distinct from a method or constructor declaration. _Towards
> Member Patterns_ tried out a syntax that recognized these as _inverse_
> methods
> and constructors:
>
> public Point(int x, int y) { ... }
> public inverse Point(int x, int y) { ... }
>
> While this is a principled choice which clearly highlights the duality,
> and one
> that might be good for specification and verbal description, it is
> questionable
> whether this would be a great syntax for reading and writing programs.
>
> A more traditional option is to choose a "noun" (conditional) keyword,
> such as
> `pattern`, `matcher`, `extractor`, `view`, etc:
>
> public pattern Point(int x, int y) { ... }
>
> If we are using a noun keyword to identify pattern declarations, we could
> use
> the same noun for all of them, or we could choose a different one for
> deconstruction patterns:
>
> public deconstructor Point(int x, int y) { ... }
>
> Alternately, we could reach for a symbol to indicate that we are talking
> about
> an inverted member. C++ fans might suggest
>
> public ~Point(int x, int y) { ... }
>
> but this is too cryptic (it's evocative once you see it, but then it
> becomes
> less evocative as we move away from deconstructors towards instance
> patterns.)
>
> If we wish to offer finer-grained control over conditionality, we might
> additionally need a `total` / `partial` modifier, though I would prefer to
> avoid
> that.
>
> Of the keyword candidates, there is one that stands out (for good and bad)
> because it connects to something that is already in the language:
> `pattern`. On
> the one hand, using the term `pattern` for the declaration is a slight
> abuse; on
> the other, users will immediately connect it with "ah, so that's how I
> make a
> new pattern" or "so that's what happens when I match against this pattern."
> (Lisps would resolve this tension by calling it `defpattern`.)
>
> The others (`matcher`, `view`, `extractor`, etc) are all made-up terms that
> don't connect to anything else in the language, for better or worse. If
> we pick
> one of these, we are asking users to sort out _three_ separate new things
> in
> their heads: (use-site) patterns, (declaration-site) matchers, and the
> rules of
> how patterns and matchers are connected. Calling them both "patterns",
> despite
> the mild abuse of terminology, ties them together in a way that recognizes
> their
> connection.
>
> My personal position: `pattern` is the strongest candidate here, despite
> some
> flaws.
>
> ### Binding lists and match candidates
>
> There are two obvious alternatives for describing the binding list and
> match
> candidate of a pattern declaration, both with their roots in the
> constructor and
> method syntax:
>
> - Pretend that a pattern declaration is like a method with multiple
> return, and
> put the binding list in the "return position", and make the match
> candidate
> an ordinary parameter;
> - Lean into the inverse relationship between constructors and methods (and
> consistency with the use-site syntax), and put the binding list in the
> "parameter list position". For static patterns and some instance
> patterns,
> which need to explicitly identify the match candidate type, there are
> several
> sub-options:
> - Lean further into the duality, putting the match candidate type in the
> "return position";
> - Put the match candidate type somewhere else, where it is less likely
> to be
> confused for a method return.
>
> The "method-like" approach might look like this:
>
> ```
> class Point {
> // Constructor and deconstructor
> public Point(int x, int y) { ... }
> public pattern (int x, int y) Point(Point target) { ... }
> ...
> }
>
> class Optional<T> {
> // Static factory and pattern
> public static<T> Optional<T> of(T t) { ... }
> public static<T> pattern (T t) of(Optional<T> target) { ... }
> ...
> }
> ```
>
> The "inverse" approach might look like:
>
> ```
> class Point {
> // Constructor and deconstructor
> public Point(int x, int y) { ... }
> public pattern Point(int x, int y) { ... }
> ...
> }
>
> class Optional<T> {
> // Static factory and pattern (using the first sub-option)
> public static<T> Optional<T> of(T t) { ... }
> public static<T> pattern Optional<T> of(T t) { ... }
> ...
> }
> ```
>
> With the "method-like" approach, the match candidate gets an explicit name
> selected by the author; with the inverse approach, we can go with a
> predefined
> name such as `that`. (Because deconstructors do not have receivers, we
> could by
> abuse of notation arrange for the keyword `this` to refer instead to the
> match
> candidate within the body of a deconstructor. While this might seem to
> lead to
> a more familiar notation for writing deconstructors, it would create a
> gratuitous asymmetry between the bodies of deconstruction patterns and
> those of
> other patterns.)
>
> Between these choices, nearly all the considerations favor the "inverse"
> approach:
>
> - The "inverse" approach makes the declaration look like the use site.
> This
> highlights that `pattern Point(int x, int y)` is what gets invoked when
> you
> match against the pattern use `Point(int x, int y)`. (This point is so
> strong that we should probably just stop here.)
> - The "inverse" members also look like their duals; the only difference
> is the
> `pattern` keyword (and possibly the placement of the match candidate
> type).
> This makes matched pairs much more obvious, and such matched pairs will
> be
> critical both for future language features and for library idioms.
> - The method-like approach is suggestive of multiple return or tuples,
> which is
> probably helpful for the first few minutes but actually harmful in the
> long
> term. This feature is _not_ (much as some people would like to believe)
> about
> multiple return or tuples, and playing into this misperception will
> only make
> it harder to truly understand. So this suggestion ends up propping up
> the
> wrong mental model.
>
> The main downside of the "inverse" approach is the one-time speed bump of
> the
> unfamiliarity of the inverted syntax. (The "method-like" syntax also has
> its
> own speed bumps, it is just unfamiliar in different ways.) But unlike the
> advantages of the inverse approach, which continue to add value forever,
> this
> speed bump is a one-time hurdle to get over.
>
> To smooth out the speed bumps of the inverse approach, we can consider
> moving
> the position of the match candidate for static and (suitable) instance
> pattern
> declarations, such as:
>
> ```
> class Optional<T> {
> // the usual static factory
> public static<T> Optional<T> of(T t) { ... }
>
> // Various ways of writing the corresponding pattern
> public static<T> pattern of(T t) for Optional<T> { ... }
> // or ...
> public static<T> pattern(Optional<T>) of(T t) { ... }
> // or ...
> public static<T> pattern(Optional<T> that) of(T t) { ... }
> // or ...
> public static<T> pattern<Optional<T>> of(T t) { ... }
> ...
> }
> ```
>
> (The deconstructor example looks the same with either variant.) Of these,
> treating the match candidate like a "parameter" of "pattern" is probably
> the
> most evocative:
>
> ```
> public static<T> pattern(Optional<T> that) of(T t) { ... }
> ```
>
> as it can be read as "pattern taking the parameter `Optional<T> that`
> called
> `of`, binding `T`, and is a short departure from the inverse syntax.
>
> The main value of the various rearrangements is that users don't need to
> think
> about things operating in reverse to parse the syntax. This trades some
> of the
> secondary point (patterns looking almost exactly like their inverses) for a
> certain amount of cognitive load, while maintaining the most important
> consideration: that the declaration site look like the use site.
>
> For instance pattern declarations, if the match candidate type is the same
> as
> the receiver type, the match candidate type can be elided as it is with
> deconstructors.
>
> My personal position: the "multiple return" version is terrible; all the
> sub-variants of the inverse version are probably workable.
>
> ### Naming the match candidate
>
> We've been assuming so far that the match candidate always has a fixed
> name,
> such as `that`; this is an entirely workable approach. Some of the
> variants are
> also amenable to allowing authors to explicitly select a name for the match
> candidate. For example, if we put the match candidate as a "parameter" to
> the `pattern` keyword, there is an obvious place to put the name:
>
> ```
> static<T> pattern(Optional<T> target) of(T t) { ... }
> ```
>
> My personal opinion: I don't think this degree of freedom buys us much,
> and in
> the long run readability probably benefits by picking a fixed name like
> `that`
> and sticking with it. Even with a fixed name, if there is a sensible
> position
> for the name, allowing users to type `that` for explicitness is fine (as
> we do
> with instance methods, though many people don't know this.) We may even
> want to
> require it.
>
> ## Body types
>
> Just as there are two obvious approaches for the declaration, there are two
> obvious approaches we could take for the body (though there is some
> coupling
> between them.) We'll call the two body approaches _imperative_ and
> _functional_.
>
> The imperative approach treats bindings as initially-DU variables that
> must be
> DA on successful completion, getting their value through ordinary
> assignment;
> the functional approach sets all the bindings at once, positionally.
> Either
> way, member patterns (except maybe deconstructors) also need a way to
> differentiate a successful match from a failed match.
>
> Here is the `Point` deconstructor with both imperative and functional
> style. The
> functional style uses a placeholder `match` statement to indicate a
> successful
> match and provision of bindings:
>
> ```
> class Point {
> int x, y;
>
> Point(int x, int y) {
> this.x = x;
> this.y = y;
> }
>
> // Imperative style, deconstructor always succeeds
> pattern Point(int x, int y) {
> x = that.x;
> y = that.y;
> }
>
> // Functional style
> pattern Point(int x, int y) {
> match(that.x, that.y);
> }
> }
> ```
>
> There are some obvious differences here. In the imperative style, the
> dtor body
> looks much more like the reverse of the ctor body. The functional style is
> more
> concise (and amenable to further concision via the "concise method bodies"
> mechanism in the future), as well as a number of less obvious
> differences. For
> deconstructors, the imperative approach is likely to feel more natural
> because
> of the obvious symmetry with constructors.
>
> In reality, it is _premature at this point to have an opinion_, because we
> haven't yet seen the full scope of the problem; deconstructors are a
> special
> case in many ways, which almost surely is distorting our initial opinion.
> As we
> move towards conditional patterns (and pattern lambdas), our opinions may
> flip.
>
> Regardless of which we pick, there are some additional syntactic choices
> to be
> made -- what syntax to use to indicate success (we used `match` in the
> above
> example) or failure. (We should be especially careful around trying to
> reuse
> words like `return`, `break`, or `yield` because, in the case where there
> are
> zero bindings (which is allowable), it becomes unclear whether they mean
> "fail"
> or "succeed with zero bindings".)
>
> ### Success and failure
>
> Except for possibly deconstructors, which we may require to be total, a
> pattern
> declaration needs a way to indicate success and failure. In the examples
> above,
> we posited a `match` statement to indicate success in the functional
> approach,
> and in both examples leaned on the "implicit success" of deconstructors
> (under
> the assumption they always succeed). Now let's look at the more general
> case to
> figure out what else is needed.
>
> For a static pattern like `Optional::of`, success is conditional. Using
> `match-fail` as a placeholder for "the match failed", this might look like
> (functional version):
>
> ```
> public static<T> pattern(Optional<T> that) of(T t) {
> if (that.isPresent())
> match (that.get());
> else
> match-fail;
> }
> ```
>
> The imperative version is less pretty, though. Using `match-success` as a
> placeholder:
>
> ```
> public static<T> pattern(Optional<T> that) of(T t) {
> if (that.isPresent()) {
> t = that.get();
> match-success;
> }
> else
> match-fail;
> }
> ```
>
> Both arms of the `if` feel excessively ceremonial here. And if we chose
> to not
> make all deconstruction patterns unconditional, deconstructors would
> likely need
> some explicit success as well:
>
> ```
> pattern Point(int x, int y) {
> x = that.x;
> y = that.y;
> match-success;
> }
> ```
>
> It might be tempting to try and eliminate the need for explicit success by
> inferring it from whether or not the bindings are DA or not, but this is
> error-prone, is less type-checkable, and falls apart completely for
> patterns
> with no bindings.
>
> ### Implicit failure in the functional approach
>
> One of the ceremonial-seeming aspects of `Optional::of` above is having to
> say
> `else match-fail`, which doesn't feel like it adds a lot of value.
> Perhaps we
> can be more concise without losing clarity.
>
> Most conditional patterns will have a predicate to determine matching, and
> then
> some conditional code to compute the bindings and claim success. Having
> to say
> "and if the predicate didn't hold, then I fail" seems like ceremony for the
> author and noise for the reader. Instead, if a conditional pattern falls
> off
> the end without matching, we could treat that as simply not matching:
>
> ```
> public static<T> pattern(Optional<T> that) of(T t) {
> if (that.isPresent())
> match (that.get());
> }
> ```
>
> This says what we mean: if the optional is present, then this pattern
> succeeds
> and bind the contents of the `Optional`. As long as our "succeed"
> construct
> strongly enough connotes that we are terminating abruptly and
> successfully, this
> code is perfectly clear. And most conditional patterns will look a lot
> like
> `Optional::of`; do some sort of test and if it succeeds, extract the state
> and
> bind it.
>
> At first glance, this "implicit fail" idiom may seem error-prone or
> sloppy. But
> after writing a few dozen patterns, one quickly tires of saying "else
> match-fail" -- and the reader doesn't necessarily appreciate reading it
> either.
>
> Implicit failure also simplifies the selection of how we explicitly
> indicate
> failure; using `return` in a pattern for "no match" becomes pretty much a
> forced
> move. We observe that (in a void method), "return" and "falling off the
> end"
> are equivalent; if "falling off the end" means "no match", then so should
> an
> explicit `return`. So in those few cases where we need to explicitly
> signal "no
> match", we can just use `return`. It won't come up that often, but here's
> an
> example where it does:
>
> ```
> static pattern(int that) powerOfTwo(int exp) {
> int exp = 0;
>
> if (that < 1)
> return; // explicit fail
>
> while (that > 1) {
> if (that % 2 == 0) {
> that /= 2;
> ++exp;
> }
> else
> return; // explicit fail
> }
> match (exp);
> }
> ```
>
> As a bonus, if `return` as match failure is a forced move, we need only
> select a
> term for "successful match" (which obviously can't be `return`). We could
> use
> `match` as we have in the examples, or a variant like `matched` or
> `matches`.
> But rather than just creating a new control operator, we have an
> opportunity to
> lean into the duality a little harder, by including the pattern syntax in
> the
> match:
>
> ```
> matches of(that.get());
> ```
>
> or the (optionally?) qualified (inferring type arguments, as we do at the
> use
> site):
>
> ```
> matches Optional.of(that.get());
> ```
>
> These "use the name" approaches trades a small amount of verbosity to gain
> a
> higher degree of fidelity to the pattern use site (and to evoke the
> comethod
> completion.)
>
> If we don't choose "implicit fail", we would have to invent _two_ new
> control
> flow statements to indicate "success" and "failure".
>
> My personal position: for the functional approach, implicit failure both
> makes
> the code simpler and clearer, and after you get used to it, you don't want
> to go
> back. Whether we say `match` or `matches` or `matches <pattern-name>` are
> all
> workable, though I like some variant that names the pattern.
>
> ### Implicit success in the imperative approach
>
> In the imperative approach, we can be implicit as well, but it feels more
> natural (at least, initially) to choose implicit success rather than
> failure.
> This works great for unconditional patterns:
>
> ```
> pattern Point(int x, int y) {
> x = that.x;
> y = that.y;
> // implicit success
> }
> ```
>
> but not quite as well for conditional patterns:
>
> ```
> static<T> pattern(Optional<T> that) of(T t) {
> if (that.isPresent()) {
> t = that.get();
> }
> else
> match-fail;
> // implicit success
> }
> ```
>
> We can eliminate one of the arms of the if, with the more concise (but
> convoluted) inversion:
>
> ```
> static<T> pattern(Optional<T> that) of(T t) {
> if (!that.isPresent())
> match-fail;
> t = that.get();
> // implicit success
> }
> ```
>
> Just as with the functional approach, if we choose imperative and "implicit
> success", using `return` to indicate success is pretty much a forced move.
>
>
> ### Imperative is a trap
>
> If we assume that functional implies implicit failure, and imperative
> implies
> implicit success, then our choices become:
>
> ```
> class Optional<T> {
> public static<T> Optional<T> of(T t) { ... }
>
> // imperative, implicit success
> public static<T> pattern(Optional<T> that) of(T t) {
> if (that.isPresent()) {
> t = that.get();
> }
> else
> match-fail;
> }
>
> // functional, implicit failure
> public static<T> pattern(Optional<T> that) of(T t) {
> if (that.isPresent())
> matches of(that.get());
> }
> }
> ```
>
> Once we get past deconstructors, the imperative approach looks worse by
> comparison because we need to assign all the bindings (which is _O(n)_
> assignments) _and also_ indicate success or failure somehow, whereas in the
> functional style all can be done together with a single `matches`
> statement.
>
> Looking at the alternatives, except maybe for unconditional patterns, the
> functional example above seems a lot more natural. The imperative approach
> works with deconstructors (assuming they are not conditional), but does not
> scale so well to conditionality -- which is the essence of patterns.
>
> From a theoretical perspective, the method-comethod duality also gives us a
> forceful nudge towards the functional approach. In a method, the method
> arguments are specified as a positional list of expressions at the use
> site:
>
> m(a, b, c)
>
> and these values are invisibly copied into the parameter slots of the
> method
> prior to frame activation. The dual to that for a comethod to similarly
> convey
> the bindings in a positional list of expressions (as they must either all
> be
> produced or none), where they are copied into the slots provided at the use
> site, as is indicated by `matches` in the above examples.
>
> My personal position: the imperative style feels like a trap. It seems
> "obvious" at first if we start with deconstructors, but becomes
> increasingly
> difficult when we get past this case, and gets in the way of other
> opportunities. The last gasp before acceptance is the discomfort that
> dtor and
> ctor bodies are written in different styles, but in the rear-view mirror,
> this
> feels like a non-issue.
>
> ### Derive imperative from functional?
>
> If we start with "functional with implicit failure", we can possibly rescue
> imperative by deriving a version of imperative from functional, by
> "overloading"
> the match-success operator.
>
> If we have a pattern whose binding names are `b1..bn` of types `B1..Bn`,
> then
> the `matches` operator must take a list of expressions `e1..en` whose
> arity and
> types are compatible with `B1..Bn`. But we could allow `matches` to also
> have a
> nilary form, which would have the effect of being shorthand for
>
> matches <pattern-name>(b1, b2, ..., bn)
>
> where each of `b1..bn` must be DA at the point of matching. This means
> that we
> could express patterns in either form:
>
> ```
> class Optional<T> {
> public static<T> Optional<T> of(T t) { ... }
>
> // imperative, derived from functional with implicit failure
> public static<T> pattern(Optional<T> that) of(T t) {
> if (that.isPresent()) {
> t = that.get();
> matches of;
> }
> }
>
> public static<T> pattern(Optional<T> that) of(T t) {
> if (that.isPresent())
> matches of(that.get());
> }
> }
> ```
>
> This flexibility allows users to select a more verbose expression in
> exchange
> for a clearer association of expressions and bindings, though as we'll
> see, it
> does come with some additional constraints.
>
> ### Wrapping an existing API
>
> Nearly every library has methods (sometimes sets of methods) that are
> patterns
> in disguise, such as the pair of methods `isArray` and `getComponentType`
> in
> `Class`, or the `Matcher` helper type in `java.util.regex`. Library
> maintainers
> will likely want to wrap (or replace) these with real patterns, so these
> can
> participate more effectively in conditional contexts, and in some cases,
> highlight their duality with factory methods.
>
> Matching a string against a `j.u.r.Pattern` regular expression has all the
> same
> elements as a pattern, just with an ad-hoc API (and one that I have to
> look up
> every time). But we can fairly easily wrap a true pattern around the
> existing
> API. To match against a `Pattern` today, we pass the match candidate to
> `Pattern::matcher`, which returns a `Matcher` with accessors
> `Matcher::matches`
> (did it match) and `Matcher::group` (conditionally extract a particular
> capture
> group.) If we want to wrap this with a pattern called `regexMatch`:
>
> ```
> pattern(String that) regexMatch(String... groups) {
> Matcher m = this.matcher(that);
> if (m.matches())
> matches Pattern.regexMatch(IntStream.range(1, m.groupCount())
> .map(Matcher::group)
> .toArray(String[]::new));
> // whole lotta matchin' goin' on
> }
> ```
>
> This says that a `j.u.r.Pattern` has an instance pattern called `regex`,
> whose
> match candidate is `String`, and which binds a varargs of `String`
> corresponding
> to the capture groups. The implementation simply delegates to the existing
> `j.u.r.Matcher` API. This means that `j.u.r.Pattern` becomes a sort of
> "pattern
> object", and we can use it as a receiver at the use site:
>
> ```
> static Pattern As = Pattern.compile("(a*)");
> static Pattern Bs = Pattern.compile("(b*)");
> ...
> switch (string) {
> case As.regexMatch(var as): ...
> case Bs.regexMatch(var bs): ...
> ...
> }
> ```
>
> ### Odds and ends
>
> There are a number of loose ends here. We could choose other names for the
> match-success and match-fail operations, including trying to reuse `break`
> or
> `yield`. But, this reuse is tricky; it must be very clear whether a given
> form
> of abrupt completion means "success" or "failure", because in the case of
> patterns with no bindings, we will have no other syntactic cues to help
> disambiguate. (I think having a single `matches`, with implicit failure
> and
> `return` meaning failure, is the sweet spot here.)
>
> Another question is whether the binding list introduces corresponding
> variables
> into the scope of the body. For imperative, the answer is "surely yes";
> for
> functional, the answer is "maybe" (unless we want to do the trick where we
> derive imperative from functional, in which case the answer is "yes"
> again.)
>
> If the binding list does not correspond to variables in the body, this may
> be
> initially discomforting; because they do not declare program elements,
> they may
> feel that they are left "dangling". But even if they are not declaring
> _program_ elements, they are still declaring _API_ elements (similar to the
> return type of a method.) We will want to provide Javadoc on the
> bindings, just
> like with parameters; we will want to match up binding names in
> deconstructors
> with parameter names in constructors; we may even someday want to support
> by-name binding at the use site (e.g., `case Foo(a: var a)`). The names
> are
> needed for all of these, just not for the body. Names still matter. My
> take
> here is that this is a transient "different is scary" reaction, one that we
> would get over quickly.
>
> A final question is whether we should consider unqualified names as
> implicitly
> qualified by `that` (and also `this`, for instance patterns, with some
> conflict
> resolution). Users will probably grow tired of typing `that.` all the
> time, and most of the time, the unqualified use is perfectly readable.
>
> ## Exhaustiveness
>
> There is one last syntax question in front of us: how to indicate that a
> set of
> patterns are (claimed to be) exhaustive on a given match candidate type.
> We see
> this with `Optional::of` and `Optional::empty`; it would be sad if the
> compiler
> did not realize that these two patterns together were exhaustive on
> `Optional`.
> This is not a feature that will be used often, but not having it at all
> will be
> a repeated irritant.
>
> The best I've come up with is to call these `case` patterns, where a set of
> `case` patterns for a given match candidate type in a given class are
> asserted
> to be an exhaustive set:
>
> ```
> class Optional<T> {
> static<T> Optional<T> of(T t) { ... }
> static<T> Optional<T> empty() { ... }
>
> static<T> case pattern of(T t) for Optional<T> { ... }
> static<T> case pattern empty() for Optional<T> { ... }
> }
> ```
>
> Because they may not be truly exhaustive, `switch` constructs will have to
> back
> up the static assumption of exhaustiveness with a dynamic check, as we do
> for
> other sets of exhaustive patterns that may have remainder.
>
> I've experimented with variants of `sealed` but it felt more forced, so
> this is
> the best I've come up with.
>
> ## Example: patterns delegating to other patterns
>
> Pattern implementations must compose. Just as a subclass constructor
> delegates
> to a superclass constructor, the same should be true for deconstructors.
> Here's a typical superclass-subclass pair:
>
> ```
> class A {
> private final int a;
>
> public A(int a) { this.a = a; }
> public pattern A(int a) { matches A(that.a); }
> }
>
> class B extends A {
> private final int b;
>
> public B(int a, int b) {
> super(a);
> this.b = b;
> }
>
> // Imperative style
> public pattern B(int a, int b) {
> if (that instanceof super(var aa)) {
> a = aa;
> b = that.b;
> matches B;
> }
> }
>
> // Functional style
> public pattern B(int a, int b) {
> if (that instanceof super(var a))
> matches B(a, b);
> }
> }
> ```
>
> (Ignore the flow analysis and totality for the time being; we'll come back
> to
> this in a separate document.)
>
> The first thing that jumps out at us is that, in the imperative version,
> we had
> to create a "garbage" variable `aa` to receive the binding, because `a` was
> already in scope, and then we have to copy the garbage variable into the
> real
> binding variable. Users will surely balk at this, and rightly so. In the
> functional version (depending on the choices from "Odds and Ends") we are
> free
> to use the more natural name and avoid the roundabout locution.
>
> We might be tempted to fix the "garbage variable" problem by inventing
> another
> sub-feature: the ability to use an existing variable as the target of a
> binding,
> such as:
>
> ```
> pattern Point(int a, int b) {
> if (this instanceof A(__bind a))
> b = this.b;
> }
> ```
>
> But, I think the language is stronger without this feature, for two
> reasons.
> First, having to reason about whether a pattern match introduces a new
> binding
> or assigns to an existing variables is additional cognitive load for users
> to
> reason about, and second, having assignment to locals happening through
> something other than assignment introduces additional complexity in finding
> where a variable is modified. While we can argue about the general
> utility of
> this feature, bringing it in just to solve the garbage-variable problem is
> particularly unattractive.
>
> ## Pattern lambdas
>
> One final consideration is is that patterns may also have a lambda form.
> Given
> a single-abstract-pattern (SAP) interface:
>
> ```
> interface Converter<T,U> {
> pattern(T t) convert(U u);
> }
> ```
>
> one can implement such a pattern with a lambda. Such a lambda has one
> parameter
> (the match candidate), and its body looks like the body of a declared
> pattern:
>
> ```
> Converter<Integer, Short> c =
> i -> {
> if (i >= Short.MIN_VALUE && i <= Short.MAX_VALUE)
> matches Converter.convert((short) i);
> };
> ```
>
> Because the bindings of the pattern lambda are defined in the interface,
> not in
> the lambda, this is one more reason not to like the imperative version: it
> is
> brittle, and alpha-renaming bindings in the interface would be a
> source-incompatible change.
>
> ## Example gallery
>
> Here's all the pattern examples so far, and a few more, using the suggested
> style (functional, implicit fail, implicit `that`-qualification):
>
> ```
> // Point dtor
> pattern Point(int x, int y) {
> matches Point(x, y);
> }
>
> // Optional -- static patterns for Optional::of, Optional::empty
> static<T> case pattern(Optional<T> that) of(T t) {
> if (isPresent())
> matches of(t);
> }
>
> static<T> case pattern(Optional<T> that) empty() {
> if (!isPresent())
> matches empty();
> }
>
> // Class -- instance pattern for arrayClass (match candidate type inferred)
> pattern arrayClass(Class<?> componentType) {
> if (that.isArray())
> matches arrayClass(that.getComponentType());
> }
>
> // regular expression -- instance pattern in j.u.r.Pattern
> pattern(String that) regexMatch(String... groups) {
> Matcher m = matcher(that);
> if (m.matches())
> matches Pattern.regexMatch(IntStream.range(1, m.groupCount())
> .map(Matcher::group)
> .toArray(String[]::new));
> }
>
> // power of two (somewhere)
> static pattern(int that) powerOfTwo(int exp) {
> int exp = 0;
>
> if (that < 1)
> return;
>
> while (that > 1) {
> if (that % 2 == 0) {
> that /= 2;
> exp++;
> }
> else
> return;
> }
> matches powerOfTwo(exp);
> }
> ```
>
> ## Closing thoughts
>
> I came out of this exploration with very different conclusions than I
> expected
> when going in. At first, the "inverse" syntax seemed stilted, but over
> time it
> started to seem more obvious. Similarly, I went in expecting to prefer the
> imperative approach for the body, but over time, started to warm to the
> functional approach, and eventually concluded it was basically a forced
> move if
> we want to support more than just deconstructors. And I started out
> skeptical
> of "implicit fail", but after writing a few dozen patterns with it, going
> back
> to fully explicit felt painful. All of this is to say, you should hold
> your
> initial opinions at arm's length, and give the alternatives a chance to
> sink in.
>
> For most _conditional_ patterns (and conditionality is at the heart of
> pattern
> matching), the functional approach cleanly highlights both the match
> predicate
> and the flow of values, and is considerably less fussy than the imperative
> approach in the same situation; `Optional::of`, `Class::arrayClass`, and
> `regex`
> look great here, much better than the would with imperative. None of these
> illustrate delegation, but in the presence of delegation, the gap gets even
> wider.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-spec-observers/attachments/20240330/f7f7c029/attachment-0001.htm>
More information about the amber-spec-observers
mailing list