Patterns: declaration
Brian Goetz
brian.goetz at oracle.com
Fri Jan 22 16:08:13 UTC 2021
Here's the first half of an *extremely rough* doc-in-progress that
starts to show more of the syntax of the declaration of a pattern -- but
does yet not dive into the _body_ of a pattern (which we should still
leave for a later discussion). This is still more about object model
than directly about syntax; many people have been asking "I'm having a
hard time imagining what this looks like in a class file", so this
should help a bit. It also fleshes out some of the requirements,
specifically patterns delegating to other patterns.
## Beyond built-in patterns
Related documents have explored a number of built-in pattern types,
where the
compiler knows enough about the pattern semantics to implement them
directly,
such as type patterns or record deconstruction patterns. However, we
also wish
to support direct declaration of patterns as class members, such as
deconstruction patterns for non-record classes, static patterns, and
instance
patterns.
Given a pattern match like:
```
if (p instanceof Point(int x, int y)) { ... }
```
assuming the pattern matches, we need to capture somewhere how the
bindings `x`
and `y` are extracted from the `Point`. If `Point` is a record, we can
do this
without help from the user, but otherwise, someone is going to need to write
some code somewhere to capture the applicability test and state extraction
corresponding to this pattern. (One can imagine various "tricks" for
doing so
(i.e., extracting fields, calling getters, etc), but these tricks
quickly run
out of gas for classes whose state is not trivially and transparently
coupled to
its API.) We'll call the various forms of this _member patterns_, which are
expressed as a new kind of class member.
> Member patterns are executable class members, just like constructors and
> methods.
Abstractly, a pattern operates on a single target parameter (which for
instance
and deconstruction patterns is just the receiver), and either succeeds
or fails
based on some condition. If it succeeds, it produces zero or more
output values
(the destructured components, or _bindings_.)
#### Pattern declarations in other languages
We may be able to obtain some inspiration by looking at what other
languages do.
By the late 1990s, pattern matching over structural types (sequences and
tuples)
was a common feature of functional languages; languages that supported
nominal
product types (e.g., Haskell) also supported pattern matching over nominal
product types. In the mid-2000s, languages like [F#][fs-patterns] and
[Scala][scala-patterns] extended the notion to additionally apply to
abstract
data types (those whose concrete representation was not necessary known
to the
client.) However, in both F# and Scala, patterns remain distinctly `static`
entities; they are not a first-class part of the object hierarchy, and
remain at
the periphery of the object model.
Scala takes the approach of modeling patterns as static methods with the
name
`unapply`. An `unapply` method takes a single parameter (type checked
against
the matchee) and has some flexibility in its return type; it can return
`boolean` if there are no bindings, or `Option[T]` (where `T` is often a
tuple
type) to carry the binding parameters. A matcher for `Point(x,y)` would
look
like:
```
object Point {
Option[Tuple[int, int]] unapply() { ... }
}
```
This is a pragmatic approach, with pros and cons. It is nice that it is
just an
ordinary method, which can be called either directly or through a
pattern match.
On the other hand, there is lots of magic -- the language imputes special
semantics based on method name, enclosing class, argument list, and
return type.
Also, because these methods are effectively static, there is no ability to
overload, override, have abstract patterns, etc.
C# takes a more principled, but also more limited, approach. As of C# 9, a
class can write one or more `Deconstruct` methods (magic name), using `out`
parameters to receive the bindings (a feature already present in C#). A
`Deconstruct` method must be total (the deconstruction cannot fail), and
there
is no way to write instance or abstract patterns (though the equivalent
of some
static patterns can be achieved with extension methods.)
Both of these approaches are adding patterns "around the edges"; we are
proposing a pattern matching model that integrates much more deeply into the
object model.
#### Deconstruction patterns in Java
The simplest form of pattern is a _deconstruction pattern_, which is the
dual of
a constructor. And, just as constructors are an odd mix of instance and
static
behavior -- they are instance members (they have a receiver) but are not
inherited and can only be accessed through their class name -- their
duals have
the exact same odd mix. Another oddity of constructors is that they
don't have
names -- the name supplied at the instantiate site is the name of a
class, not
a class member. It makes sense for deconstruction patterns to also
share this
characteristic.
The target of a deconstruction pattern is obvious -- it is the receiver.
Deconstruction patterns are _total_ on their target type -- as long as the
target is valid as a receiver, the pattern always succeeds. We'd like
for the
arity, types, order, and names of the binding variables to be manifest
in the
declaration code, as this is part of the classes API, and for the similarity
with the corresponding constructor, if there is one, to be obvious both
at the
declaration and use site. This might look like:
```{.java}
class Point {
int x, y;
Point(int x, int y) {
// constructor body
}
deconstructor(int x, int y) {
// deconstructor body
}
}
```
There is a strong symmetry between the declarations of the constructor and
deconstructor. The constructor takes as arguments `x` and `y` and
produces a
`Point`; the deconstructor takes a `Point` and produces as bindings `x`
and `y`,
and the declaration of the argument list of the constructor and the
binding list
of the deconstructor are syntactically similar. The locution
`deconstructor(BINDINGS)` is meant to evoke the notion of "returning" the
specified set of bindings. For classes that model entities more or less
transparently, as `Point` does, it is entirely reasonable to match
constructors
with deconstructors whose binding lists correspond to the constructor
parameter
list.
There is also a symmetry between the creation of an object with a
constructor,
and the destructuring of an object with a deconstruction pattern. A client
matches against a deconstruction pattern as follows:
```
case Point(var x, var y):
```
Here, we interpret `Point` as a class name, and look in the `Point`
class for a
deconstruction pattern with compatible binding arity and types --
analogously to
how we interpret the instance creation expression `new Point(x, y)`.
We've left out the declaration of the body for now; we'll return to this
after
our tour of the member pattern model. But it's important to note that,
just as
the constructor is free to perform defensive copying on the inputs, the
deconstructor needs a way to do the same with the bindings.
Deconstruction patterns are unusual in that they are always _total_; if the
target is of the correct type, the deconstruction pattern should always
complete
successfully. However, not all patterns are total; a pattern like
`Optional.of(var x)` will only match a subset of instances of
`Optional`. As a
simplifying approximation, we'll say that all other member pattern are
partial;
the bodies of these patterns will need a way to denote this as well.
There's also no reason that deconstruction patterns cannot appear in
interfaces,
as long the interface supports enough API to implement it. For example, we
might want `Map.Entry` to support a deconstruction pattern that yields
key and
value:
```
interface Entry<K,V> {
K getKey();
V getValue();
deconstructor(K key, V value) {
// invoke getKey() and getValue() and bind them appropriately
}
}
```
This would allow us to iterate through the elements of a `Map` more
directly:
```
for (Map.Entry(var k, var v) : map.entrySet()) {
// use k and v
}
```
#### Instance patterns
The next simplest form of pattern declarations are _instance patterns_,
which
are duals of instance methods. As with deconstruction patterns, the
target of
an instance pattern is the receiver, but unlike deconstruction patterns,
instance patterns can be partial. An instance pattern declaration looks
like:
```
class Class<T> {
public pattern(Class<?> componentType) arrayClass() {
... body ...
}
}
```
A significant difference from deconstruction patterns is that instance
patterns
have names (just as instance methods do); later, we'll see that they can
also
have input arguments.
At the use site, instance patterns are usually unqualified, referring to an
instance pattern on the static type of the match target. So if we have a
`Class` in hand, we can match on those classes that represent arrays as:
```
switch (clazz) {
case arrayClass(var componentType):
}
```
#### Static patterns
Static patterns are slightly more complicated than instance patterns,
because
they lack a receiver -- and so the match target must be identified some
other
way. We can do this by indicating the target type and name as a pattern
parameter. The duality between the "receiver" and the first parameter of a
static method is not only well-known to the JVM), but also has precedent
in the
language: the method reference `Foo::bar` could refer to a static method of
`Foo` or to an instance method (an "unbound method reference"), in which
case
the first argument becomes the receiver.
```
class Optional<T> {
static<T> Optional<T> of(T t) { ... }
static<T> pattern(T t) of(Optional<T> target) { ... }
static<T> Optional<T> empty() { ... }
static<T> pattern() empty(Optional<T> target) { ... }
}
```
This leads to the familiar symmetry at the use site: we construct `Optional`
instances with `Optional.empty()` and `Optional.of(t)`, and match
against them
like:
```
switch (opt) {
case Optional.of(var t): ...
case Optional.empty(): ...
}
```
#### Patterns with input arguments
Pattern matching fuse a test with zero or more conditional extractions,
which is
something we already do all the time. Regular expression parsing is an
example
of this combination -- when we match against a regular expression, we
get both a
boolean result (did it match), and, if it did, we can then extract the
"groups"
bounded by `(...)` in the regular expression. It would be nice if we could
represent a regular expression as an ordinary pattern match, rather
than using
an ad-hoc calling sequence, while still delegating to the same underlying
regular expression library. So how do we turn a regular expression into a
pattern?
We already have almost all the tools needed; a regular expression is like a
pattern which binds a `String...`, whose target is a `String`. We could, for
each regular expression, write a static pattern:
```
Pattern asbs = Pattern.compile("(a*)(b*)");
static pattern(String...) asbsMatch(String target) { ...
... use asbs to match target, and extract the groups ...
}
```
This works -- we could ask if a string matches the pattern
`asbsMatch()`, and
bind the groups if so (further refining with nested patterns if we
like.) But
it seems a little clunky to have to write this method for every regular
expression; we'd like to write it once, and parameterize it with the regular
expression we want. This means we need to be able to specify an additional
input argument beyond the match target. This might be declared:
```
static pattern(String... groups) regex(String target, String regex) {
... use regex library to match against regex, extract bindings ...
}
```
Here, the first parameter denotes the target of the match (as before); the
remaining parameters must be passed at the use site. This might be written:
```
String AS_AND_BS = "(a*)(b*)";
case regex(AS_AND_BS, String[] { var aString, var bString }):
```
where the first "parameter" is the input parameter, and the second is a
nested
pattern that matches the sole binding variable. This is easy enough for
compilers to work out, but not so great for humans; it is not always obvious
where the data ends and the patterns begin. (This gets worse if there are
syntactic constructs that could be either arguments or nested patterns,
such as
denoting constant patterns with literals.) So it is probably preferable to
syntactically reflect the two different sorts of arguments, such as:
```
case regex(AS_AND_BS)(String[] { var aString, var bString }):
```
If a pattern has two argument lists, the first is the input arguments,
and the
second receives the bindings. If the first list is empty, as it usually is,
it can be omitted. Since our regex pattern is a varargs pattern, we
could also
elide the `String[]` pattern scaffolding:
```
case regex(AS_AND_BS)(String aString, String bString):
```
We would likely want to enforce the requirement that expressions used as
input
arguments be _effectively final_, to eliminate any confusion about the
timing
of when they are evaluated.
#### Alternatives to Map::get
The `Map::get` method takes a key and returns the corresponding value,
if the
mapping is present in the map, or `null` if the mapping is not present.
This is
a problematic API choice for a number of reasons. First, if the map
supports
null values, it is impossible to distinguish a mapping to null from
mapping-not-present; if we try to distinguish between these cases with
`Map::containsKey`, we risk a possible race condition. And, as we
integrate
inline classes and primitives into the generic type system as we are in
Project Valhalla, for some types, `null` will not be available as a
sentinel.
If we had patterns in the language when we designed Collections, we
might have
reached for a pattern, which captures both the "is the mapping present", and
"if so, what is the key mapped to" in one go:
```
interface Map {
pattern(V value) withMapping(K key);
}
```
and clients would be required to confront this conditionality directly:
```
if (m instanceof Map.withMapping(k)(var v)) { ... use v ... }
```
## Composition
Just as constructors compose (a subclass constructor can delegate to a
superclass constructor, and constructors/factories can delegate to
constructors/factories for their components), patterns should compose as
well.
Let's gather some requirements for composition of pattern declarations.
```
class A {
private int a;
public A(int a) { this.a = a; }
public deconstructor(int a) { ... }
}
class B extends A {
private int b;
public B(int a, int b) { super(a); this.b = b; }
public pattern(int a, int b) {
// want to delegate to superclass deconstructor
}
}
```
The class `A` was designed for subclassing, and subclass constructors will
delegate to constructors in `A`. Since deconstruction is the dual of
construction, it stands to reason that the deconstruction pattern in `B` can
delegate to the deconstruction pattern in `A`, and then add whatever
B-specific
deconstruction logic is needed.
If we use composition instead of inheritance, we have a similar
requirement. In
this next example, class `D` has two members of type `C`, and `D`'s
constructor
delegates to the `B` constructor; we would want its deconstructor to be
able
to do the same thing.
```
class C {
private int c;
public C(int c) { this.c = c; }
public deconstructor(int c) { ... }
}
class D {
C c1;
C c2;
public D(int x, y) {
this.c1 = new C(x);
this.c2 = new C(y);
}
public deconstructor(int x, int y) {
// delegate to C pattern to extract x from c1 and y from c2
}
}
```
In these examples, we've composed deconstruction (total) patterns into
bigger
patterns, so we didn't have to worry about what happens when the inner
pattern
fails. But we also want to be able to compose partial patterns into partial
patterns. For example, suppose we have:
```
interface Shape {
// if component is visible, provide bounds
public pattern(Rect bound) visibleShape();
}
```
And suppose we have a class that represents the composition of two
shapes; we
would like it to implement the `visibleShape()` pattern by delegating to the
`visibleShape()` pattern for each of the shapes, fail if either of these
fail,
and if both succeed, return the bounding rectangle for both shapes:
```
class CompoundShape implements Shape {
Shape s1 = ...
Shape s2 = ...
public pattern(Rect r) visibleShape() {
...
if (s1 instanceof visibleShape(var r1)
&& s2 instanceof visibleShape(var r2)) {
// succeed with boundingRect(r1, r2)
}
else {
// fail
}
}
}
```
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/amber-spec-experts/attachments/20210122/bb4f2fd7/attachment-0001.htm>
More information about the amber-spec-experts
mailing list