Deconstruction patterns

Tue Mar 7 22:42:02 UTC 2023

I like this topic and overall organization, BUT:

Here is my initial big-picture reaction to this document: it is much harder to read than necessary because of the chiasmus induced by use of the terms “constructor-deconstructor pair” and “embedding-projection” pair. Very close reading of the “Digression” section reveals that the _deconstructor_ is the embedding and the _constructor_ is the projection. Constantly having to mentally reverse this mismatched order is gratuitously taxing.

Please do one of two things: either change “constructor-deconstructor” to “deconstructor-constructor” everywhere, or change “embedding-projection” to “projection-embedding” everywhere.

Another thing you could do in the “Digression” section is to avoid the function-composition notation, which induces a similar reversal. You don’t really make extensive use of the result anyway (such as by  applying such a composition as a function in a formula). My advice for a Java audience would be to use a pseudo-Java method notation:

x.embed().project().equals(x)  should always be true
x.project().embed().approx(x) should be true if nothing bad happens during “project()”

Alos, I’m not sure exactly what you mean by a “complete partial order”: Wikipedia notes that the term has three distinct meanings in common use, and is also sometimes confused with “compete lattice”. https://en.wikipedia.org/wiki/Complete_partial_order Furthermore it is unclear how you indeed to use that partial order to define the notion of approximate equivalence (which notorioiusly has difficulties associated with lack of transitivity).

Please remove these obstacles to readability and I will gladly take another look.  :-)

—Guy

On Mar 6, 2023, at 1:24 PM, Brian Goetz <brian.goetz at oracle.com<mailto:brian.goetz at oracle.com>> wrote:

Time to look ahead to the next installment of pattern matching: deconstruction patterns, which generalize record patterns.  This document does an end-to-end walkthrough (at a sketchy level of detail) through declaration, overloading, use, translation, and reflection of deconstruction patterns.

I would like to *not* discuss syntax at this time.  There's a lengthy discussion to be had about syntax, and we'll have that, but let's nail down model, semantics, and translation first.

As usual, I would prefer that people either (a) post a single reply addressing the totality of this sketch or (b) start _new threads_ if you want to discuss a specific aspect.  A quick "I'll just reply to this minor detail" seems to often derail the conversation in such a way that it never comes back.  If this all looks fine to you, a quick "no surprises here" will keep us from suspensefully waiting for feedback.

# Deconstruction patterns -- translation, use, and reflection

As we are wrapping up record patterns, it's time to look ahead to the next major
part of the pattern matching story -- extending the capabilities of record
patterns to all classes that want to support destructuring. Record patterns are
simply a special case of _deconstruction patterns_ or _deconstructors_, where we
derive the deconstructor API, implementation, and use from the state description
of the record.  For an arbitrary class, a deconstruction patterns will require
an explicit member declaration, with a header identifying the names and types of
the bindings and a body that extracts the bindings from the representation.

## Deconstructors

Just as constructors are special cases of methods, deconstruction patterns are
special cases of a more general notion of declared pattern, which also includes
static matchers (the dual of static methods) and instance matchers (the dual of
instance methods.)  Specifically, unlike the more general notion of matcher, a
deconstructor must be _total_; it must always match.  This document will focus
exclusively on deconstructors, and we'll come back to static and instance
matchers in due time.  (But note that some of the design choices in the simple
case of deconstructors may be constrained by the more general case.)

There are a number of choices for how we might syntactically represent a
deconstructor (or more generally, a declared pattern.)  For purposes of
illustration, this document picks one possible syntactic expression of
deconstructors, but it is premature to devolve into a syntax discussion at this
time.

```
class Point {
    final double x, y;

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public matcher Point(double x, double y) {
        x = this.x;
        y = this.y;
    }
}
```

This example illustrates two aspects of the duality between constructors and
their corresponding deconstructors.  Their APIs are duals: a constructor takes N
parameters containing the desired description of the object state and produces a
constructed object; a deconstructor starts from the constructed object and has N
bindings (outputs) that receive the desired state components. Similarly, their
implementations are duals: the body of the constructor initializes the object
representation from the description, and the body of the deconstructor extracts
the description from the representation.  A deconstructor is best understood as
a _co-constructor_.

The `Point` example above is special in two ways.  First, the internal
representation of a `Point`, and the API of the constructor and deconstructor,
are the same: `(double x, double y)`.  We can call the API implied by the
constructor and deconstructor the _external representation_, and for `Point`,
both the internal and external representations are the same. (This is one of
the requirements for being a candidate to be a record.)  And second, the
constructor is _total_; it does not reject any combinations of arguments.

Here's another version of `Point` which does not have these special aspects; it
uses the same internal representation as before, but chooses a pair of strings
as the external representation:

```
class Point2 {
    final double x, y;

    public Point2(String x, String y) {
        this.x = Double.parseDouble(x);
        this.y = Double.parseDouble(y);
    }

    public matcher Point2(String x, String y) {
        x = Double.toString(this.x);
        y = Double.toSTring(this.y);
    }
}
```

The method `Double::parseDouble` will throw `NumberFormatException` if its
argument does not describe a suitable value, so unlike the `Point` constructor,
the `Point2` constructor is partial: it will reject `new Double("foo", "bar")`.
And the internal representation is no longer the same as the external
representation.  Less obviously, there are valid string values that we can
provide to the constructor, but which cannot be represented exactly as `double`,
and which will be approximated; the string value
`"3.22222222222222222222222222222222222222"` will be approximated with the
double value `3.2222222222222223`.

This example highlights more clearly how the constructor and deconstructor form
an _embedding-projection pair_ between the internal and external
representations.  While some external representations might be invalid, and some
might result in approximation, deconstruct-then-construct is always an identity
transformation.  Indeed, the specification of `java.lang.Record` requires that
if we deconstruct a record with its accessors, and pass the resulting values
back to the constructor, we should get a new record that is `equals` to the
original.

The fact that constructor and deconstructor (and eventually, factory and static
matcher) form an embedding-projection pair is why we are able to derive
higher-level language features, such as [safer
serialization](https://openjdk.org/projects/amber/design-notes/towards-better-serialization)
and [functional transformation of immutable
objects](https://github.com/openjdk/amber-docs/blob/master/eg-drafts/reconstruction-records-and-classes.md),
from a matched set of constructor and deconstructor.

Of course, users are free to implement constructors without deconstructors, or
constructors and deconstructors whose external representations don't match up,
or even matching constructors and deconstructors that are not behaviorally dual.
But providing a matched set (or several) of constructors and deconstructors
enables reliably reversible aggregation, and allows us to mechanically derive
useful higher-level features such as withers.

#### Overloading

Just as constructors can be overloaded, deconstructors can be overloaded for the
same reason: multiple constructors can expose multiple external representations
for aggregation, and corresponding deconstructors can recover those multiple
external representations.  Any matching pair of constructor-deconstructor (and
eventually, factory-deconstructor) is a candidate for use in higher-level
features based on the embedding-projection nature of the
constructor-deconstructor pair.

Just as deconstruction is dual to construction, overloading of deconstructors is
dual to that of constructors: rather than restricting which sets of parameters
can be overloaded against each other, we do so with the bindings instead.  For
constructors of a given arity, we require that their signatures not be
override-equivalent; for deconstructors of a given arity, we require the same of
their bindings.

For a deconstructor (and declared patterns in general), we derive a _binding
signature_ (and an erased _binding descriptor_) which treats the binding list as
a parameter list.  The overload rule outlined above requires that binding
signatures for two deconstructors of the same arity not be override-equivalent.
(We will find it useful later to derive a `MethodType` for the binding
descriptor; this is a `MethodType` whose return type is `V` and whose parameter
types are the erased types of the bindings.)

#### Digression: embedding-projection pairs

Given two sets _A_ and _B_, a pair of functions `e : A -> B` and `p : B -> A`,
forms an _embedding-projection pair_ if `p . e` (embed then project) is an
identity function, and `e . p` (project then embed) _approximates_ the input
according to a domain-specific approximation metric (which is a complete partial
ordering on `B`.)

When applied to constructor-deconstructor pairs, this says that deconstructing
an object and then reconstructing it with the resulting bindings should result
in an equivalent object, and constructing an object from an external
representation and then deconstructing it back into that external representation
should result in an approximation of the original external representation.  (A
complete partial ordering models constructor failure as the non-terminating
bottom value, which is considered an infinitely bad approximation to
everything.)

Embedding-projection pairs have a number of desirable properties, such as the
composition of two e-p pairs is an e-p pair; this property is at the heart of
using constructor-deconstructor pairs for improved serialization and functional
transformation.

## Invoking deconstructors

We've already seen how to "invoke" deconstructors: through pattern matching.
What we've been calling "record patterns" are merely deconstruction patterns
derived mechanically from the state description, just as we do with constructors
and accessors; there is little difference between record patterns and
deconstruction patterns other than the ability to declare them explicitly.
(There is an accidental difference in the translation, in that we currently
implement record patterns by appealing to individual accessors rather than a
single deconstructor, but this may eventually converge as well.)

The use-site syntax of deconstruction bears a deliberate similarity to that of
construction; `new Point(x, y)` is deconstructed by `case Point(var x, var y)`.

#### Overload selection

In the presence of overloaded deconstructors, we need to figure out which
deconstructor a deconstruction pattern `C(P*)` is referring to. The details are
similar to overload selection for methods, except that we operate on the
bindings rather than the parameters.  We first search for _applicable matchers_,
using increasingly loose criteria (first excluding boxing, unboxing, and
varargs; then including boxing and unboxing but not varargs; and finally, all
candidates) and then selecting the most applicable.

It is tempting to try and bypass the three-phase selection process and use a
simpler notion of applicability (perhaps noting that we got this process for
compatibility with existing overload selection decisions when autoboxing and
varargs were added, and that there are few deconstructor invocations to be
compatible with yet.)  But because existing overloaded constructors use this
mechanism, and there is significant value in pairing constructors and
deconstructors, attempting to invent a simpler-but-different overload selection
mechanism for deconstructors would inevitably undermine the duality between
matching constructor-deconstructor pairs. So compatibility (this time, with
existing overloaded constructors) once again forces our hand.

The specification for overload selection is complicated significantly by poly
expressions (e.g., lambdas); fortunately, there are no "poly patterns", and so,
while the structure of JLS 15.12.2 is retained for overload selection of
deconstruction patterns, much of the detail is left behind.

## Translation

We translate patterns into synthetic methods with a `Matcher` attribute; this
method implements the matcher behavior.  The translation scheme derives from a
number of requirements, only some of which are in play for deconstructors.

The matcher method for a deconstructor is a final instance method that takes no
parameters and returns `Object`, perhaps with a special name (just as
constructors are called `<init>`.)

#### Carriers

Because the matcher methods implements the matcher behavior, but a matcher may
"return" multiple bindings (or failure), we must encode the bindings in some
way.  For this, we use a _carrier object_.  The choice of carrier is largely a
footprint/specificity tradeoff.  One could imagine a carrier class per matcher,
or a carrier class per matcher descriptor, or using `Object[]` as a carrier for
everything, or caching some number of common shapes (e.g, three ints and two
refs).  This sort of tuning should be separate from the protocol encoded in the
bytecode of the pattern method and its clients.

We use a small _carrier runtime_ to decouple pattern translation from carrier
selection.  (This same carrier runtime is used by string templates as well.)
This allows tradeoffs in runtime characteristics (e.g., carrier per matcher vs
sharing carriers across matchers, dropping carrier identity with value types
later, etc) without affecting the translation. The carrier API consists of condy
bootstraps like:

```
static MethodHandle carrierFactory(MethodType matcherDescriptor) { ... }
static MethodHandle carrierAccessor(MethodType matcherDescriptor, int bindingNo) { ... }
```

The `matcherDescriptor` is a `MethodType` describing the binding types.  The
`carrierFactory` method returns a method handle which takes the bindings and
produces a carrier object; the `carrierAccessor` method returns method handles
that take the carrier object and return the corresponding binding.  To indicate
success, the matcher method invokes the carrier factory method handle and
returns the result; to indicate failure (deconstructors cannot fail, but other
matchers can) the matcher method returns null.

We would translate the XY deconstructor from `Point` as follows (pseudo-code):

```
#100: MethodType[(II)V]
#101: Condy[bsm=Carriers::carrierFactory, args=[#100]]

final synthetic Object Point$MANGLE() {
    aload_0
    getfield Point::x
    aload_0
    getfield Point::y
    LDC #101
    invokevirtual MethodHandle::invoke(II)V
    areturn
}
```

Constant `#100` contains a `MethodType` holding the binding descriptor; constant
`#101` holds a method handle whose parameters are the parameter types of the
binding descriptor and returns `Object`.

At the use site, matching a deconstruction pattern is performed by invoking the
matcher method on the appropriate target object, and then extracting the
components with the carrier accessor method handles if the match is successful.
(Deconstructors are total, so are always successful, but for other patterns,
null is returned from the matcher method on failure to match.)

#### Method names

The name of the matcher method is mangled to support overloading. The JVM
permits overloading on parameter types, but not return types (and overloaded
matchers are effectively overloaded on return types.)  We take the approach of
encoding the erasure of the matcher descriptor in the name of the pattern.  This
has several desirable properties: it is stable (the name is derived solely from
stable aspects of the declaration), for matchers with override-equivalent
signatures (deconstructors can't be overridden, but other patterns can be),
these map to true overrides in the translation, and valid overloads of matchers
will always have distinct names.

We use the ["Symbolic Freedom"]() encoding of the erasure of the matcher
descriptor as the mangled disambiguator, which is exactly as stable as any other
method descriptor derived from source declarations.

#### Attributes

Because patterns are methods, we can take advantage of all the affordances of
methods.  We can use access bits to control accessibility; we can use the
attributes that carry annotations, method parameter metadata, and generics
signatures to carry information about the pattern declaration (and its (input)
parameters, when we get to those kinds of matchers).  What's missing is the fact
that this is a pattern implementation and not an ordinary method, and a place to
put metadata for bindings.  To address the first, we can add the following
attribute on matcher methods:

    Matcher {
        u2 name;                            // "Matcher"
        u4 length;
        u2 patternFlags;
        u2 patternName;                     // UTF8
        u2 patternDescr;                    // MethodType
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

This says that "this method is a pattern".  The source name of the pattern
declaration is reified as `patternName`, and the matcher descriptor, which
encodes the types of the bindings, is reified as a `MethodType` in
`patternDescr`.  The `flags` word can carry matcher-specific information such as
"this matcher is a deconstructor" or "this matcher is total".

A matcher method may have the usual variety of method attributes, such as
`RuntimeInvisibleAnnotations` for annotations on the matcher declaration itself.

If we wish to encode information about the matcher _bindings_, we do so with
attributes inside the `Matcher` annotation itself.  Attributes such as
`Signature`, `ParameterNames`, `RuntimeVisibleParameterAnnotations`, etc, can
appear in a `Matcher` and are interpreted relative to the matcher signature or
descriptor.  So if we had a matcher:

```
matcher Foo(@Bar List<String> list) { ... }
```

then the `Matcher` would contain the signature attribute corresponding to
`(List<String>)` and a `RuntimeXxxParameterAnnotations` attribute describing the
`@Bar` annotation on the first "parameter".

#### Reflection

Since matchers are a new kind of class member, they will need a new kind of
reflective object, and a method that is analogous to `Class::getConstructors`.
The reflective object should extend `Executable`, as all of the existing methods
on `Executable` make sense for patterns (using `Object` as the return type.)  If
the pattern is reflectively invoked, it returns `null` for no match, or an
`Object[]` which is the boxing of the values in the carrier.

We will then need some additional methods to describe the bindings, so the
subtype of `Executable` has methods like `getBindings`, `getAnnotatedBindings`,
`getGenericBindings`, `isDeconstructor`, `isPartial`, etc.  These methods will
decode the `Matcher` attribute and its embedded attributes.

## Summary

This design borrows from previous rounds, but makes a number of simplifications.

 - The bindings of a pattern are captured in a `MethodType`, called the _matcher
   descriptor_.  The parameters of the matcher descriptor are the types of the
   bindings; the return type is either `V` or the minimal type that will match
   (but is not as important as the bindings.)
 - Matchers are translated as methods whose names are derived deterministically
   from the name of the matcher and the erasure of the pattern descriptor. These
   are called _matcher methods_.  Matcher methods take as parameters the input
   parameters of the pattern (if any), and return `Object`.
 - The returned object is an opaque carrier.  Null means the pattern didn't
   match.  A non-null value is the carrier type (from the carrier runtime) which
   is derived from the pattern descriptor.
 - Matcher methods are not directly invocable from the source language; they are
   invoked indirectly through pattern matching or reflection.
 - Generated code invokes the matcher method and interprets the returned value
   according to the protocol, using MHs from the carrier runtime to access the
   bindings.
 - Matcher methods have a `Matcher` attribute, which captures information about
   the matcher as a whole (is a total/partial, a deconstructor, etc) and
   parameter-related attributes which describe the bindings.
 - Matchers are reflected through a new subtype of `Executable`, which exposes
   new methods to reflect over bindings.
 - When invoking a matcher reflectively, the carrier is boxed to an Object[].

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-spec-experts/attachments/20230307/f0cc17d3/attachment-0001.htm>