<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">

<div class="">I like this topic and overall organization, BUT:</div>

<div class=""><br class="">

</div>

Here is my initial big-picture reaction to this document: it is much harder to read than necessary because of the chiasmus induced by use of the terms “constructor-deconstructor pair” and “embedding-projection” pair. Very close reading of the “Digression” section

 reveals that the _deconstructor_ is the embedding and the _constructor_ is the projection. Constantly having to mentally reverse this mismatched order is gratuitously taxing.

<div class=""><br class="">

</div>

<div class="">Please do one of two things: either change “constructor-deconstructor” to “deconstructor-constructor” everywhere, or change “embedding-projection” to “projection-embedding” everywhere.</div>

<div class=""><br class="">

</div>

<div class="">Another thing you could do in the “Digression” section is to avoid the function-composition notation, which induces a similar reversal. You don’t really make extensive use of the result anyway (such as by  applying such a composition as a function

 in a formula). My advice for a Java audience would be to use a pseudo-Java method notation:</div>

<div class=""><br class="">

</div>

<div class="">

<div class=""><span class="Apple-tab-span" style="white-space: pre;"></span>x.embed().project().equals(x)  should always be true</div>

</div>

<div class="">

<div class=""><span class="Apple-tab-span" style="white-space: pre;"></span>x.project().embed().approx(x) should be true if nothing bad happens during “project()”</div>

</div>

<div class=""><br class="">

</div>

<div class="">Alos, I’m not sure exactly what you mean by a “complete partial order”: Wikipedia notes that the term has three distinct meanings in common use, and is also sometimes confused with “compete lattice”. <a href="https://en.wikipedia.org/wiki/Complete_partial_order" class="">https://en.wikipedia.org/wiki/Complete_partial_order</a> Furthermore

 it is unclear how you indeed to use that partial order to define the notion of approximate equivalence (which notorioiusly has difficulties associated with lack of transitivity).</div>

<div class=""><br class="">

</div>

<div class="">Please remove these obstacles to readability and I will gladly take another look.  :-)</div>

<div class=""><br class="">

</div>

<div class="">—Guy</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

<div><br class="">

<blockquote type="cite" class="">

<div class="">On Mar 6, 2023, at 1:24 PM, Brian Goetz <<a href="mailto:brian.goetz@oracle.com" class="">brian.goetz@oracle.com</a>> wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div class="">Time to look ahead to the next installment of pattern matching: deconstruction patterns, which generalize record patterns.  This document does an end-to-end walkthrough (at a sketchy level of detail) through declaration, overloading, use, translation,

 and reflection of deconstruction patterns.<br class="">

<br class="">

I would like to *not* discuss syntax at this time.  There's a lengthy discussion to be had about syntax, and we'll have that, but let's nail down model, semantics, and translation first.<br class="">

<br class="">

As usual, I would prefer that people either (a) post a single reply addressing the totality of this sketch or (b) start _new threads_ if you want to discuss a specific aspect.  A quick "I'll just reply to this minor detail" seems to often derail the conversation

 in such a way that it never comes back.  If this all looks fine to you, a quick "no surprises here" will keep us from suspensefully waiting for feedback.<br class="">

<br class="">

<br class="">

# Deconstruction patterns -- translation, use, and reflection<br class="">

<br class="">

As we are wrapping up record patterns, it's time to look ahead to the next major<br class="">

part of the pattern matching story -- extending the capabilities of record<br class="">

patterns to all classes that want to support destructuring. Record patterns are<br class="">

simply a special case of _deconstruction patterns_ or _deconstructors_, where we<br class="">

derive the deconstructor API, implementation, and use from the state description<br class="">

of the record.  For an arbitrary class, a deconstruction patterns will require<br class="">

an explicit member declaration, with a header identifying the names and types of<br class="">

the bindings and a body that extracts the bindings from the representation.<br class="">

<br class="">

## Deconstructors<br class="">

<br class="">

Just as constructors are special cases of methods, deconstruction patterns are<br class="">

special cases of a more general notion of declared pattern, which also includes<br class="">

static matchers (the dual of static methods) and instance matchers (the dual of<br class="">

instance methods.)  Specifically, unlike the more general notion of matcher, a<br class="">

deconstructor must be _total_; it must always match.  This document will focus<br class="">

exclusively on deconstructors, and we'll come back to static and instance<br class="">

matchers in due time.  (But note that some of the design choices in the simple<br class="">

case of deconstructors may be constrained by the more general case.)<br class="">

<br class="">

There are a number of choices for how we might syntactically represent a<br class="">

deconstructor (or more generally, a declared pattern.)  For purposes of<br class="">

illustration, this document picks one possible syntactic expression of<br class="">

deconstructors, but it is premature to devolve into a syntax discussion at this<br class="">

time.<br class="">

<br class="">

```<br class="">

class Point {<br class="">

    final double x, y;<br class="">

<br class="">

    public Point(double x, double y) {<br class="">

        this.x = x;<br class="">

        this.y = y;<br class="">

    }<br class="">

<br class="">

    public matcher Point(double x, double y) {<br class="">

        x = this.x;<br class="">

        y = this.y;<br class="">

    }<br class="">

}<br class="">

```<br class="">

<br class="">

This example illustrates two aspects of the duality between constructors and<br class="">

their corresponding deconstructors.  Their APIs are duals: a constructor takes N<br class="">

parameters containing the desired description of the object state and produces a<br class="">

constructed object; a deconstructor starts from the constructed object and has N<br class="">

bindings (outputs) that receive the desired state components. Similarly, their<br class="">

implementations are duals: the body of the constructor initializes the object<br class="">

representation from the description, and the body of the deconstructor extracts<br class="">

the description from the representation.  A deconstructor is best understood as<br class="">

a _co-constructor_.<br class="">

<br class="">

The `Point` example above is special in two ways.  First, the internal<br class="">

representation of a `Point`, and the API of the constructor and deconstructor,<br class="">

are the same: `(double x, double y)`.  We can call the API implied by the<br class="">

constructor and deconstructor the _external representation_, and for `Point`,<br class="">

both the internal and external representations are the same. (This is one of<br class="">

the requirements for being a candidate to be a record.)  And second, the<br class="">

constructor is _total_; it does not reject any combinations of arguments.<br class="">

<br class="">

Here's another version of `Point` which does not have these special aspects; it<br class="">

uses the same internal representation as before, but chooses a pair of strings<br class="">

as the external representation:<br class="">

<br class="">

```<br class="">

class Point2 {<br class="">

    final double x, y;<br class="">

<br class="">

    public Point2(String x, String y) {<br class="">

        this.x = Double.parseDouble(x);<br class="">

        this.y = Double.parseDouble(y);<br class="">

    }<br class="">

<br class="">

    public matcher Point2(String x, String y) {<br class="">

        x = Double.toString(this.x);<br class="">

        y = Double.toSTring(this.y);<br class="">

    }<br class="">

}<br class="">

```<br class="">

<br class="">

The method `Double::parseDouble` will throw `NumberFormatException` if its<br class="">

argument does not describe a suitable value, so unlike the `Point` constructor,<br class="">

the `Point2` constructor is partial: it will reject `new Double("foo", "bar")`.<br class="">

And the internal representation is no longer the same as the external<br class="">

representation.  Less obviously, there are valid string values that we can<br class="">

provide to the constructor, but which cannot be represented exactly as `double`,<br class="">

and which will be approximated; the string value<br class="">

`"3.22222222222222222222222222222222222222"` will be approximated with the<br class="">

double value `3.2222222222222223`.<br class="">

<br class="">

This example highlights more clearly how the constructor and deconstructor form<br class="">

an _embedding-projection pair_ between the internal and external<br class="">

representations.  While some external representations might be invalid, and some<br class="">

might result in approximation, deconstruct-then-construct is always an identity<br class="">

transformation.  Indeed, the specification of `java.lang.Record` requires that<br class="">

if we deconstruct a record with its accessors, and pass the resulting values<br class="">

back to the constructor, we should get a new record that is `equals` to the<br class="">

original.<br class="">

<br class="">

The fact that constructor and deconstructor (and eventually, factory and static<br class="">

matcher) form an embedding-projection pair is why we are able to derive<br class="">

higher-level language features, such as [safer<br class="">

serialization](<a href="https://openjdk.org/projects/amber/design-notes/towards-better-serialization" class="">https://openjdk.org/projects/amber/design-notes/towards-better-serialization</a>)<br class="">

and [functional transformation of immutable<br class="">

objects](<a href="https://github.com/openjdk/amber-docs/blob/master/eg-drafts/reconstruction-records-and-classes.md" class="">https://github.com/openjdk/amber-docs/blob/master/eg-drafts/reconstruction-records-and-classes.md</a>),<br class="">

from a matched set of constructor and deconstructor.<br class="">

<br class="">

Of course, users are free to implement constructors without deconstructors, or<br class="">

constructors and deconstructors whose external representations don't match up,<br class="">

or even matching constructors and deconstructors that are not behaviorally dual.<br class="">

But providing a matched set (or several) of constructors and deconstructors<br class="">

enables reliably reversible aggregation, and allows us to mechanically derive<br class="">

useful higher-level features such as withers.<br class="">

<br class="">

#### Overloading<br class="">

<br class="">

Just as constructors can be overloaded, deconstructors can be overloaded for the<br class="">

same reason: multiple constructors can expose multiple external representations<br class="">

for aggregation, and corresponding deconstructors can recover those multiple<br class="">

external representations.  Any matching pair of constructor-deconstructor (and<br class="">

eventually, factory-deconstructor) is a candidate for use in higher-level<br class="">

features based on the embedding-projection nature of the<br class="">

constructor-deconstructor pair.<br class="">

<br class="">

Just as deconstruction is dual to construction, overloading of deconstructors is<br class="">

dual to that of constructors: rather than restricting which sets of parameters<br class="">

can be overloaded against each other, we do so with the bindings instead.  For<br class="">

constructors of a given arity, we require that their signatures not be<br class="">

override-equivalent; for deconstructors of a given arity, we require the same of<br class="">

their bindings.<br class="">

<br class="">

For a deconstructor (and declared patterns in general), we derive a _binding<br class="">

signature_ (and an erased _binding descriptor_) which treats the binding list as<br class="">

a parameter list.  The overload rule outlined above requires that binding<br class="">

signatures for two deconstructors of the same arity not be override-equivalent.<br class="">

(We will find it useful later to derive a `MethodType` for the binding<br class="">

descriptor; this is a `MethodType` whose return type is `V` and whose parameter<br class="">

types are the erased types of the bindings.)<br class="">

<br class="">

#### Digression: embedding-projection pairs<br class="">

<br class="">

Given two sets _A_ and _B_, a pair of functions `e : A -> B` and `p : B -> A`,<br class="">

forms an _embedding-projection pair_ if `p . e` (embed then project) is an<br class="">

identity function, and `e . p` (project then embed) _approximates_ the input<br class="">

according to a domain-specific approximation metric (which is a complete partial<br class="">

ordering on `B`.)<br class="">

<br class="">

When applied to constructor-deconstructor pairs, this says that deconstructing<br class="">

an object and then reconstructing it with the resulting bindings should result<br class="">

in an equivalent object, and constructing an object from an external<br class="">

representation and then deconstructing it back into that external representation<br class="">

should result in an approximation of the original external representation.  (A<br class="">

complete partial ordering models constructor failure as the non-terminating<br class="">

bottom value, which is considered an infinitely bad approximation to<br class="">

everything.)<br class="">

<br class="">

Embedding-projection pairs have a number of desirable properties, such as the<br class="">

composition of two e-p pairs is an e-p pair; this property is at the heart of<br class="">

using constructor-deconstructor pairs for improved serialization and functional<br class="">

transformation.<br class="">

<br class="">

## Invoking deconstructors<br class="">

<br class="">

We've already seen how to "invoke" deconstructors: through pattern matching.<br class="">

What we've been calling "record patterns" are merely deconstruction patterns<br class="">

derived mechanically from the state description, just as we do with constructors<br class="">

and accessors; there is little difference between record patterns and<br class="">

deconstruction patterns other than the ability to declare them explicitly.<br class="">

(There is an accidental difference in the translation, in that we currently<br class="">

implement record patterns by appealing to individual accessors rather than a<br class="">

single deconstructor, but this may eventually converge as well.)<br class="">

<br class="">

The use-site syntax of deconstruction bears a deliberate similarity to that of<br class="">

construction; `new Point(x, y)` is deconstructed by `case Point(var x, var y)`.<br class="">

<br class="">

#### Overload selection<br class="">

<br class="">

In the presence of overloaded deconstructors, we need to figure out which<br class="">

deconstructor a deconstruction pattern `C(P*)` is referring to. The details are<br class="">

similar to overload selection for methods, except that we operate on the<br class="">

bindings rather than the parameters.  We first search for _applicable matchers_,<br class="">

using increasingly loose criteria (first excluding boxing, unboxing, and<br class="">

varargs; then including boxing and unboxing but not varargs; and finally, all<br class="">

candidates) and then selecting the most applicable.<br class="">

<br class="">

It is tempting to try and bypass the three-phase selection process and use a<br class="">

simpler notion of applicability (perhaps noting that we got this process for<br class="">

compatibility with existing overload selection decisions when autoboxing and<br class="">

varargs were added, and that there are few deconstructor invocations to be<br class="">

compatible with yet.)  But because existing overloaded constructors use this<br class="">

mechanism, and there is significant value in pairing constructors and<br class="">

deconstructors, attempting to invent a simpler-but-different overload selection<br class="">

mechanism for deconstructors would inevitably undermine the duality between<br class="">

matching constructor-deconstructor pairs. So compatibility (this time, with<br class="">

existing overloaded constructors) once again forces our hand.<br class="">

<br class="">

The specification for overload selection is complicated significantly by poly<br class="">

expressions (e.g., lambdas); fortunately, there are no "poly patterns", and so,<br class="">

while the structure of JLS 15.12.2 is retained for overload selection of<br class="">

deconstruction patterns, much of the detail is left behind.<br class="">

<br class="">

## Translation<br class="">

<br class="">

We translate patterns into synthetic methods with a `Matcher` attribute; this<br class="">

method implements the matcher behavior.  The translation scheme derives from a<br class="">

number of requirements, only some of which are in play for deconstructors.<br class="">

<br class="">

The matcher method for a deconstructor is a final instance method that takes no<br class="">

parameters and returns `Object`, perhaps with a special name (just as<br class="">

constructors are called `<init>`.)<br class="">

<br class="">

#### Carriers<br class="">

<br class="">

Because the matcher methods implements the matcher behavior, but a matcher may<br class="">

"return" multiple bindings (or failure), we must encode the bindings in some<br class="">

way.  For this, we use a _carrier object_.  The choice of carrier is largely a<br class="">

footprint/specificity tradeoff.  One could imagine a carrier class per matcher,<br class="">

or a carrier class per matcher descriptor, or using `Object[]` as a carrier for<br class="">

everything, or caching some number of common shapes (e.g, three ints and two<br class="">

refs).  This sort of tuning should be separate from the protocol encoded in the<br class="">

bytecode of the pattern method and its clients.<br class="">

<br class="">

We use a small _carrier runtime_ to decouple pattern translation from carrier<br class="">

selection.  (This same carrier runtime is used by string templates as well.)<br class="">

This allows tradeoffs in runtime characteristics (e.g., carrier per matcher vs<br class="">

sharing carriers across matchers, dropping carrier identity with value types<br class="">

later, etc) without affecting the translation. The carrier API consists of condy<br class="">

bootstraps like:<br class="">

<br class="">

```<br class="">

static MethodHandle carrierFactory(MethodType matcherDescriptor) { ... }<br class="">

static MethodHandle carrierAccessor(MethodType matcherDescriptor, int bindingNo) { ... }<br class="">

```<br class="">

<br class="">

The `matcherDescriptor` is a `MethodType` describing the binding types.  The<br class="">

`carrierFactory` method returns a method handle which takes the bindings and<br class="">

produces a carrier object; the `carrierAccessor` method returns method handles<br class="">

that take the carrier object and return the corresponding binding.  To indicate<br class="">

success, the matcher method invokes the carrier factory method handle and<br class="">

returns the result; to indicate failure (deconstructors cannot fail, but other<br class="">

matchers can) the matcher method returns null.<br class="">

<br class="">

We would translate the XY deconstructor from `Point` as follows (pseudo-code):<br class="">

<br class="">

```<br class="">

#100: MethodType[(II)V]<br class="">

#101: Condy[bsm=Carriers::carrierFactory, args=[#100]]<br class="">

<br class="">

final synthetic Object Point$MANGLE() {<br class="">

    aload_0<br class="">

    getfield Point::x<br class="">

    aload_0<br class="">

    getfield Point::y<br class="">

    LDC #101<br class="">

    invokevirtual MethodHandle::invoke(II)V<br class="">

    areturn<br class="">

}<br class="">

```<br class="">

<br class="">

Constant `#100` contains a `MethodType` holding the binding descriptor; constant<br class="">

`#101` holds a method handle whose parameters are the parameter types of the<br class="">

binding descriptor and returns `Object`.<br class="">

<br class="">

At the use site, matching a deconstruction pattern is performed by invoking the<br class="">

matcher method on the appropriate target object, and then extracting the<br class="">

components with the carrier accessor method handles if the match is successful.<br class="">

(Deconstructors are total, so are always successful, but for other patterns,<br class="">

null is returned from the matcher method on failure to match.)<br class="">

<br class="">

#### Method names<br class="">

<br class="">

The name of the matcher method is mangled to support overloading. The JVM<br class="">

permits overloading on parameter types, but not return types (and overloaded<br class="">

matchers are effectively overloaded on return types.)  We take the approach of<br class="">

encoding the erasure of the matcher descriptor in the name of the pattern.  This<br class="">

has several desirable properties: it is stable (the name is derived solely from<br class="">

stable aspects of the declaration), for matchers with override-equivalent<br class="">

signatures (deconstructors can't be overridden, but other patterns can be),<br class="">

these map to true overrides in the translation, and valid overloads of matchers<br class="">

will always have distinct names.<br class="">

<br class="">

We use the ["Symbolic Freedom"]() encoding of the erasure of the matcher<br class="">

descriptor as the mangled disambiguator, which is exactly as stable as any other<br class="">

method descriptor derived from source declarations.<br class="">

<br class="">

#### Attributes<br class="">

<br class="">

Because patterns are methods, we can take advantage of all the affordances of<br class="">

methods.  We can use access bits to control accessibility; we can use the<br class="">

attributes that carry annotations, method parameter metadata, and generics<br class="">

signatures to carry information about the pattern declaration (and its (input)<br class="">

parameters, when we get to those kinds of matchers).  What's missing is the fact<br class="">

that this is a pattern implementation and not an ordinary method, and a place to<br class="">

put metadata for bindings.  To address the first, we can add the following<br class="">

attribute on matcher methods:<br class="">

<br class="">

    Matcher {<br class="">

        u2 name;                            // "Matcher"<br class="">

        u4 length;<br class="">

        u2 patternFlags;<br class="">

        u2 patternName;                     // UTF8<br class="">

        u2 patternDescr;                    // MethodType<br class="">

        u2 attributes_count;<br class="">

        attribute_info attributes[attributes_count];<br class="">

    }<br class="">

<br class="">

This says that "this method is a pattern".  The source name of the pattern<br class="">

declaration is reified as `patternName`, and the matcher descriptor, which<br class="">

encodes the types of the bindings, is reified as a `MethodType` in<br class="">

`patternDescr`.  The `flags` word can carry matcher-specific information such as<br class="">

"this matcher is a deconstructor" or "this matcher is total".<br class="">

<br class="">

A matcher method may have the usual variety of method attributes, such as<br class="">

`RuntimeInvisibleAnnotations` for annotations on the matcher declaration itself.<br class="">

<br class="">

If we wish to encode information about the matcher _bindings_, we do so with<br class="">

attributes inside the `Matcher` annotation itself.  Attributes such as<br class="">

`Signature`, `ParameterNames`, `RuntimeVisibleParameterAnnotations`, etc, can<br class="">

appear in a `Matcher` and are interpreted relative to the matcher signature or<br class="">

descriptor.  So if we had a matcher:<br class="">

<br class="">

```<br class="">

matcher Foo(@Bar List<String> list) { ... }<br class="">

```<br class="">

<br class="">

then the `Matcher` would contain the signature attribute corresponding to<br class="">

`(List<String>)` and a `RuntimeXxxParameterAnnotations` attribute describing the<br class="">

`@Bar` annotation on the first "parameter".<br class="">

<br class="">

#### Reflection<br class="">

<br class="">

Since matchers are a new kind of class member, they will need a new kind of<br class="">

reflective object, and a method that is analogous to `Class::getConstructors`.<br class="">

The reflective object should extend `Executable`, as all of the existing methods<br class="">

on `Executable` make sense for patterns (using `Object` as the return type.)  If<br class="">

the pattern is reflectively invoked, it returns `null` for no match, or an<br class="">

`Object[]` which is the boxing of the values in the carrier.<br class="">

<br class="">

We will then need some additional methods to describe the bindings, so the<br class="">

subtype of `Executable` has methods like `getBindings`, `getAnnotatedBindings`,<br class="">

`getGenericBindings`, `isDeconstructor`, `isPartial`, etc.  These methods will<br class="">

decode the `Matcher` attribute and its embedded attributes.<br class="">

<br class="">

## Summary<br class="">

<br class="">

This design borrows from previous rounds, but makes a number of simplifications.<br class="">

<br class="">

 - The bindings of a pattern are captured in a `MethodType`, called the _matcher<br class="">

   descriptor_.  The parameters of the matcher descriptor are the types of the<br class="">

   bindings; the return type is either `V` or the minimal type that will match<br class="">

   (but is not as important as the bindings.)<br class="">

 - Matchers are translated as methods whose names are derived deterministically<br class="">

   from the name of the matcher and the erasure of the pattern descriptor. These<br class="">

   are called _matcher methods_.  Matcher methods take as parameters the input<br class="">

   parameters of the pattern (if any), and return `Object`.<br class="">

 - The returned object is an opaque carrier.  Null means the pattern didn't<br class="">

   match.  A non-null value is the carrier type (from the carrier runtime) which<br class="">

   is derived from the pattern descriptor.<br class="">

 - Matcher methods are not directly invocable from the source language; they are<br class="">

   invoked indirectly through pattern matching or reflection.<br class="">

 - Generated code invokes the matcher method and interprets the returned value<br class="">

   according to the protocol, using MHs from the carrier runtime to access the<br class="">

   bindings.<br class="">

 - Matcher methods have a `Matcher` attribute, which captures information about<br class="">

   the matcher as a whole (is a total/partial, a deconstructor, etc) and<br class="">

   parameter-related attributes which describe the bindings.<br class="">

 - Matchers are reflected through a new subtype of `Executable`, which exposes<br class="">

   new methods to reflect over bindings.<br class="">

 - When invoking a matcher reflectively, the carrier is boxed to an Object[].<br class="">

<br class="">

<br class="">

<br class="">

</div>

</div>

</blockquote>

</div>

<br class="">

</div>

</body>

</html>