<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class="">I like this topic and overall organization, BUT:</div>
<div class=""><br class="">
</div>
Here is my initial big-picture reaction to this document: it is much harder to read than necessary because of the chiasmus induced by use of the terms “constructor-deconstructor pair” and “embedding-projection” pair. Very close reading of the “Digression” section
reveals that the _deconstructor_ is the embedding and the _constructor_ is the projection. Constantly having to mentally reverse this mismatched order is gratuitously taxing.
<div class=""><br class="">
</div>
<div class="">Please do one of two things: either change “constructor-deconstructor” to “deconstructor-constructor” everywhere, or change “embedding-projection” to “projection-embedding” everywhere.</div>
<div class=""><br class="">
</div>
<div class="">Another thing you could do in the “Digression” section is to avoid the function-composition notation, which induces a similar reversal. You don’t really make extensive use of the result anyway (such as by applying such a composition as a function
in a formula). My advice for a Java audience would be to use a pseudo-Java method notation:</div>
<div class=""><br class="">
</div>
<div class="">
<div class=""><span class="Apple-tab-span" style="white-space: pre;"></span>x.embed().project().equals(x) should always be true</div>
</div>
<div class="">
<div class=""><span class="Apple-tab-span" style="white-space: pre;"></span>x.project().embed().approx(x) should be true if nothing bad happens during “project()”</div>
</div>
<div class=""><br class="">
</div>
<div class="">Alos, I’m not sure exactly what you mean by a “complete partial order”: Wikipedia notes that the term has three distinct meanings in common use, and is also sometimes confused with “compete lattice”. <a href="https://en.wikipedia.org/wiki/Complete_partial_order" class="">https://en.wikipedia.org/wiki/Complete_partial_order</a> Furthermore
it is unclear how you indeed to use that partial order to define the notion of approximate equivalence (which notorioiusly has difficulties associated with lack of transitivity).</div>
<div class=""><br class="">
</div>
<div class="">Please remove these obstacles to readability and I will gladly take another look. :-)</div>
<div class=""><br class="">
</div>
<div class="">—Guy</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Mar 6, 2023, at 1:24 PM, Brian Goetz <<a href="mailto:brian.goetz@oracle.com" class="">brian.goetz@oracle.com</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">Time to look ahead to the next installment of pattern matching: deconstruction patterns, which generalize record patterns. This document does an end-to-end walkthrough (at a sketchy level of detail) through declaration, overloading, use, translation,
and reflection of deconstruction patterns.<br class="">
<br class="">
I would like to *not* discuss syntax at this time. There's a lengthy discussion to be had about syntax, and we'll have that, but let's nail down model, semantics, and translation first.<br class="">
<br class="">
As usual, I would prefer that people either (a) post a single reply addressing the totality of this sketch or (b) start _new threads_ if you want to discuss a specific aspect. A quick "I'll just reply to this minor detail" seems to often derail the conversation
in such a way that it never comes back. If this all looks fine to you, a quick "no surprises here" will keep us from suspensefully waiting for feedback.<br class="">
<br class="">
<br class="">
# Deconstruction patterns -- translation, use, and reflection<br class="">
<br class="">
As we are wrapping up record patterns, it's time to look ahead to the next major<br class="">
part of the pattern matching story -- extending the capabilities of record<br class="">
patterns to all classes that want to support destructuring. Record patterns are<br class="">
simply a special case of _deconstruction patterns_ or _deconstructors_, where we<br class="">
derive the deconstructor API, implementation, and use from the state description<br class="">
of the record. For an arbitrary class, a deconstruction patterns will require<br class="">
an explicit member declaration, with a header identifying the names and types of<br class="">
the bindings and a body that extracts the bindings from the representation.<br class="">
<br class="">
## Deconstructors<br class="">
<br class="">
Just as constructors are special cases of methods, deconstruction patterns are<br class="">
special cases of a more general notion of declared pattern, which also includes<br class="">
static matchers (the dual of static methods) and instance matchers (the dual of<br class="">
instance methods.) Specifically, unlike the more general notion of matcher, a<br class="">
deconstructor must be _total_; it must always match. This document will focus<br class="">
exclusively on deconstructors, and we'll come back to static and instance<br class="">
matchers in due time. (But note that some of the design choices in the simple<br class="">
case of deconstructors may be constrained by the more general case.)<br class="">
<br class="">
There are a number of choices for how we might syntactically represent a<br class="">
deconstructor (or more generally, a declared pattern.) For purposes of<br class="">
illustration, this document picks one possible syntactic expression of<br class="">
deconstructors, but it is premature to devolve into a syntax discussion at this<br class="">
time.<br class="">
<br class="">
```<br class="">
class Point {<br class="">
final double x, y;<br class="">
<br class="">
public Point(double x, double y) {<br class="">
this.x = x;<br class="">
this.y = y;<br class="">
}<br class="">
<br class="">
public matcher Point(double x, double y) {<br class="">
x = this.x;<br class="">
y = this.y;<br class="">
}<br class="">
}<br class="">
```<br class="">
<br class="">
This example illustrates two aspects of the duality between constructors and<br class="">
their corresponding deconstructors. Their APIs are duals: a constructor takes N<br class="">
parameters containing the desired description of the object state and produces a<br class="">
constructed object; a deconstructor starts from the constructed object and has N<br class="">
bindings (outputs) that receive the desired state components. Similarly, their<br class="">
implementations are duals: the body of the constructor initializes the object<br class="">
representation from the description, and the body of the deconstructor extracts<br class="">
the description from the representation. A deconstructor is best understood as<br class="">
a _co-constructor_.<br class="">
<br class="">
The `Point` example above is special in two ways. First, the internal<br class="">
representation of a `Point`, and the API of the constructor and deconstructor,<br class="">
are the same: `(double x, double y)`. We can call the API implied by the<br class="">
constructor and deconstructor the _external representation_, and for `Point`,<br class="">
both the internal and external representations are the same. (This is one of<br class="">
the requirements for being a candidate to be a record.) And second, the<br class="">
constructor is _total_; it does not reject any combinations of arguments.<br class="">
<br class="">
Here's another version of `Point` which does not have these special aspects; it<br class="">
uses the same internal representation as before, but chooses a pair of strings<br class="">
as the external representation:<br class="">
<br class="">
```<br class="">
class Point2 {<br class="">
final double x, y;<br class="">
<br class="">
public Point2(String x, String y) {<br class="">
this.x = Double.parseDouble(x);<br class="">
this.y = Double.parseDouble(y);<br class="">
}<br class="">
<br class="">
public matcher Point2(String x, String y) {<br class="">
x = Double.toString(this.x);<br class="">
y = Double.toSTring(this.y);<br class="">
}<br class="">
}<br class="">
```<br class="">
<br class="">
The method `Double::parseDouble` will throw `NumberFormatException` if its<br class="">
argument does not describe a suitable value, so unlike the `Point` constructor,<br class="">
the `Point2` constructor is partial: it will reject `new Double("foo", "bar")`.<br class="">
And the internal representation is no longer the same as the external<br class="">
representation. Less obviously, there are valid string values that we can<br class="">
provide to the constructor, but which cannot be represented exactly as `double`,<br class="">
and which will be approximated; the string value<br class="">
`"3.22222222222222222222222222222222222222"` will be approximated with the<br class="">
double value `3.2222222222222223`.<br class="">
<br class="">
This example highlights more clearly how the constructor and deconstructor form<br class="">
an _embedding-projection pair_ between the internal and external<br class="">
representations. While some external representations might be invalid, and some<br class="">
might result in approximation, deconstruct-then-construct is always an identity<br class="">
transformation. Indeed, the specification of `java.lang.Record` requires that<br class="">
if we deconstruct a record with its accessors, and pass the resulting values<br class="">
back to the constructor, we should get a new record that is `equals` to the<br class="">
original.<br class="">
<br class="">
The fact that constructor and deconstructor (and eventually, factory and static<br class="">
matcher) form an embedding-projection pair is why we are able to derive<br class="">
higher-level language features, such as [safer<br class="">
serialization](<a href="https://openjdk.org/projects/amber/design-notes/towards-better-serialization" class="">https://openjdk.org/projects/amber/design-notes/towards-better-serialization</a>)<br class="">
and [functional transformation of immutable<br class="">
objects](<a href="https://github.com/openjdk/amber-docs/blob/master/eg-drafts/reconstruction-records-and-classes.md" class="">https://github.com/openjdk/amber-docs/blob/master/eg-drafts/reconstruction-records-and-classes.md</a>),<br class="">
from a matched set of constructor and deconstructor.<br class="">
<br class="">
Of course, users are free to implement constructors without deconstructors, or<br class="">
constructors and deconstructors whose external representations don't match up,<br class="">
or even matching constructors and deconstructors that are not behaviorally dual.<br class="">
But providing a matched set (or several) of constructors and deconstructors<br class="">
enables reliably reversible aggregation, and allows us to mechanically derive<br class="">
useful higher-level features such as withers.<br class="">
<br class="">
#### Overloading<br class="">
<br class="">
Just as constructors can be overloaded, deconstructors can be overloaded for the<br class="">
same reason: multiple constructors can expose multiple external representations<br class="">
for aggregation, and corresponding deconstructors can recover those multiple<br class="">
external representations. Any matching pair of constructor-deconstructor (and<br class="">
eventually, factory-deconstructor) is a candidate for use in higher-level<br class="">
features based on the embedding-projection nature of the<br class="">
constructor-deconstructor pair.<br class="">
<br class="">
Just as deconstruction is dual to construction, overloading of deconstructors is<br class="">
dual to that of constructors: rather than restricting which sets of parameters<br class="">
can be overloaded against each other, we do so with the bindings instead. For<br class="">
constructors of a given arity, we require that their signatures not be<br class="">
override-equivalent; for deconstructors of a given arity, we require the same of<br class="">
their bindings.<br class="">
<br class="">
For a deconstructor (and declared patterns in general), we derive a _binding<br class="">
signature_ (and an erased _binding descriptor_) which treats the binding list as<br class="">
a parameter list. The overload rule outlined above requires that binding<br class="">
signatures for two deconstructors of the same arity not be override-equivalent.<br class="">
(We will find it useful later to derive a `MethodType` for the binding<br class="">
descriptor; this is a `MethodType` whose return type is `V` and whose parameter<br class="">
types are the erased types of the bindings.)<br class="">
<br class="">
#### Digression: embedding-projection pairs<br class="">
<br class="">
Given two sets _A_ and _B_, a pair of functions `e : A -> B` and `p : B -> A`,<br class="">
forms an _embedding-projection pair_ if `p . e` (embed then project) is an<br class="">
identity function, and `e . p` (project then embed) _approximates_ the input<br class="">
according to a domain-specific approximation metric (which is a complete partial<br class="">
ordering on `B`.)<br class="">
<br class="">
When applied to constructor-deconstructor pairs, this says that deconstructing<br class="">
an object and then reconstructing it with the resulting bindings should result<br class="">
in an equivalent object, and constructing an object from an external<br class="">
representation and then deconstructing it back into that external representation<br class="">
should result in an approximation of the original external representation. (A<br class="">
complete partial ordering models constructor failure as the non-terminating<br class="">
bottom value, which is considered an infinitely bad approximation to<br class="">
everything.)<br class="">
<br class="">
Embedding-projection pairs have a number of desirable properties, such as the<br class="">
composition of two e-p pairs is an e-p pair; this property is at the heart of<br class="">
using constructor-deconstructor pairs for improved serialization and functional<br class="">
transformation.<br class="">
<br class="">
## Invoking deconstructors<br class="">
<br class="">
We've already seen how to "invoke" deconstructors: through pattern matching.<br class="">
What we've been calling "record patterns" are merely deconstruction patterns<br class="">
derived mechanically from the state description, just as we do with constructors<br class="">
and accessors; there is little difference between record patterns and<br class="">
deconstruction patterns other than the ability to declare them explicitly.<br class="">
(There is an accidental difference in the translation, in that we currently<br class="">
implement record patterns by appealing to individual accessors rather than a<br class="">
single deconstructor, but this may eventually converge as well.)<br class="">
<br class="">
The use-site syntax of deconstruction bears a deliberate similarity to that of<br class="">
construction; `new Point(x, y)` is deconstructed by `case Point(var x, var y)`.<br class="">
<br class="">
#### Overload selection<br class="">
<br class="">
In the presence of overloaded deconstructors, we need to figure out which<br class="">
deconstructor a deconstruction pattern `C(P*)` is referring to. The details are<br class="">
similar to overload selection for methods, except that we operate on the<br class="">
bindings rather than the parameters. We first search for _applicable matchers_,<br class="">
using increasingly loose criteria (first excluding boxing, unboxing, and<br class="">
varargs; then including boxing and unboxing but not varargs; and finally, all<br class="">
candidates) and then selecting the most applicable.<br class="">
<br class="">
It is tempting to try and bypass the three-phase selection process and use a<br class="">
simpler notion of applicability (perhaps noting that we got this process for<br class="">
compatibility with existing overload selection decisions when autoboxing and<br class="">
varargs were added, and that there are few deconstructor invocations to be<br class="">
compatible with yet.) But because existing overloaded constructors use this<br class="">
mechanism, and there is significant value in pairing constructors and<br class="">
deconstructors, attempting to invent a simpler-but-different overload selection<br class="">
mechanism for deconstructors would inevitably undermine the duality between<br class="">
matching constructor-deconstructor pairs. So compatibility (this time, with<br class="">
existing overloaded constructors) once again forces our hand.<br class="">
<br class="">
The specification for overload selection is complicated significantly by poly<br class="">
expressions (e.g., lambdas); fortunately, there are no "poly patterns", and so,<br class="">
while the structure of JLS 15.12.2 is retained for overload selection of<br class="">
deconstruction patterns, much of the detail is left behind.<br class="">
<br class="">
## Translation<br class="">
<br class="">
We translate patterns into synthetic methods with a `Matcher` attribute; this<br class="">
method implements the matcher behavior. The translation scheme derives from a<br class="">
number of requirements, only some of which are in play for deconstructors.<br class="">
<br class="">
The matcher method for a deconstructor is a final instance method that takes no<br class="">
parameters and returns `Object`, perhaps with a special name (just as<br class="">
constructors are called `<init>`.)<br class="">
<br class="">
#### Carriers<br class="">
<br class="">
Because the matcher methods implements the matcher behavior, but a matcher may<br class="">
"return" multiple bindings (or failure), we must encode the bindings in some<br class="">
way. For this, we use a _carrier object_. The choice of carrier is largely a<br class="">
footprint/specificity tradeoff. One could imagine a carrier class per matcher,<br class="">
or a carrier class per matcher descriptor, or using `Object[]` as a carrier for<br class="">
everything, or caching some number of common shapes (e.g, three ints and two<br class="">
refs). This sort of tuning should be separate from the protocol encoded in the<br class="">
bytecode of the pattern method and its clients.<br class="">
<br class="">
We use a small _carrier runtime_ to decouple pattern translation from carrier<br class="">
selection. (This same carrier runtime is used by string templates as well.)<br class="">
This allows tradeoffs in runtime characteristics (e.g., carrier per matcher vs<br class="">
sharing carriers across matchers, dropping carrier identity with value types<br class="">
later, etc) without affecting the translation. The carrier API consists of condy<br class="">
bootstraps like:<br class="">
<br class="">
```<br class="">
static MethodHandle carrierFactory(MethodType matcherDescriptor) { ... }<br class="">
static MethodHandle carrierAccessor(MethodType matcherDescriptor, int bindingNo) { ... }<br class="">
```<br class="">
<br class="">
The `matcherDescriptor` is a `MethodType` describing the binding types. The<br class="">
`carrierFactory` method returns a method handle which takes the bindings and<br class="">
produces a carrier object; the `carrierAccessor` method returns method handles<br class="">
that take the carrier object and return the corresponding binding. To indicate<br class="">
success, the matcher method invokes the carrier factory method handle and<br class="">
returns the result; to indicate failure (deconstructors cannot fail, but other<br class="">
matchers can) the matcher method returns null.<br class="">
<br class="">
We would translate the XY deconstructor from `Point` as follows (pseudo-code):<br class="">
<br class="">
```<br class="">
#100: MethodType[(II)V]<br class="">
#101: Condy[bsm=Carriers::carrierFactory, args=[#100]]<br class="">
<br class="">
final synthetic Object Point$MANGLE() {<br class="">
aload_0<br class="">
getfield Point::x<br class="">
aload_0<br class="">
getfield Point::y<br class="">
LDC #101<br class="">
invokevirtual MethodHandle::invoke(II)V<br class="">
areturn<br class="">
}<br class="">
```<br class="">
<br class="">
Constant `#100` contains a `MethodType` holding the binding descriptor; constant<br class="">
`#101` holds a method handle whose parameters are the parameter types of the<br class="">
binding descriptor and returns `Object`.<br class="">
<br class="">
At the use site, matching a deconstruction pattern is performed by invoking the<br class="">
matcher method on the appropriate target object, and then extracting the<br class="">
components with the carrier accessor method handles if the match is successful.<br class="">
(Deconstructors are total, so are always successful, but for other patterns,<br class="">
null is returned from the matcher method on failure to match.)<br class="">
<br class="">
#### Method names<br class="">
<br class="">
The name of the matcher method is mangled to support overloading. The JVM<br class="">
permits overloading on parameter types, but not return types (and overloaded<br class="">
matchers are effectively overloaded on return types.) We take the approach of<br class="">
encoding the erasure of the matcher descriptor in the name of the pattern. This<br class="">
has several desirable properties: it is stable (the name is derived solely from<br class="">
stable aspects of the declaration), for matchers with override-equivalent<br class="">
signatures (deconstructors can't be overridden, but other patterns can be),<br class="">
these map to true overrides in the translation, and valid overloads of matchers<br class="">
will always have distinct names.<br class="">
<br class="">
We use the ["Symbolic Freedom"]() encoding of the erasure of the matcher<br class="">
descriptor as the mangled disambiguator, which is exactly as stable as any other<br class="">
method descriptor derived from source declarations.<br class="">
<br class="">
#### Attributes<br class="">
<br class="">
Because patterns are methods, we can take advantage of all the affordances of<br class="">
methods. We can use access bits to control accessibility; we can use the<br class="">
attributes that carry annotations, method parameter metadata, and generics<br class="">
signatures to carry information about the pattern declaration (and its (input)<br class="">
parameters, when we get to those kinds of matchers). What's missing is the fact<br class="">
that this is a pattern implementation and not an ordinary method, and a place to<br class="">
put metadata for bindings. To address the first, we can add the following<br class="">
attribute on matcher methods:<br class="">
<br class="">
Matcher {<br class="">
u2 name; // "Matcher"<br class="">
u4 length;<br class="">
u2 patternFlags;<br class="">
u2 patternName; // UTF8<br class="">
u2 patternDescr; // MethodType<br class="">
u2 attributes_count;<br class="">
attribute_info attributes[attributes_count];<br class="">
}<br class="">
<br class="">
This says that "this method is a pattern". The source name of the pattern<br class="">
declaration is reified as `patternName`, and the matcher descriptor, which<br class="">
encodes the types of the bindings, is reified as a `MethodType` in<br class="">
`patternDescr`. The `flags` word can carry matcher-specific information such as<br class="">
"this matcher is a deconstructor" or "this matcher is total".<br class="">
<br class="">
A matcher method may have the usual variety of method attributes, such as<br class="">
`RuntimeInvisibleAnnotations` for annotations on the matcher declaration itself.<br class="">
<br class="">
If we wish to encode information about the matcher _bindings_, we do so with<br class="">
attributes inside the `Matcher` annotation itself. Attributes such as<br class="">
`Signature`, `ParameterNames`, `RuntimeVisibleParameterAnnotations`, etc, can<br class="">
appear in a `Matcher` and are interpreted relative to the matcher signature or<br class="">
descriptor. So if we had a matcher:<br class="">
<br class="">
```<br class="">
matcher Foo(@Bar List<String> list) { ... }<br class="">
```<br class="">
<br class="">
then the `Matcher` would contain the signature attribute corresponding to<br class="">
`(List<String>)` and a `RuntimeXxxParameterAnnotations` attribute describing the<br class="">
`@Bar` annotation on the first "parameter".<br class="">
<br class="">
#### Reflection<br class="">
<br class="">
Since matchers are a new kind of class member, they will need a new kind of<br class="">
reflective object, and a method that is analogous to `Class::getConstructors`.<br class="">
The reflective object should extend `Executable`, as all of the existing methods<br class="">
on `Executable` make sense for patterns (using `Object` as the return type.) If<br class="">
the pattern is reflectively invoked, it returns `null` for no match, or an<br class="">
`Object[]` which is the boxing of the values in the carrier.<br class="">
<br class="">
We will then need some additional methods to describe the bindings, so the<br class="">
subtype of `Executable` has methods like `getBindings`, `getAnnotatedBindings`,<br class="">
`getGenericBindings`, `isDeconstructor`, `isPartial`, etc. These methods will<br class="">
decode the `Matcher` attribute and its embedded attributes.<br class="">
<br class="">
## Summary<br class="">
<br class="">
This design borrows from previous rounds, but makes a number of simplifications.<br class="">
<br class="">
- The bindings of a pattern are captured in a `MethodType`, called the _matcher<br class="">
descriptor_. The parameters of the matcher descriptor are the types of the<br class="">
bindings; the return type is either `V` or the minimal type that will match<br class="">
(but is not as important as the bindings.)<br class="">
- Matchers are translated as methods whose names are derived deterministically<br class="">
from the name of the matcher and the erasure of the pattern descriptor. These<br class="">
are called _matcher methods_. Matcher methods take as parameters the input<br class="">
parameters of the pattern (if any), and return `Object`.<br class="">
- The returned object is an opaque carrier. Null means the pattern didn't<br class="">
match. A non-null value is the carrier type (from the carrier runtime) which<br class="">
is derived from the pattern descriptor.<br class="">
- Matcher methods are not directly invocable from the source language; they are<br class="">
invoked indirectly through pattern matching or reflection.<br class="">
- Generated code invokes the matcher method and interprets the returned value<br class="">
according to the protocol, using MHs from the carrier runtime to access the<br class="">
bindings.<br class="">
- Matcher methods have a `Matcher` attribute, which captures information about<br class="">
the matcher as a whole (is a total/partial, a deconstructor, etc) and<br class="">
parameter-related attributes which describe the bindings.<br class="">
- Matchers are reflected through a new subtype of `Executable`, which exposes<br class="">
new methods to reflect over bindings.<br class="">
- When invoking a matcher reflectively, the carrier is boxed to an Object[].<br class="">
<br class="">
<br class="">
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>