Use of condy + MH.invokevirtual instead of indy (was: Deconstruction patterns)

Tue Mar 7 09:21:12 UTC 2023

----- Original Message -----
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Monday, March 6, 2023 7:24:54 PM
> Subject: Deconstruction patterns

> Time to look ahead to the next installment of pattern matching:
> deconstruction patterns, which generalize record patterns.  This
> document does an end-to-end walkthrough (at a sketchy level of detail)
> through declaration, overloading, use, translation, and reflection of
> deconstruction patterns.
> 
> I would like to *not* discuss syntax at this time.  There's a lengthy
> discussion to be had about syntax, and we'll have that, but let's nail
> down model, semantics, and translation first.
> 
> As usual, I would prefer that people either (a) post a single reply
> addressing the totality of this sketch or (b) start _new threads_ if you
> want to discuss a specific aspect.  A quick "I'll just reply to this
> minor detail" seems to often derail the conversation in such a way that
> it never comes back.  If this all looks fine to you, a quick "no
> surprises here" will keep us from suspensefully waiting for feedback.
> 
> 
> # Deconstruction patterns -- translation, use, and reflection
> 
> As we are wrapping up record patterns, it's time to look ahead to the next
> major part of the pattern matching story -- extending the capabilities of
> record patterns to all classes that want to support destructuring.  Record
> patterns are simply a special case of _deconstruction patterns_ or
> _deconstructors_, where we derive the deconstructor API, implementation, and
> use from the state description of the record.  For an arbitrary class, a
> deconstruction pattern will require an explicit member declaration, with a
> header identifying the names and types of the bindings and a body that
> extracts the bindings from the representation.
> 
> ## Deconstructors
> 
> Just as constructors are special cases of methods, deconstruction patterns
> are special cases of a more general notion of declared pattern, which also
> includes static matchers (the dual of static methods) and instance matchers
> (the dual of instance methods).  Unlike the more general notion of matcher,
> however, a deconstructor must be _total_; it must always match.  This
> document will focus exclusively on deconstructors, and we'll come back to
> static and instance matchers in due time.  (But note that some of the design
> choices in the simple case of deconstructors may be constrained by the more
> general case.)
> 
> There are a number of choices for how we might syntactically represent a
> deconstructor (or more generally, a declared pattern).  For purposes of
> illustration, this document picks one possible syntactic expression of
> deconstructors, but it is premature to devolve into a syntax discussion at
> this time.
> 
> ```
> class Point {
>     final double x, y;
> 
>     public Point(double x, double y) {
>         this.x = x;
>         this.y = y;
>     }
> 
>     public matcher Point(double x, double y) {
>         x = this.x;
>         y = this.y;
>     }
> }
> ```
> 
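The `matcher` member above is illustrative syntax and will not compile today. For comparison, here is a minimal sketch of the special case that already works: written as a record, `Point` gets a derived `(double x, double y)` deconstructor that Java 21 record patterns can match. `PointDemo` and `describe` are hypothetical names.

```java
class PointDemo {
    // As a record, Point gets a canonical constructor, accessors, and a
    // record pattern Point(double x, double y), all derived from its
    // state description.
    record Point(double x, double y) {}

    // Deconstruct with a record pattern in a pattern switch (Java 21+).
    static String describe(Object o) {
        return switch (o) {
            case Point(double x, double y) -> "x=" + x + ", y=" + y;
            default -> "not a point";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1.0, 2.0))); // x=1.0, y=2.0
        System.out.println(describe("hello"));             // not a point
    }
}
```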
> This example illustrates two aspects of the duality between constructors
> and their corresponding deconstructors.  Their APIs are duals: a
> constructor takes N parameters containing the desired description of the
> object state and produces a constructed object; a deconstructor starts from
> the constructed object and has N bindings (outputs) that receive the
> desired state components.  Similarly, their implementations are duals: the
> body of the constructor initializes the object representation from the
> description, and the body of the deconstructor extracts the description
> from the representation.  A deconstructor is best understood as a
> _co-constructor_.
> 
> The `Point` example above is special in two ways.  First, the internal
> representation of a `Point` and the API of the constructor and
> deconstructor are the same: `(double x, double y)`.  We can call the API
> implied by the constructor and deconstructor the _external
> representation_, and for `Point`, both the internal and external
> representations are the same.  (This is one of the requirements for being
> a candidate to be a record.)  And second, the constructor is _total_; it
> does not reject any combinations of arguments.
> 
> Here's another version of `Point` which does not have these special
> aspects; it uses the same internal representation as before, but chooses
> a pair of strings as the external representation:
> 
> ```
> class Point2 {
>     final double x, y;
> 
>     public Point2(String x, String y) {
>         this.x = Double.parseDouble(x);
>         this.y = Double.parseDouble(y);
>     }
> 
>     public matcher Point2(String x, String y) {
>         x = Double.toString(this.x);
>         y = Double.toString(this.y);
>     }
> }
> ```
> 
> The method `Double::parseDouble` will throw `NumberFormatException` if its
> argument does not describe a suitable value, so unlike the `Point`
> constructor, the `Point2` constructor is partial: it will reject
> `new Point2("foo", "bar")`.  And the internal representation is no longer
> the same as the external representation.  Less obviously, there are valid
> string values that we can provide to the constructor, but which cannot be
> represented exactly as `double`, and which will be approximated; the
> string value `"3.22222222222222222222222222222222222222"` will be
> approximated with the double value `3.2222222222222223`.
> 
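Because deconstructors cannot be declared yet, the following sketch emulates `Point2` with an ordinary method. The method `deconstruct()` and the carrier record `Strings` are hypothetical stand-ins for `matcher Point2(String x, String y)`; the sketch shows the partial constructor rejecting bad input and construct-then-deconstruct returning only an approximation of the original string.

```java
class Point2Demo {
    // Hypothetical stand-in for the external representation (String x, String y).
    record Strings(String x, String y) {}

    static final class Point2 {
        final double x, y;

        // Partial constructor: Double.parseDouble throws
        // NumberFormatException for strings like "foo".
        Point2(String x, String y) {
            this.x = Double.parseDouble(x);
            this.y = Double.parseDouble(y);
        }

        // Emulates `matcher Point2(String x, String y)` with an ordinary
        // method (hypothetical name; deconstructors cannot be declared yet).
        Strings deconstruct() {
            return new Strings(Double.toString(x), Double.toString(y));
        }
    }

    public static void main(String[] args) {
        String precise = "3.22222222222222222222222222222222222222";
        Strings back = new Point2(precise, "1.5").deconstruct();
        // Construct-then-deconstruct only approximates the input string.
        System.out.println(back.x().equals(precise)); // false
        System.out.println(back.y());                 // 1.5
        try {
            new Point2("foo", "bar");
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```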
> This example highlights more clearly how the constructor and deconstructor
> form an _embedding-projection pair_ between the internal and external
> representations.  While some external representations might be invalid,
> and some might result in approximation, deconstruct-then-construct is
> always an identity transformation.  Indeed, the specification of
> `java.lang.Record` requires that if we deconstruct a record with its
> accessors, and pass the resulting values back to the constructor, we
> should get a new record that is `equals` to the original.
> 
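The `java.lang.Record` requirement can be observed directly with today's records: take a record apart with its accessors, rebuild it with the canonical constructor, and compare with `equals`. A minimal sketch; `RoundTripDemo`, `Pt`, and `roundTrip` are hypothetical names. Because record `equals` compares `double` components as `Double.compare` does, the round trip holds even for `NaN` and `-0.0`.

```java
class RoundTripDemo {
    record Pt(double x, double y) {}

    // Deconstruct with the accessors, then reconstruct with the
    // canonical constructor.
    static Pt roundTrip(Pt p) {
        return new Pt(p.x(), p.y());
    }

    public static void main(String[] args) {
        // Record equals compares double components as if by Double.compare,
        // so NaN and -0.0 components still compare equal after the trip.
        Pt p = new Pt(Double.NaN, -0.0);
        Pt q = roundTrip(p);
        System.out.println(q != p);      // true: a new object
        System.out.println(q.equals(p)); // true: equals the original
    }
}
```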
> The fact that constructor and deconstructor (and eventually, factory and
> static matcher) form an embedding-projection pair is why we are able to
> derive higher-level language features, such as
> [safer serialization](https://openjdk.org/projects/amber/design-notes/towards-better-serialization)
> and [functional transformation of immutable objects](https://github.com/openjdk/amber-docs/blob/master/eg-drafts/reconstruction-records-and-classes.md),
> from a matched set of constructor and deconstructor.
> 
> Of course, users are free to implement constructors without
> deconstructors, or constructors and deconstructors whose external
> representations don't match up, or even matching constructors and
> deconstructors that are not behaviorally dual.  But providing a matched set
> (or several) of constructors and deconstructors enables reliably
> reversible aggregation, and allows us to mechanically derive useful
> higher-level features such as withers.
> 
> #### Overloading
> 
> Just as constructors can be overloaded, deconstructors can be overloaded
> for the same reason: multiple constructors can expose multiple external
> representations for aggregation, and corresponding deconstructors can
> recover those multiple external representations.  Any matching
> constructor-deconstructor pair (and eventually, factory-deconstructor pair)
> is a candidate for use in higher-level features based on the
> embedding-projection nature of the constructor-deconstructor pair.
> 
> Just as deconstruction is dual to construction, overloading of
> deconstructors is dual to that of constructors: rather than restricting
> which sets of parameters can be overloaded against each other, we do so
> with the bindings instead.  For constructors of a given arity, we require
> that their signatures not be override-equivalent; for deconstructors of a
> given arity, we require the same of their bindings.
> 
> For a deconstructor (and declared patterns in general), we derive a
> _binding signature_ (and an erased _binding descriptor_) which treats the
> binding list as a parameter list.  The overload rule outlined above
> requires that the binding signatures of two deconstructors of the same
> arity not be override-equivalent.  (We will find it useful later to derive
> a `MethodType` for the binding descriptor; this is a `MethodType` whose
> return type is `V` and whose parameter types are the erased types of the
> bindings.)
> 
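The binding descriptor can be built with the standard `MethodType` API. In this sketch, `bindingDescriptor` and `mayOverload` are hypothetical helpers, and `mayOverload` is a simplification: it only compares erased descriptors, so it approximates the override-equivalence rule (together with Java's existing same-erasure restriction) rather than implementing it exactly. The first two descriptors correspond to a class declaring both `(double x, double y)` and `(String x, String y)` deconstructors.

```java
import java.lang.invoke.MethodType;
import java.util.List;

class BindingDescriptorDemo {
    // Hypothetical helper: a binding descriptor is a MethodType whose
    // return type is void (V) and whose parameter types are the erased
    // binding types.
    static MethodType bindingDescriptor(Class<?>... erasedBindings) {
        return MethodType.methodType(void.class, erasedBindings);
    }

    // Simplified overload check on erased descriptors only: binding lists
    // that differ only in type arguments (List<String> vs List<Integer>)
    // collapse to the same descriptor and are rejected.
    static boolean mayOverload(MethodType a, MethodType b) {
        return !a.equals(b);
    }

    public static void main(String[] args) {
        MethodType doubles = bindingDescriptor(double.class, double.class);
        MethodType strings = bindingDescriptor(String.class, String.class);
        System.out.println(doubles.toMethodDescriptorString()); // (DD)V
        System.out.println(strings.toMethodDescriptorString());
        // prints (Ljava/lang/String;Ljava/lang/String;)V

        // Bindings (List<String> s) and (List<Integer> i) both erase to (List).
        MethodType listOfString = bindingDescriptor(List.class);
        MethodType listOfInteger = bindingDescriptor(List.class);
        System.out.println(mayOverload(doubles, strings));            // true
        System.out.println(mayOverload(listOfString, listOfInteger)); // false
    }
}
```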
> #### Digression: embedding-projection pairs
> 
> Given two sets _A_ and _B_, a pair of functions `e : A -> B` and
> `p : B -> A` forms an _embedding-projection pair_ if `p . e` (embed then
> project) is an identity function, and `e . p` (project then embed)
> _approximates_ the input according to a domain-specific approximation
> metric (which is a complete partial ordering on `B`).
> 
> When applied to constructor-deconstructor pairs, this says that
> deconstructing an object and then reconstructing it with the resulting
> bindings should result in an equivalent object, and constructing an object
> from an external representation and then deconstructing it back into that
> external representation should result in an approximation of the original
> external representation.  (A complete partial ordering models constructor
> failure as the non-terminating bottom value, which is considered an
> infinitely bad approximation to everything.)
> 
> Embedding-projection pairs have a number of desirable properties; for
> example, the composition of two e-p pairs is an e-p pair.  This property is
> at the heart of using constructor-deconstructor pairs for improved
> serialization and functional transformation.
> 
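The composition property follows in two lines. Write $\circ$ for the composition the text writes as `.` and $\sqsubseteq$ for the approximation ordering, and assume (as is standard for e-p pairs, though not stated above) that all four functions are monotone. Let $(e_1, p_1)$ be an e-p pair between $A$ and $B$, and $(e_2, p_2)$ one between $B$ and $C$:

```latex
\begin{aligned}
(p_1 \circ p_2) \circ (e_2 \circ e_1) &= p_1 \circ (p_2 \circ e_2) \circ e_1 = p_1 \circ e_1 = \mathrm{id}_A \\
(e_2 \circ e_1) \circ (p_1 \circ p_2) &= e_2 \circ (e_1 \circ p_1) \circ p_2 \sqsubseteq e_2 \circ p_2 \sqsubseteq \mathrm{id}_C
\end{aligned}
```

The first line uses $p_2 \circ e_2 = \mathrm{id}_B$; the second uses $e_1 \circ p_1 \sqsubseteq \mathrm{id}_B$, the monotonicity of $e_2$, and $e_2 \circ p_2 \sqsubseteq \mathrm{id}_C$. Hence $(e_2 \circ e_1,\ p_1 \circ p_2)$ is an e-p pair between $A$ and $C$.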
> ## Invoking deconstructors
> 
> We've already seen how to "invoke" deconstructors: through pattern matching.
> What we've been calling "record patterns" are merely deconstruction patterns
> derived mechanically from the state description, just as we do with
> constructors
> and accessors; there is little difference between record patterns and
> deconstruction patterns other than the ability to declare them explicitly.
> (There is an accidental difference in the translation, in that we currently
> implement record patterns by appealing to individual accessors rather than a
> single deconstructor, but this may eventually converge as well.)
> 
> The use-site syntax of deconstruction bears a deliberate similarity to that
> of construction; `new Point(x, y)` is deconstructed by
> `case Point(var x, var y)`.
> 
> #### Overload selection
> 
> In the presence of overloaded deconstructors, we need to figure out which
> deconstructor a deconstruction pattern `C(P*)` is referring to.  The
> details are similar to overload selection for methods, except that we
> operate on the bindings rather than the parameters.  We first search for
> _applicable matchers_, using increasingly loose criteria (first excluding
> boxing, unboxing, and varargs; then including boxing and unboxing but not
> varargs; and finally, all candidates), and then select the most specific
> one.
> 
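Since deconstructor selection is meant to mirror constructor selection, the existing three phases can be observed with ordinary constructors. A minimal sketch; `Box` and its constructors are hypothetical, chosen so that each call below is resolved in a different phase.

```java
class OverloadPhasesDemo {
    // Hypothetical class whose constructors become applicable in
    // different phases of JLS 15.12.2 overload resolution.
    static final class Box {
        final String chosen;
        Box(long v)    { chosen = "long"; }    // int arg: applicable by widening
        Box(Integer v) { chosen = "Integer"; } // int arg: needs boxing
        Box(int... v)  { chosen = "varargs"; } // variable arity
    }

    public static void main(String[] args) {
        // Phase 1 finds Box(long) by widening, so boxing is never tried.
        System.out.println(new Box(42).chosen);                  // long
        // Box(Integer) is applicable in phase 1 (identity conversion);
        // Box(long) would need unboxing, which phase 1 excludes.
        System.out.println(new Box(Integer.valueOf(42)).chosen); // Integer
        // Two int args: only the variable-arity constructor fits (phase 3).
        System.out.println(new Box(1, 2).chosen);                // varargs
    }
}
```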
> It is tempting to try to bypass the three-phase selection process and use
> a simpler notion of applicability (perhaps noting that we got this process
> for compatibility with existing overload selection decisions when
> autoboxing and varargs were added, and that there are few deconstructor
> invocations to be compatible with yet).  But because existing overloaded
> constructors use this mechanism, and there is significant value in
> pairing constructors and deconstructors, attempting to invent a
> simpler-but-different overload selection mechanism for deconstructors
> would inevitably undermine the duality between matching
> constructor-deconstructor pairs.  So compatibility (this time, with
> existing overloaded constructors) once again forces our hand.
> 
> The specification for overload selection is complicated significantly by
> poly expressions (e.g., lambdas); fortunately, there are no "poly
> patterns", and so, while the structure of JLS 15.12.2 is retained for
> overload selection of deconstruction patterns, much of the detail is left
> behind.
> 
> ## Translation
> 
> We translate patterns into synthetic methods with a `Matcher` attribute;
> each such method implements the matcher behavior.  The translation scheme
> derives from a number of requirements, only some of which are in play for
> deconstructors.
> 
> The matcher method for a deconstructor is a final instance method that
> takes no parameters and returns `Object`, perhaps with a special name
> (just as constructors are called `<init>`).
> 
> #### Carriers
> 
> Because the matcher method implements the matcher behavior, but a matcher
> may "return" multiple bindings (or failure), we must encode the bindings
> in some way.  For this, we use a _carrier object_.  The choice of carrier
> is largely a footprint/specificity tradeoff.  One could imagine a carrier
> class per matcher, or a carrier class per matcher descriptor, or using
> `Object[]` as a carrier for everything, or caching some number of common
> shapes (e.g., three ints and two refs).  This sort of tuning should be
> separate from the protocol encoded in the bytecode of the pattern method
> and its clients.
> 
> We use a small _carrier runtime_ to decouple pattern translation from
> carrier selection.  (This same carrier runtime is used by string templates
> as well.)  This allows tradeoffs in runtime characteristics (e.g., carrier
> per matcher vs sharing carriers across matchers, dropping carrier identity
> with value types later, etc.) without affecting the translation.  The
> carrier API consists of condy bootstraps like:
> 
> ```
> static MethodHandle carrierFactory(MethodType matcherDescriptor) { ... }
> static MethodHandle carrierAccessor(MethodType matcherDescriptor, int bindingNo) { ... }
> ```
> 
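To make the protocol concrete, here is a hypothetical `Object[]`-backed implementation of the two bootstraps, built only from standard `java.lang.invoke` combinators. It is a sketch, not the actual carrier runtime (which may well pick per-shape carrier classes instead), and it omits the `Lookup`, name, and type parameters that real condy bootstraps receive. Since `Point`'s fields are `double`, the example descriptor is `(DD)V`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class CarrierSketch {
    // Hypothetical Object[]-backed carrier runtime (bootstrap plumbing omitted).

    // Returns a handle of type (bindings...)Object that packs the bindings.
    static MethodHandle carrierFactory(MethodType matcherDescriptor) {
        int n = matcherDescriptor.parameterCount();
        MethodHandle collect = MethodHandles.identity(Object[].class)
                .asCollector(Object[].class, n); // (Object x n)Object[]
        // asType boxes primitive bindings and widens the result to Object.
        return collect.asType(matcherDescriptor.changeReturnType(Object.class));
    }

    // Returns a handle of type (Object)T that extracts binding number bindingNo.
    static MethodHandle carrierAccessor(MethodType matcherDescriptor, int bindingNo) {
        MethodHandle get = MethodHandles.arrayElementGetter(Object[].class); // (Object[],int)Object
        MethodHandle getN = MethodHandles.insertArguments(get, 1, bindingNo); // (Object[])Object
        // asType casts the carrier to Object[] and unboxes/casts the result.
        return getN.asType(MethodType.methodType(
                matcherDescriptor.parameterType(bindingNo), Object.class));
    }

    public static void main(String[] args) throws Throwable {
        MethodType xy = MethodType.methodType(void.class, double.class, double.class);
        Object carrier = carrierFactory(xy).invoke(3.0, 4.0);
        double x = (double) carrierAccessor(xy, 0).invoke(carrier);
        double y = (double) carrierAccessor(xy, 1).invoke(carrier);
        System.out.println(x + ", " + y); // 3.0, 4.0
    }
}
```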
> The `matcherDescriptor` is a `MethodType` describing the binding types.
> The `carrierFactory` method returns a method handle which takes the
> bindings and produces a carrier object; the `carrierAccessor` method
> returns method handles that take the carrier object and return the
> corresponding binding.  To indicate success, the matcher method invokes
> the carrier factory method handle and returns the result; to indicate
> failure (deconstructors cannot fail, but other matchers can), the matcher
> method returns null.
> 
> We would translate the XY deconstructor from `Point` as follows
> (pseudo-code):
> 
> ```
> #100: MethodType[(II)V]
> #101: Condy[bsm=Carriers::carrierFactory, args=[#100]]
> 
> final synthetic Object Point$MANGLE() {
>     aload_0
>     getfield Point::x
>     aload_0
>     getfield Point::y
>     LDC #101
>     invokevirtual MethodHandle::invoke(II)V
>     areturn
> }
> ```
> 
> Constant `#100` contains a `MethodType` holding the binding descriptor;
> constant `#101` holds a method handle whose parameter types are those of
> the binding descriptor and whose return type is `Object`.
> 
> At the use site, matching a deconstruction pattern is performed by
> invoking the matcher method on the appropriate target object, and then
> extracting the components with the carrier accessor method handles if the
> match is successful.  (Deconstructors are total, so they always succeed,
> but for other patterns, the matcher method returns null on failure to
> match.)

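Java has no syntax for the synthetic matcher method, but its behavior and the use-site protocol can be rendered as ordinary code. A minimal sketch, assuming an `Object[]`-backed carrier as a stand-in for the carrier runtime; `matcherXY`, `match`, and the handle names are hypothetical. The bindings use `double` because `Point`'s fields are `double`, so the descriptor here is `(DD)V` rather than the `(II)V` shown in the pseudo-code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MatcherProtocolSketch {
    // Binding descriptor of Point's XY deconstructor: (DD)V.
    static final MethodType XY =
            MethodType.methodType(void.class, double.class, double.class);

    // Stand-in for constant #101: packs the bindings into an Object[]
    // carrier; type (double,double)Object.
    static final MethodHandle FACTORY = MethodHandles.identity(Object[].class)
            .asCollector(Object[].class, 2)
            .asType(XY.changeReturnType(Object.class));

    // Stand-ins for the carrier accessors; type (Object)double.
    static final MethodHandle GET_X = accessor(0);
    static final MethodHandle GET_Y = accessor(1);

    static MethodHandle accessor(int i) {
        return MethodHandles.insertArguments(
                        MethodHandles.arrayElementGetter(Object[].class), 1, i)
                .asType(MethodType.methodType(XY.parameterType(i), Object.class));
    }

    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }

        // Java rendering of the synthetic matcher method: read the fields,
        // invoke the carrier factory, return the carrier. A deconstructor
        // is total, so it never returns null.
        final Object matcherXY() throws Throwable {
            return FACTORY.invoke(x, y);
        }
    }

    // Use site for `case Point(var x, var y)`: invoke the matcher, treat
    // null as failure, then extract the bindings with the accessors.
    static String match(Object target) throws Throwable {
        if (target instanceof Point p) {
            Object carrier = p.matcherXY();
            if (carrier != null) {
                double x = (double) GET_X.invoke(carrier);
                double y = (double) GET_Y.invoke(carrier);
                return "x=" + x + ", y=" + y;
            }
        }
        return "no match";
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(match(new Point(1.0, 2.0))); // x=1.0, y=2.0
        System.out.println(match("not a point"));        // no match
    }
}
```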
First, there is a small typo:
  invokevirtual MethodHandle::invoke(II)V

should be
  invokevirtual MethodHandle::invoke(II)Ljava/lang/Object;

I believe that indy is better than ldc condy + MH.invokevirtual. The main difference between these two approaches is that with indy, the bootstrap method receives the Lookup, and I believe you need the Lookup at runtime (more exactly, its ClassLoader) to decode the "patternDescr" field of the Matcher attribute (to transform it into a MethodType).

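The point about the Lookup can be sketched as follows: an indy bootstrap receives the caller's `Lookup`, whose class loader lets it turn a descriptor string (such as the `patternDescr` of a `Matcher` attribute) into a `MethodType` with `MethodType.fromMethodDescriptorString`. The bootstrap name, its static argument, and linking to a constant are assumptions made for illustration; a real bootstrap would link to the matcher method and carrier accessors.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class IndyBootstrapSketch {
    // Hypothetical bootstrap: the JVM would pass the caller's Lookup, the
    // call-site name and type, plus a static argument holding the binding
    // descriptor string.
    static CallSite matcherBootstrap(MethodHandles.Lookup lookup, String name,
                                     MethodType type, String patternDescr) {
        // Resolve class names in the descriptor with the caller's loader.
        ClassLoader loader = lookup.lookupClass().getClassLoader();
        MethodType bindings = MethodType.fromMethodDescriptorString(patternDescr, loader);
        // Sketch only: link the call site to a constant holding the decoded
        // type instead of to a real matcher.
        return new ConstantCallSite(
                MethodHandles.constant(MethodType.class, bindings).asType(type));
    }

    public static void main(String[] args) throws Throwable {
        // Simulate the JVM's bootstrap call with this class's own Lookup.
        CallSite cs = matcherBootstrap(MethodHandles.lookup(), "match",
                MethodType.methodType(MethodType.class),
                "(Ljava/lang/String;Ljava/lang/String;)V");
        MethodType decoded = (MethodType) cs.getTarget().invoke();
        System.out.println(decoded); // (String,String)void
    }
}
```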
regards,
Rémi