Deconstruction patterns

Sat Mar 11 13:58:38 UTC 2023

I've coded a possible implementation here
  https://github.com/forax/amber-deconstructor

- instead of mangling the name, i've followed literally what Dan H said, the VM is good to find a method from a method name and a *descriptor*, so like when you want to have two overloads of a constructor with the same parameter types, you add a fake parameter of type boolean or Void, i've added all the parameters so the algorithm to find the deconstructors in case of overloads is strictly the same as with methods. At call sites, the compiler fill the arguments with zeroes.
- instead of using final instance methods, I use static methods so it also works in an interface (a default method in an interface can not be final). 

By example the class Point,
 class Point {
     final double x, y;

     public Point(double x, double y) {
         this.x = x;
         this.y = y;
     }

     public matcher Point(double x, double y) {
         x = this.x;
         y = this.y;
     }

     public matcher Point(int x, int y) {
        x = (int) this.x;
        y = (int) this.y;
     }
 }

is desugared to
 class Point {
     final double x, y;

     public Point(double x, double y) {
         this.x = x;
         this.y = y;
     }

     // use a static final to emulate a condy
     private static final MethodHandle FACTORY0 =
         RT.carrierFactory(MethodHandles.lookup(), "", MethodHandle.class, methodType(void.class, int.class, int.class));

     public static /*synthetic*/ Object Point(Point that, double _, double _) {
         double x = that.x;
         double y = that.y;
         return FACTORY0.invokeExact(x, y);
     }

     // use a static final to emulate a condy
     private static final MethodHandle FACTORY1 =
         RT.carrierFactory(MethodHandles.lookup(), "", MethodHandle.class, methodType(void.class, int.class, int.class));

     public static /*synthetic*/ Object Point(Point that, int _, int _) {
        int x = (int) that.x;
        int y = (int) that.y;
        return FACTORY1.invokeExact(x, y);
     }
 }

and at use site

    var point = new Point(2, 3);
    switch (point) {
        case Point(int x, int y) -> {
           ...
        }
    }

is desugared to

  // use a static finals to emulate condies
  private static final MethodHandle POINT_ACCESSOR_0 =
      RT.carrierAccessor(MethodHandles.lookup(), "", MethodHandle.class, methodType(void.class, int.class, int.class), 0);
  private static final MethodHandle POINT_ACCESSOR_1 =
      RT.carrierAccessor(MethodHandles.lookup(), "", MethodHandle.class, methodType(void.class, int.class, int.class), 1);
  ...

    var point = new Point(2, 3);
    switch (point) {
        case Point p -> {
            var carrier = Point.Point(p, 0, 0);   // use zeroes to ask for the right overloads
            var x = (int) POINT_ACCESSOR_0.invokeExact(carrier);
            var y = (int) POINT_ACCESSOR_1.invokeExact(carrier);
           ...
        }
    }

The prototype works both with jdk 20 and latest Valhalla early preview. For the later, the generated carrier classes are value classes.

As an implementation detail, the prototype erase all reference types to Object to avoid to generate too many carrier classes at runtime.
Perhaps a better to solution would be to also re-organize the fields to have all objects first and do a permuteArguments when creating the factory method handle.
I did not convert the numeric primitive type (boolean, short, char) to int because if the carrier is a value type, we may want it as compact as possible.

regards,
Rémi

----- Original Message -----
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Guy Steele" <guy.steele at oracle.com>
> Cc: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Wednesday, March 8, 2023 4:07:52 PM
> Subject: Re: Deconstruction patterns

> Updated version, addressing Guy's concerns and fixing/clarifying a few
> other things that came up in earlier discussions.
> 
> 
> # Deconstruction patterns -- translation, use, and reflection
> 
> As we are wrapping up record patterns, it's time to look ahead to the
> next major
> part of the pattern matching story -- extending the capabilities of record
> patterns to all classes that want to support destructuring. Record
> patterns are
> simply a special case of _deconstruction patterns_ or _deconstructors_,
> where we
> derive the deconstructor API, implementation, and use from the state
> description
> of the record.  For an arbitrary class, a deconstruction patterns will
> require
> an explicit member declaration, with a header identifying the names and
> types of
> the bindings and a body that extracts the bindings from the representation.
> 
> ## Deconstructors
> 
> Just as constructors are special cases of methods, deconstruction
> patterns are
> special cases of a more general notion of declared pattern, which also
> includes
> static matchers (the dual of static methods) and instance matchers (the
> dual of
> instance methods.)  Specifically, unlike the more general notion of
> matcher, a
> deconstructor must be _total_; it must always match.  This document will
> focus
> exclusively on deconstructors, and we'll come back to static and instance
> matchers in due time.  (But note that some of the design choices in the
> simple
> case of deconstructors may be constrained by the more general case.)
> 
> There are a number of choices for how we might syntactically represent a
> deconstructor (or more generally, a declared pattern.)  For purposes of
> illustration, this document picks one possible syntactic expression of
> deconstructors, but it is premature to devolve into a syntax discussion
> at this
> time.
> 
> ```
> class Point {
>     final double x, y;
> 
>     public Point(double x, double y) {
>         this.x = x;
>         this.y = y;
>     }
> 
>     public matcher Point(double x, double y) {
>         x = this.x;
>         y = this.y;
>     }
> }
> ```
> 
> This example illustrates two aspects of the duality between constructors and
> their corresponding deconstructors.  Their APIs are duals: a constructor
> takes N
> parameters containing the desired description of the object state and
> produces a
> constructed object; a deconstructor starts from the constructed object
> and has N
> bindings (outputs) that receive the desired state components. Similarly,
> their
> implementations are duals: the body of the constructor initializes the
> object
> representation from the description, and the body of the deconstructor
> extracts
> the description from the representation.  A deconstructor is best
> understood as
> a _co-constructor_.
> 
> The `Point` example above is special in two ways.  First, the internal
> representation of a `Point`, and the API of the constructor and
> deconstructor,
> are the same: `(double x, double y)`.  We can call the API implied by the
> constructor and deconstructor the _external representation_, and for
> `Point`,
> both the internal and external representations are the same. (This is one of
> the requirements for being a candidate to be a record.)  And second, the
> constructor is _total_; it does not reject any combinations of arguments.
> 
> Here's another version of `Point` which does not have these special
> aspects; it
> uses the same internal representation as before, but chooses a pair of
> strings
> as the external representation:
> 
> ```
> class Point2 {
>     final double x, y;
> 
>     public Point2(String x, String y) {
>         this.x = Double.parseDouble(x);
>         this.y = Double.parseDouble(y);
>     }
> 
>     public matcher Point2(String x, String y) {
>         x = Double.toString(this.x);
>         y = Double.toSTring(this.y);
>     }
> }
> ```
> 
> The method `Double::parseDouble` will throw `NumberFormatException` if its
> argument does not describe a suitable value, so unlike the `Point`
> constructor,
> the `Point2` constructor is partial: it will reject `new Double("foo",
> "bar")`.
> And the internal representation is no longer the same as the external
> representation.  Less obviously, there are valid string values that we can
> provide to the constructor, but which cannot be represented exactly as
> `double`,
> and which will be approximated; the string value
> `"3.22222222222222222222222222222222222222"` will be approximated with the
> double value `3.2222222222222223`.
> 
> This example highlights more clearly how the deconstructor and
> constructor form
> an _embedding-projection pair_ between the external and internal
> representations.  While some external representations might be invalid,
> and some
> might result in approximation, deconstruct-then-construct is always an
> identity
> transformation.  (Indeed, the specification of `java.lang.Record`
> requires that
> if we deconstruct a record with its accessors, and pass the resulting values
> back to the constructor, we should get a new record that is `equals` to the
> original.)
> 
> The fact that deconstructor and constructor (and eventually, static
> matcher and
> factory) form an embedding-projection pair is why we are able to derive
> higher-level language features, such as [safer
> serialization](https://openjdk.org/projects/amber/design-notes/towards-better-serialization)
> and [functional transformation of immutable
> objects](https://github.com/openjdk/amber-docs/blob/master/eg-drafts/reconstruction-records-and-classes.md),
> from a matched set of constructor and deconstructor.
> 
> Of course, users are free to implement constructors without
> deconstructors, or
> constructors and deconstructors whose external representations don't
> match up,
> or even matching constructors and deconstructors that are not
> behaviorally dual.
> But providing a matched set (or several) of constructors and deconstructors
> enables reliably reversible aggregation, and allows us to mechanically
> derive
> useful higher-level features such as withers.
> 
> #### Overloading
> 
> Just as constructors can be overloaded, deconstructors can be overloaded
> for the
> same reason: multiple constructors can expose multiple external
> representations
> for aggregation, and corresponding deconstructors can recover those multiple
> external representations.  Any matching pair of
> deconstructor-constructor (and
> eventually, deconstructor-factory) is a candidate for use in higher-level
> features based on the embedding-projection nature of the
> deconstructor-constructor pair.
> 
> Just as deconstruction is dual to construction, overloading of
> deconstructors is
> dual to that of constructors: rather than restricting which sets of
> parameters
> can be overloaded against each other, we do so with the bindings
> instead.  For
> constructors we require that their signatures not be
> override-equivalent; for
> deconstructors, we require the same of their bindings.
> 
> For a deconstructor (and declared patterns in general), we derive a _binding
> signature_ (and an erased _binding descriptor_) which treats the binding
> list as
> if it were a parameter list.  The overload rule outlined above requires that
> binding signatures for two deconstructors of the same arity not be
> override-equivalent. (We will find it useful later to derive a
> `MethodType` for
> the binding descriptor; this is a `MethodType` whose parameter types are the
> erased types of the bindings.)
> 
> #### Digression: embedding-projection pairs
> 
> Given two sets _A_ and _B_, a pair of functions `e : A -> B` and `p : B
> -> A`,
> forms an _embedding-projection pair_ if embed-then-project is an identity
> function, and project-then-embed _approximates_ the input according to a
> domain-specific approximation metric.  In our example which used strings
> as an external representation, projection rejected some inputs (`"foo"`) and
> approximated others.
> 
> When applied to deconstructor-constructor pairs, this says that
> deconstructing
> an object and then reconstructing it with the resulting bindings should
> result
> in an equivalent object, and constructing an object from an external
> representation and then deconstructing it back into that external
> representation
> should result in an approximation of the original external representation,
> assuming nothing bad happens during construction such as rejecting a bad
> input.
> 
> Embedding-projection pairs have a number of desirable properties, such
> as the
> composition of two e-p pairs is an e-p pair; this property is at the
> heart of
> using deconstructor-constructor pairs for improved serialization and
> functional
> transformation.
> 
> ## Invoking deconstructors
> 
> We've already seen how to "invoke" deconstructors: through pattern matching.
> What we've been calling "record patterns" are merely deconstruction patterns
> derived mechanically from the state description of a record, just as we
> do with
> constructors and accessors; there is little difference between record
> patterns
> and deconstruction patterns other than the ability to declare them
> explicitly.
> (There is an accidental difference in the translation, in that we currently
> implement record patterns by appealing to individual accessors rather than a
> single deconstructor, but this may eventually converge as well.)
> 
> The use-site syntax of deconstruction bears a deliberate similarity to
> that of
> construction; `new Point(x, y)` is deconstructed by `case Point(var x,
> var y)`.
> 
> #### Overload selection
> 
> In the presence of overloaded deconstructors, we need to figure out which
> deconstructor a deconstruction pattern `C(P*)` is referring to. The
> details are
> similar to overload selection for methods, except that we operate on the
> bindings rather than the parameters.  We first search for _applicable
> matchers_,
> using increasingly loose criteria (first excluding boxing, unboxing, and
> varargs; then including boxing and unboxing but not varargs; and
> finally, all
> candidates) and then selecting the most applicable.
> 
> It is tempting to try and bypass the three-phase selection process and use a
> simpler notion of applicability (perhaps noting that we got this process for
> compatibility with existing overload selection decisions when autoboxing and
> varargs were added, and that there are few deconstructor invocations to be
> compatible with yet.)  But because existing overloaded constructors use this
> mechanism, and there is significant value in pairing constructors and
> deconstructors, attempting to invent a simpler-but-different overload
> selection
> mechanism for deconstructors would inevitably undermine the duality between
> matching deconstructor-constructor pairs. So compatibility (this time, with
> existing overloaded constructors) once again forces our hand.
> 
> The specification for overload selection is complicated significantly by
> poly
> expressions (e.g., lambdas); fortunately, there are no "poly patterns",
> and so,
> while the structure of JLS 15.12.2 is retained for overload selection of
> deconstruction patterns, much of the detail is left behind.
> 
> ## Translation
> 
> We translate patterns into synthetic methods with a `Matcher` attribute;
> this
> method implements the matcher behavior.  The translation scheme derives
> from a
> number of requirements, only some of which are in play for deconstructors.
> 
> The matcher method for a deconstructor is a final instance method that
> takes no
> parameters and returns `Object`, perhaps with a special name (just as
> constructors are called `<init>`.)
> 
> #### Carriers
> 
> Because the matcher methods implements the matcher behavior, but a
> matcher may
> "return" multiple bindings (or failure), we must encode the bindings in some
> way.  For this, we use a _carrier object_.  The choice of carrier is
> largely a
> footprint/specificity tradeoff.  One could imagine a carrier class per
> matcher,
> or a carrier class per matcher descriptor, or using `Object[]` as a
> carrier for
> everything, or caching some number of common shapes (e.g, three ints and two
> refs).  This sort of tuning should be separate from the protocol encoded
> in the
> bytecode of the pattern method and its clients.
> 
> We use a small _carrier runtime_ to decouple pattern translation from
> carrier
> selection.  (This same carrier runtime is used by string templates as well.)
> This allows tradeoffs in runtime characteristics (e.g., carrier per
> matcher vs
> sharing carriers across matchers, dropping carrier identity with value types
> later, etc) without affecting the translation. The carrier API consists
> of condy
> bootstraps like:
> 
> ```
> static MethodHandle carrierFactory(METADATA,
>                                    MethodType matcherDescriptor) { ... }
> static MethodHandle carrierAccessor(METADATA,
>                                     MethodType matcherDescriptor,
>                                     int bindingNo) { ... }
> ```
> 
> where `METADATA` are the standard condy metadata arguments (lookup,
> name, type.)
> The `matcherDescriptor` is a `MethodType` describing the binding types.  The
> `carrierFactory` method returns a method handle which takes the bindings and
> produces a carrier object; the `carrierAccessor` method returns method
> handles
> that take the carrier object and return the corresponding binding.  To
> indicate
> success, the matcher method invokes the carrier factory method handle and
> returns the result; to indicate failure (deconstructors cannot fail, but
> other
> matchers can) the matcher method returns null.
> 
> We would translate the XY deconstructor from `Point` as follows
> (pseudo-bytecode):
> 
> ```
> #100: MethodType[(II)V]
> #101: Condy[bsm=Carriers::carrierFactory, args=[#100]]
> 
> final synthetic Object Point$MANGLE() {
>     aload_0
>     getfield Point::x
>     aload_0
>     getfield Point::y
>     LDC #101
>     invokevirtual MethodHandle::invoke(II)Object
>     areturn
> }
> ```
> 
> Constant `#100` contains a `MethodType` holding the binding descriptor;
> constant
> `#101` holds a method handle whose parameters are the parameter types of the
> binding descriptor and returns `Object`.
> 
> At the use site, matching a deconstruction pattern is performed by
> invoking the
> matcher method on the appropriate target object, and then extracting the
> components with the carrier accessor method handles if the match is
> successful.
> (Deconstructors are total, so are always successful, but for other patterns,
> null is returned from the matcher method on failure to match.)
> 
> #### Method names
> 
> The name of the matcher method is mangled to support overloading. The JVM
> permits overloading on parameter types, but not return types (and overloaded
> matchers are effectively overloaded on return types.)  We take the
> approach of
> encoding the erasure of the matcher descriptor in the name of the
> pattern.  This
> has several desirable properties: it is stable (the name is derived
> solely from
> stable aspects of the declaration), for matchers with override-equivalent
> signatures (deconstructors can't be overridden, but other patterns can be),
> these map to true overrides in the translation, and valid overloads of
> matchers
> will always have distinct names.
> 
> We use the ["Symbolic Freedom"]() encoding of the erasure of the matcher
> descriptor as the mangled disambiguator, which is exactly as stable as
> any other
> method descriptor derived from source declarations.
> 
> #### Attributes
> 
> Because patterns are methods, we can take advantage of all the
> affordances of
> methods.  We can use access bits to control accessibility; we can use the
> attributes that carry annotations, method parameter metadata, and generics
> signatures to carry information about the pattern declaration (and its
> (input)
> parameters, when we get to those kinds of matchers).  What's missing is
> the fact
> that this is a pattern implementation and not an ordinary method, and a
> place to
> put metadata for bindings.  To address the first, we can add the following
> attribute on matcher methods:
> 
>     Matcher {
>         u2 name;                            // "Matcher"
>         u4 length;
>         u2 patternFlags;
>         u2 patternName;                     // UTF8
>         u2 patternDescr;                    // MethodType
>         u2 attributes_count;
>         attribute_info attributes[attributes_count];
>     }
> 
> This says that "this method is a pattern".  The source name of the pattern
> declaration is reified as `patternName`, and the matcher descriptor, which
> encodes the types of the bindings, is reified as a `MethodType` in
> `patternDescr`.  The `flags` word can carry matcher-specific information
> such as
> "this matcher is a deconstructor" or "this matcher is total".
> 
> A matcher method may have the usual variety of method attributes, such as
> `RuntimeInvisibleAnnotations` for annotations on the matcher declaration
> itself.
> 
> If we wish to encode information about the matcher _bindings_, we do so with
> attributes inside the `Matcher` annotation itself.  Attributes such as
> `Signature`, `ParameterNames`, `RuntimeVisibleParameterAnnotations`,
> etc, can
> appear in a `Matcher` and are interpreted relative to the matcher
> signature or
> descriptor.  So if we had a matcher:
> 
> ```
> matcher Foo(@Bar List<String> list) { ... }
> ```
> 
> then the `Matcher` would contain a signature attribute corresponding to
> `(List<String>)` and a `RuntimeXxxParameterAnnotations` attribute
> describing the
> `@Bar` annotation on the first "parameter".
> 
> #### Reflection
> 
> Since matchers are a new kind of class member, they will need a new kind of
> reflective object, and a method that is analogous to
> `Class::getConstructors`.
> The reflective object for matchers should extend `Executable`, as all of the
> existing methods on `Executable` make sense for patterns.  If the pattern is
> reflectively invoked, it returns `null` for no match, or an `Object[]`
> which is
> the boxing of the values in the carrier.
> 
> We will then need some additional methods to describe the bindings, so the
> subtype of `Executable` has methods like `getBindings`,
> `getAnnotatedBindings`,
> `getGenericBindings`, `isDeconstructor`, `isPartial`, etc.  These
> methods will
> decode the `Matcher` attribute and its embedded attributes.
> 
> ## Summary
> 
> This design borrows from previous rounds, but makes a number of
> simplifications.
> 
>  - The bindings of a pattern are captured in a `MethodType`, called the
> _matcher
>    descriptor_.  The parameters of the matcher descriptor are the types
> of the
>    bindings; the return type is either `V` or the minimal type that
> will match
>    (but is not as important as the bindings.)
>  - Matchers are translated as methods whose names are derived
> deterministically
>    from the name of the matcher and the erasure of the pattern
> descriptor. These
>    are called _matcher methods_.  Matcher methods take as parameters
> the input
>    parameters of the pattern (if any), and return `Object`.
>  - The returned object is an opaque carrier.  Null means the pattern didn't
>    match.  A non-null value is the carrier type (from the carrier
> runtime) which
>    is derived from the pattern descriptor.
>  - Matcher methods are not directly invocable from the source language;
> they are
>    invoked indirectly through pattern matching or reflection.
>  - Generated code invokes the matcher method and interprets the
> returned value
>    according to the protocol, using MHs from the carrier runtime to
> access the
>    bindings.
>  - Matcher methods have a `Matcher` attribute, which captures
> information about
>    the matcher as a whole (is a total/partial, a deconstructor, etc) and
>    parameter-related attributes which describe the bindings.
>  - Matchers are reflected through a new subtype of `Executable`, which
> exposes
>    new methods to reflect over bindings.
>  - When invoking a matcher reflectively, the carrier is boxed to an
> `Object[]`.