Declared patterns -- translation and reflection
Remi Forax
forax at univ-mlv.fr
Tue Mar 29 22:36:53 UTC 2022
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "amber-spec-experts" <amber-spec-experts at openjdk.java.net>
> Sent: Tuesday, March 29, 2022 11:01:18 PM
> Subject: Declared patterns -- translation and reflection
> Time to take a peek ahead at _declared patterns_. Declared patterns come in
> three varieties -- deconstruction patterns, static patterns, and instance
> patterns (corresponding to constructors, static methods, and instance methods.)
> I'm going to start with deconstruction patterns, but the basic game is the same
> for all three.
> Ignoring the trivial details, a deconstruction pattern looks like a "constructor
> in reverse":
> ```{.java}
> class Point {
> int x, y;
> Point(int x, int y) {
> this.x = x;
> this.y = y;
> }
[....]
> }
> ```
> Deconstruction patterns share the weird behaviors that constructors have in that
> they are instance members, but are not inherited, and that rather than having names,
> they are accessed via the class name.
> Deconstruction patterns differ from static/instance patterns in that they are by
> definition total; they cannot fail to match. (This is a somewhat arbitrary
> simplification in the object model, but a reasonable one.) They also cannot
> have any input parameters, other than the receiver.
> Patterns differ from their ctor/method counterparts in that they have what
> appear to be _two_ argument lists; a parameter list (like ctors and methods),
> and a _binding_ list. The parameter list is often empty (with the receiver as
> the match target). The binding list can be thought of as a "conditional
> multiple return". That they may return multiple values (and, for partial
> patterns, can return no values at all when they don't match) presents a
> challenge for translation to classfiles, and for the reflection model.
> #### Translation to methods
> Patterns contain imperative code, so surely we want to translate them to methods
> in some way. The pattern input parameters map cleanly to method parameters.
> The pattern bindings need to be tunneled, somehow, through the method return (or
> some other mechanism). For our deconstructor, we might translate as:
> PatternCarrier <dtor>()
> (where the method applies the pattern, and PatternCarrier wraps and provides
> access to the bindings) or
> PatternObject <dtor>()
> (where PatternObject provides indirection to behavior to invoke the pattern,
> which in turn returns the carrier.)
> With either of these approaches, though, the pattern name is a problem, because
> patterns can be overloaded on their _bindings_, but both of these return types
> are insensitive to bindings.
> It is useful to characterize the "shape" of a pattern with a MethodType, where
> the parameters of the MethodType are the binding types. (The return type is
> less constrained, but it is sometimes useful to use the return type of the
> MethodType for the required type of the pattern.) Call this the "descriptor" of
> the pattern.
> If we do this, we can use some name mangling to encode the descriptor in the
> method name:
> PatternCarrier name$mangle()
> The mangling has to be stable across compilations with respect to any source-
> and binary-compatible changes to the pattern declaration. One mangling that
> works quite well is to use the "symbolic-freedom encoding" of the erasure of
> the pattern descriptor. Because the erasure of the descriptor is exactly as
> stable as any other method signature derived from source declarations, it will
> have the desired binary compatibility properties, overriding will work as
> expected, etc.
I think we need at least to use a special name like <deconstructor>, the same way we have <init>.
I agree that we also need to encode the method type descriptor (the carrier type) into the name, so the name of the method in the classfile should be <deconstructor+mangle> or <name+mangle> (or perhaps <pattern+name+mangle> for the pattern methods).
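To make the mangling idea concrete, here is a rough sketch in Java. Everything in it is an assumption for illustration: `PatternNames` and `mangle` are hypothetical names, the `name$mangle` shape is taken from the quoted text, and the escaping is only a simplified stand-in for the symbolic-freedom encoding (it replaces the characters that may not appear in a classfile method name, nothing more).

```java
import java.lang.invoke.MethodType;

// Hypothetical helper, not part of any JDK API.
final class PatternNames {
    private PatternNames() {}

    // Builds a classfile-safe method name from a pattern name and the
    // MethodType whose parameters are the binding types. A MethodType is
    // made of Class objects, so it is already erased in the language sense.
    static String mangle(String patternName, MethodType bindings) {
        String descriptor = bindings.toMethodDescriptorString();
        StringBuilder sb = new StringBuilder(patternName).append('$');
        for (char c : descriptor.toCharArray()) {
            switch (c) {
                // Simplified escaping (an assumption, not the full encoding).
                case '/':  sb.append("\\|"); break;
                case '.':  sb.append("\\,"); break;
                case ';':  sb.append("\\?"); break;
                case '[':  sb.append("\\{"); break;
                case '<':  sb.append("\\^"); break;
                case '>':  sb.append("\\_"); break;
                case '\\': sb.append("\\-"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, a `Point` deconstructor with two `int` bindings would be named `Point$(II)V`, and two deconstructors overloaded on their bindings get distinct method names.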
> #### Return value
> In an earlier design, we used a pattern object (which was a bundle of method
> handles) as the return value of the pattern. This enabled clients to invoke
> these via condy and bind method handles into the constant pool for
> deconstruction and static patterns.
> Either way, we make use of some sort of carrier object to carry the bindings
> from the pattern to the client; either we return the carrier from the pattern
> method, or there is a method on the pattern object that we invoke to get a
> carrier. We have a few preferences about the carrier; we'd like to be able to
> late-bind to the actual implementation (i.e., we don't want to freeze the name
> of a carrier class in the method descriptor), and at least for records, we'd
> like to let the record instance itself be the carrier (since it is immutable
> and we can just invoke the accessors to get the bindings.)
So the return type is either Object (to hide the type of the carrier) or a lambda that returns an Object (PatternObject or PatternCarrier acting like a glorified lambda).
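As a hand-written approximation of the first option (return type Object), a translated class might look roughly like the sketch below. The method names and the use of an `Object[]` as the carrier are assumptions made for illustration, and `onDiagonal` is a hypothetical partial instance pattern that is not in the original example.

```java
// Hand-written approximation of what a compiler might emit; names and the
// Object[] carrier are assumptions, not the proposed translation itself.
final class Point {
    final int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Total deconstructor: always matches, so it always returns a carrier.
    Object deconstruct$Point() {
        return new Object[] { x, y };
    }

    // Hypothetical partial pattern: binds the shared coordinate when the
    // point lies on the diagonal, and returns null to signal "no match".
    Object onDiagonal$match() {
        return (x == y) ? new Object[] { x } : null;
    }
}
```

The caller only sees `Object`, so the carrier representation can change without touching the method descriptor.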
> #### Carriers
> As part of the work on template strings, Jim has put back some code that was
> originally written for the purpose of translating patterns, called "carriers".
> There are methods / bootstraps that take a MethodType and return method handles
> to (a) encode values of those types into an opaque carrier object and (b) pull
> individual values out of a carrier. This means that the choice of carrier
> object can be deferred to runtime, as long as both the bundling and unbundling
> method handles agree on the carrier form.
> The choice of carrier is largely a footprint/specificity tradeoff. One could
> imagine a carrier class per shape, or a single carrier class that wraps an
> Object[], or caching some number of common shapes (three ints and two refs).
> This sort of tuning should be separate from the protocol encoded in the
> bytecode of the pattern method and its clients.
> The pattern matching runtime will provide some condy bootstraps which wrap the
> Carriers behavior.
> Since at least some patterns are conditional, we have to have a way to encode
> failure into the protocol. For a partial pattern, we can use a B2 carrier and
> use null to encode failure to match; for a total pattern, we can use a B3
> carrier.
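For readers who have not seen the carrier code, the idea of deriving both the bundling and the unbundling method handles from one MethodType can be sketched as follows. This is not Jim's implementation: `ArrayCarriers` is a hypothetical class that always uses an `Object[]` as the carrier, which is only one of the representations discussed above.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical carrier runtime backed by Object[]; not the JDK's carrier code.
final class ArrayCarriers {
    private ArrayCarriers() {}

    // Returns a handle of type (bindings...)Object that packs the bindings.
    static MethodHandle constructor(MethodType descriptor) {
        MethodHandle pack = MethodHandles.identity(Object[].class)
                .asCollector(Object[].class, descriptor.parameterCount());
        return pack.asType(descriptor.changeReturnType(Object.class));
    }

    // Returns a handle of type (Object)T that extracts binding number i.
    static MethodHandle component(MethodType descriptor, int i) {
        MethodHandle get = MethodHandles.arrayElementGetter(Object[].class);
        get = MethodHandles.insertArguments(get, 1, i);
        return get.asType(
                MethodType.methodType(descriptor.parameterType(i), Object.class));
    }

    // Packs (3, "a") and reads both bindings back through the handles.
    static String demo() {
        MethodType d = MethodType.methodType(void.class, int.class, String.class);
        try {
            Object carrier = constructor(d).invoke(3, "a");
            int first = (int) component(d, 0).invoke(carrier);
            String second = (String) component(d, 1).invoke(carrier);
            return first + "," + second;
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

Both sides only agree on the descriptor, so the runtime stays free to swap the `Object[]` for a specialized class per shape later.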
> #### Proposed encoding
> Earlier explorations did a lot of work to preserve the optimization that a match
> target can be its own carrier. But further analysis reveals that the cost of
> doing so for other than records is pretty substantial and works against the
> model of a pattern declaration being an imperative body of code that runs at
> match time. So for record patterns, we can "inline" them by using `instanceof`
> as the applicability test and accessors for extraction, and for all other
> patterns, go through the carrier runtime.
> This allows us to encode pattern methods as
> Object name$mangle(ARGS)
> and have the pattern method do the match and return a carrier (or null), using
> the carrier object that the carrier runtime associates with the pattern
> descriptor. And clients can take apart the result again using the extraction
> logic that the carrier runtime associates with the pattern descriptor.
> This also means that instance patterns "just work" because virtual dispatch
> selects the right implementation for us automatically, and all implementations
> that can be overrides will also implicitly agree on the encoding.
> Because patterns are methods, we can take advantage of all the affordances of
> methods. We can use access bits to control accessibility in the obvious way; we
> can use the attributes that carry annotations, method parameter metadata, and
> generics signatures to carry information about the pattern declaration and its
> parameters. What's missing is a place to put metadata for the *bindings*, and
> to record the fact that this is a pattern implementation and not an ordinary
> method. So, we add the following attribute on pattern methods:
> Pattern {
> u2 attr_name;
> u4 attr_length;
> u2 patternFlags; // bitmask
> u2 patternName; // index of UTF8 constant
> u2 patternDescr; // index of MethodType (or alternately UTF8) constant
> u2 attributes_count;
> attribute_info attributes[attributes_count];
> }
> This says that "this method is a pattern", reifies the name of the pattern
> (patternName), reifies the pattern descriptor (patternDescr) which encodes the
> types of the bindings as a method descriptor or MethodType, and has attributes
> which can carry annotations, parameter metadata, and signature metadata for the
> bindings. The existing attributes (e.g. Signature, ParameterNames, RVAA) can be
> reused as is, with the interpretation that this is the signature (or names, or
> annos) of the *bindings*, not the input parameters. Flags can carry things like
> "deconstructor pattern" or "partial pattern" as needed.
From the classfile POV, a constructor is a method with a funny name in between angle brackets; I think deconstructor and pattern methods should work the same way.
Unlike a constructor, we need a way to attach the carrier type (and perhaps the pattern name) on the side, so an attribute on the pattern method seems the right choice.
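Going back to the record case in the proposed encoding, the "inlined" form needs no pattern method and no carrier at all. A minimal sketch of what a record pattern could be lowered to by hand, where `Pt` and `RecordMatch` are hypothetical names used only for this example:

```java
// Hypothetical record used only for this example.
record Pt(int x, int y) {}

final class RecordMatch {
    private RecordMatch() {}

    // Hand-lowered form of a record pattern match on Pt:
    // instanceof is the applicability test, accessors do the extraction.
    static int sumIfPoint(Object o) {
        if (o instanceof Pt p) {
            int x = p.x();
            int y = p.y();
            return x + y;
        }
        return -1; // no match
    }
}
```

Every other kind of pattern would instead call the pattern method and take the returned carrier apart.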
> ## Reflection
> We already have a sensible base class in the reflection library for reflecting
> patterns: Executable. All of the methods on Executable make sense for patterns,
> including Object as the return type. If the pattern is reflectively invoked, it
> will return null (for no match) or an Object[]; this Object[] can be thought of
> as the boxing of the carrier. Since the method return type is Object, this is
> an entirely reasonable interpretation.
> We need some additional methods to describe the bindings, so we would have a
> subtype of Executable for Pattern, with methods like getBindings(),
> getAnnotatedBindings(), getGenericBindings(), isDeconstructor(), isPartial(),
> etc.
I agree if getBindings() returns a Class<?>[].
As I said, apart from the semantics implied by the proposed syntax, the rest of the design is great.
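To show what I mean, here is a small sketch of the shape I have in mind. `PatternMirror` is a hypothetical stand-in, not an actual subtype of Executable (which cannot be subclassed from user code), and the accessor names simply follow the ones listed in the quoted text.

```java
import java.lang.invoke.MethodType;

// Hypothetical stand-in for the proposed reflective view of a pattern;
// nothing like this exists in java.lang.reflect today.
record PatternMirror(String name, MethodType descriptor,
                     boolean isDeconstructor, boolean isPartial) {

    // Binding types as a Class<?>[], in the same style as
    // Executable.getParameterTypes() for input parameters.
    Class<?>[] getBindings() {
        return descriptor.parameterArray();
    }
}
```

For the `Point` deconstructor, `getBindings()` would return an array holding `int.class` twice.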
Rémi