Declared patterns -- translation and reflection

Brian Goetz brian.goetz at oracle.com
Tue Mar 29 21:01:18 UTC 2022


Time to take a peek ahead at _declared patterns_.  Declared patterns 
come in three varieties -- deconstruction patterns, static patterns, and 
instance patterns (corresponding to constructors, static methods, and 
instance methods.)  I'm going to start with deconstruction patterns, but 
the basic game is the same for all three.

Ignoring the trivial details, a deconstruction pattern looks like a 
"constructor in reverse":

```{.java}
class Point {
     int x, y;

     Point(int x, int y) {
         this.x = x;
         this.y = y;
     }

     deconstructor(int x, int y) {
         x = this.x;
         y = this.y;
     }
}
```

Deconstruction patterns share the weird behaviors that constructors have: 
they are instance members, but are not inherited, and rather than having 
names, they are accessed via the class name.

Deconstruction patterns differ from static/instance patterns in that 
they are by definition total; they cannot fail to match. (This is a 
somewhat arbitrary simplification in the object model, but a reasonable 
one.)  They also cannot have any input parameters, other than the receiver.

Patterns differ from their ctor/method counterparts in that they have 
what appear to be _two_ argument lists: a parameter list (like ctors and 
methods) and a _binding_ list.  The parameter list is often empty (with 
the receiver as the match target). The binding list can be thought of as 
a "conditional multiple return".  That they may return multiple values 
(and, for partial patterns, can return no values at all when they don't 
match) presents a challenge for translation to classfiles, and for the 
reflection model.

#### Translation to methods

Patterns contain imperative code, so surely we want to translate them to 
methods in some way.  The pattern input parameters map cleanly to method 
parameters.

The pattern bindings need to be tunneled, somehow, through the method 
return (or some other mechanism).  For our deconstructor, we might 
translate as:

     PatternCarrier <dtor>()

(where the method applies the pattern, and PatternCarrier wraps and 
provides access to the bindings) or

     PatternObject <dtor>()

(where PatternObject provides indirection to behavior to invoke the 
pattern, which in turn returns the carrier.)

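With either of these approaches, though, the method name is a problem, 
because patterns can be overloaded on their _bindings_, but both of 
these return types are insensitive to bindings: two deconstructors of 
the same class would translate to methods with the same name and the 
same descriptor.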

It is useful to characterize the "shape" of a pattern with a MethodType, 
where the parameters of the MethodType are the binding types.  (The 
return type is less constrained, but it is sometimes useful to use the 
return type of the MethodType for the required type of the pattern.)  
Call this the "descriptor" of the pattern.
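For illustration, here is how the descriptors of two overloaded deconstructors of `Point` might be built with `java.lang.invoke.MethodType`: one binding `(int x, int y)` and an assumed polar overload binding `(double r, double theta)`. The class name `Descriptors` and the polar overload are made up for this sketch, and `Point` is reduced to an empty stand-in. Because the binding types occupy the parameter positions, the two overloads get distinct descriptors, which is exactly the information a signature like `PatternCarrier <dtor>()` cannot express.

```{.java}
import java.lang.invoke.MethodType;

public class Descriptors {
    // Empty stand-in for the Point class above.
    static final class Point {}

    // Deconstructor binding (int x, int y); the return type is the required type.
    static final MethodType CARTESIAN =
            MethodType.methodType(Point.class, int.class, int.class);

    // Hypothetical overload binding (double r, double theta).
    static final MethodType POLAR =
            MethodType.methodType(Point.class, double.class, double.class);

    public static void main(String[] args) {
        System.out.println(CARTESIAN.toMethodDescriptorString());
        System.out.println(POLAR.toMethodDescriptorString());
        System.out.println(CARTESIAN.equals(POLAR)); // false: the bindings differ
    }
}
```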

If we do this, we can use some name mangling to encode the descriptor in 
the method name:

     PatternCarrier name$mangle()

The mangling has to be stable across compilations with respect to any 
source- and binary-compatible changes to the pattern declaration.  One 
mangling that works quite well is to use the "symbolic-freedom encoding" 
of the erasure of the pattern descriptor.  Because the erasure of the 
descriptor is exactly as stable as any other method signature derived 
from source declarations, it will have the desired binary compatibility 
properties, overriding will work as expected, etc.
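A simplified sketch of such a mangling follows. The escaping table is modeled on the JDK-internal `sun.invoke.util.BytecodeName` (not a public API), the rule that adds a `\=` prefix in certain edge cases is omitted, and joining the pattern name and the escaped descriptor with `$` is a hypothetical layout. Note that a `MethodType` built from `Class` objects is already free of generics, so no further erasure step is applied; in particular `MethodType.erase()` would be the wrong tool, since it turns every reference type into `Object` and would make distinct overloads collide.

```{.java}
import java.lang.invoke.MethodType;

public class PatternMangling {
    // Characters that are unsafe in JVM method names, and the character that
    // follows a backslash to replace each one. Modeled on the JDK-internal
    // sun.invoke.util.BytecodeName tables; the backslash itself must come first.
    private static final String DANGEROUS   = "\\/.;:$[]<>";
    private static final String REPLACEMENT = "-|,?!%{}^_";

    // Simplified symbolic-freedom escaping (omits the "\=" prefix rule).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            int i = DANGEROUS.indexOf(c);
            if (i >= 0) {
                sb.append('\\').append(REPLACEMENT.charAt(i));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical layout: pattern name, '$', escaped descriptor string.
    static String mangle(String patternName, MethodType descriptor) {
        return patternName + "$" + escape(descriptor.toMethodDescriptorString());
    }

    public static void main(String[] args) {
        MethodType d = MethodType.methodType(Object.class, int.class, String.class);
        System.out.println(mangle("Point", d));
        // Point$(ILjava\|lang\|String\?)Ljava\|lang\|Object\?
    }
}
```

Because the result depends only on the erased binding and target types, recompiling the declaring class without changing those types yields the same method name.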

#### Return value

In an earlier design, we used a pattern object (which was a bundle of 
method handles) as the return value of the pattern. This enabled clients 
to invoke these via condy and bind method handles into the constant pool 
for deconstruction and static patterns.

Either way, we make use of some sort of carrier object to carry the 
bindings from the pattern to the client; either we return the carrier 
from the pattern method, or there is a method on the pattern object that 
we invoke to get a carrier.  We have a few preferences about the 
carrier; we'd like to be able to late-bind to the actual implementation 
(i.e., we don't want to freeze the name of a carrier class in the method 
descriptor), and at least for records, we'd like to let the record 
instance itself be the carrier (since it is immutable and we can just 
invoke the accessors to get the bindings.)

#### Carriers

As part of the work on template strings, Jim has put back some code that 
was originally written for the purpose of translating patterns, called 
"carriers".  There are methods / bootstraps that take a MethodType and 
return method handles to (a) encode values of those types into an opaque 
carrier object and (b) pull individual values out of a carrier.  This 
means that the choice of carrier object can be deferred to runtime, as 
long as both the bundling and unbundling method handles agree on the 
carrier form.

The choice of carrier is largely a footprint/specificity tradeoff.  One 
could imagine a carrier class per shape, or a single carrier class that 
wraps an Object[], or caching some number of common shapes (e.g., three ints 
and two refs).  This sort of tuning should be separate from the protocol 
encoded in the bytecode of the pattern method and its clients.
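A minimal sketch of the simplest strategy above, a single carrier that wraps an `Object[]`, built only from public `java.lang.invoke` combinators. The names `ArrayCarriers`, `constructor`, and `component` are hypothetical and are not the API of the carriers code mentioned above; the point is that both the bundling handle and the unbundling handles are derived from the same descriptor, so neither the pattern method nor its clients ever name the carrier class.

```{.java}
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public final class ArrayCarriers {
    private ArrayCarriers() {}

    // Handle of type (T1..Tn)Object that packs its arguments into an opaque
    // carrier (here an Object[]; primitives are boxed by asType).
    static MethodHandle constructor(MethodType descriptor) {
        int n = descriptor.parameterCount();
        MethodHandle collect = MethodHandles.identity(Object[].class)
                .asCollector(Object[].class, n);          // (Object x n)Object[]
        return collect.asType(descriptor.changeReturnType(Object.class));
    }

    // Handle of type (Object)Ti that extracts binding i from a carrier
    // (casts to Object[], reads the element, then casts or unboxes to Ti).
    static MethodHandle component(MethodType descriptor, int i) {
        MethodHandle get = MethodHandles.insertArguments(
                MethodHandles.arrayElementGetter(Object[].class), 1, i); // (Object[])Object
        return get.asType(MethodType.methodType(descriptor.parameterType(i), Object.class));
    }

    public static void main(String[] args) throws Throwable {
        MethodType desc = MethodType.methodType(Object.class, int.class, int.class);
        Object carrier = constructor(desc).invoke(3, 4);
        int x = (int) component(desc, 0).invoke(carrier);
        int y = (int) component(desc, 1).invoke(carrier);
        System.out.println(x + "," + y); // 3,4
    }
}
```

Swapping in a per-shape carrier class would change only the bodies of `constructor` and `component`, not the code that calls them.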

The pattern matching runtime will provide some condy bootstraps which 
wrap the Carriers behavior.

Since at least some patterns are conditional, we have to have a way to 
encode failure into the protocol.  For a partial pattern, we can use a 
B2 carrier and use null to encode failure to match; for a total pattern, 
we can use a B3 carrier.

#### Proposed encoding

Earlier explorations did a lot of work to preserve the optimization that 
a match target can be its own carrier.  But further analysis reveals 
that the cost of doing so for anything other than records is pretty substantial 
and works against the model of a pattern declaration being an imperative 
body of code that runs at match time.  So for record patterns, we can 
"inline" them by using `instanceof` as the applicability test and 
accessors for extraction, and for all other patterns, go through the 
carrier runtime.
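As a sketch of what "inlining" a record pattern means at the source level, the two methods below should behave the same: the first uses a record pattern (in the form finalized in Java 21, later than this proposal), the second spells out the `instanceof` test and accessor calls by hand, with no carrier involved. This illustrates the idea only and makes no claim about what javac actually emits; `RecordInlining` and its methods are made-up names.

```{.java}
public class RecordInlining {
    record Point(int x, int y) {}

    // Source form: a record pattern (Java 21+).
    static String describe(Object o) {
        if (o instanceof Point(int x, int y)) {
            return "Point[" + x + ", " + y + "]";
        }
        return "no match";
    }

    // Hand-written equivalent of the inlined translation: instanceof is the
    // applicability test and the accessors do the extraction.
    static String describeInlined(Object o) {
        if (o instanceof Point p) {
            int x = p.x();
            int y = p.y();
            return "Point[" + x + ", " + y + "]";
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1, 2)));        // Point[1, 2]
        System.out.println(describeInlined(new Point(1, 2))); // Point[1, 2]
        System.out.println(describeInlined("hello"));         // no match
    }
}
```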

This allows us to encode pattern methods as

     Object name$mangle(ARGS)

and have the pattern method do the match and return a carrier (or null), 
using the carrier object that the carrier runtime associates with the 
pattern descriptor.  And clients can take apart the result again using 
the extraction logic that the carrier runtime associates with the 
pattern descriptor.

This also means that instance patterns "just work" because virtual 
dispatch selects the right implementation for us automatically, and all 
implementations that can override one another will also implicitly agree on the 
encoding.
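A hand-written approximation of this encoding for a hypothetical partial instance pattern `circle(double radius)` on a small `Shape` hierarchy. Assumptions: the mangled name `circle$D` is invented (it only has to be identical across overrides), and a plain `Object[]` stands in for whatever carrier the runtime would associate with the descriptor. The override in `Circle` returns a carrier, the inherited default returns null for no match, and the client code is the same regardless of which implementation runs.

```{.java}
public class PatternMethods {
    abstract static class Shape {
        // Translation of a hypothetical partial instance pattern
        // "circle(double radius)": returns a carrier on match, null otherwise.
        // The default implementation never matches.
        Object circle$D() {
            return null;
        }
    }

    static final class Circle extends Shape {
        final double r;
        Circle(double r) { this.r = r; }

        // Same name and descriptor, so this overrides the default and
        // uses the same carrier layout: a single double binding.
        @Override
        Object circle$D() {
            return new Object[] { r };
        }
    }

    static final class Square extends Shape {
        final double side;
        Square(double side) { this.side = side; }
    }

    // Client code of the kind a compiler might generate for a match against
    // this pattern: call the pattern method, test for null, extract bindings.
    static String classify(Shape s) {
        Object carrier = s.circle$D();                      // virtual dispatch
        if (carrier == null) {
            return "not a circle";
        }
        double radius = (double) ((Object[]) carrier)[0];   // unbox the binding
        return "circle r=" + radius;
    }

    public static void main(String[] args) {
        System.out.println(classify(new Circle(2.0)));  // circle r=2.0
        System.out.println(classify(new Square(1.0)));  // not a circle
    }
}
```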

Because patterns are methods, we can take advantage of all the 
affordances of methods.  We can use access bits to control accessibility 
in the obvious way; we can use the attributes that carry annotations, 
method parameter metadata, and generics signatures to carry information 
about the pattern declaration and its parameters.  What's missing is a 
place to put metadata for the *bindings*, and to record the fact that 
this is a pattern implementation and not an ordinary method.  So, we add 
the following attribute on pattern methods:

     Pattern {
         u2 attr_name;
         u4 attr_length;
         u2 patternFlags; // bitmask
         u2 patternName;  // index of UTF8 constant
         u2 patternDescr; // index of MethodType (or alternately UTF8) constant
         u2 attributes_count;
         attribute_info attributes[attributes_count];
     }
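To make the layout concrete, a minimal sketch that serializes a `Pattern` attribute with no nested attributes using `java.io.DataOutputStream`, whose `writeShort` and `writeInt` emit the big-endian u2 and u4 values class files use. The flag values are assumptions, since the proposal does not assign bit positions, and the constant-pool indices passed in are arbitrary. Nested attributes (Signature and so on) would each add their own 6-byte header plus body to `attr_length`.

```{.java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class PatternAttributeWriter {
    // Hypothetical flag bits; the proposal leaves the values unspecified.
    static final int DECONSTRUCTOR = 0x0001;
    static final int PARTIAL       = 0x0002;

    // attr_length counts the bytes after the 6-byte header:
    // patternFlags(2) + patternName(2) + patternDescr(2) + attributes_count(2) = 8.
    static byte[] write(int attrNameIdx, int flags, int nameIdx, int descrIdx)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(attrNameIdx); // u2 attr_name (UTF8 "Pattern")
        out.writeInt(8);             // u4 attr_length
        out.writeShort(flags);       // u2 patternFlags
        out.writeShort(nameIdx);     // u2 patternName
        out.writeShort(descrIdx);    // u2 patternDescr
        out.writeShort(0);           // u2 attributes_count (none here)
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] attr = write(10, DECONSTRUCTOR, 11, 12);
        System.out.println(attr.length); // 14
    }
}
```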

This says that "this method is a pattern", reifies the name of the 
pattern (patternName), reifies the pattern descriptor (patternDescr) 
which encodes the types of the bindings as a method descriptor or 
MethodType, and has attributes which can carry annotations, parameter 
metadata, and signature metadata for the bindings.  The existing 
attributes (e.g., Signature, MethodParameters, RVAA) can be reused as is, 
with the interpretation that this is the signature (or names, or annos) 
of the *bindings*, not the input parameters.  Flags can carry things 
like "deconstructor pattern" or "partial pattern" as needed.

## Reflection

We already have a sensible base class in the reflection library for 
reflecting patterns: Executable.  All of the methods on Executable make 
sense for patterns, including Object as the return type.  If the pattern 
is reflectively invoked, it will return null (for no match) or an 
Object[]; this Object[] can be thought of as the boxing of the carrier.  
Since the method return type is Object, this is an entirely reasonable 
interpretation.
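A sketch of how a reflective invocation might box a carrier into an `Object[]`. Here `XYCarrier`, a record, stands in for a runtime-chosen carrier, and its accessors serve as the component handles; `invokePattern`, `xyComponents`, and the mangled name `dtor$II` are hypothetical. A real reflective `Pattern` would presumably get the component handles from the carrier runtime instead of from a hard-coded record.

```{.java}
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Arrays;

public class ReflectivePatterns {
    // Stand-in for a runtime-chosen carrier class; clients never name it.
    record XYCarrier(int x, int y) {}

    public static class Point {
        final int x, y;
        public Point(int x, int y) { this.x = x; this.y = y; }

        // Hypothetical translation of Point's deconstructor (mangled name assumed).
        public Object dtor$II() { return new XYCarrier(x, y); }
    }

    // Component handles of type (Object)Object, one per binding.
    static MethodHandle[] xyComponents() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType generic = MethodType.methodType(Object.class, Object.class);
        return new MethodHandle[] {
            lookup.findVirtual(XYCarrier.class, "x", MethodType.methodType(int.class)).asType(generic),
            lookup.findVirtual(XYCarrier.class, "y", MethodType.methodType(int.class)).asType(generic)
        };
    }

    // Invokes a pattern method reflectively and boxes the carrier into an
    // Object[] with one element per binding; null means the pattern did not match.
    static Object[] invokePattern(Method patternMethod, Object target,
                                  MethodHandle[] components) throws Throwable {
        Object carrier = patternMethod.invoke(target);
        if (carrier == null) {
            return null;
        }
        Object[] bindings = new Object[components.length];
        for (int i = 0; i < components.length; i++) {
            bindings[i] = components[i].invoke(carrier);
        }
        return bindings;
    }

    public static void main(String[] args) throws Throwable {
        Method m = Point.class.getMethod("dtor$II");
        Object[] bindings = invokePattern(m, new Point(3, 4), xyComponents());
        System.out.println(Arrays.toString(bindings)); // [3, 4]
    }
}
```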

We need some additional methods to describe the bindings, so we would 
have a subtype of Executable for Pattern, with methods like 
getBindings(), getAnnotatedBindings(), getGenericBindings(), 
isDeconstructor(), isPartial(), etc.

## Summary

This design borrows from previous rounds, but makes a number of 
simplifications.

  - The bindings of a pattern are captured in a MethodType, called the 
_pattern descriptor_.  The parameters of the pattern descriptor are the 
types of the bindings; the return type is the minimal type that will 
match the pattern (but is not as important as the bindings.)
  - Patterns are translated as methods whose names are derived, 
deterministically, from the name of the pattern and the erasure of the 
pattern descriptor.  These are called pattern methods. Pattern methods 
take as parameters the input parameters of the pattern, and return Object.
  - The returned object is an opaque carrier.  Null means the pattern 
didn't match.  A non-null value is the carrier type (from the carrier 
runtime) which is derived from the pattern descriptor.
  - Pattern methods are not directly invocable from the source language; 
they are invoked indirectly through pattern matching, or reflection.
  - Generated code invokes the pattern method and interprets the 
returned value according to the protocol, using MHs from the pattern 
runtime to access the bindings.
  - Pattern methods have a Pattern attribute, which captures information 
about the pattern as a whole (whether it is total or partial, whether it is a deconstructor, etc.) 
and parameter-related attributes which describe the bindings.
  - Patterns are reflected through a new subtype of Executable, which 
exposes new methods to reflect over bindings.
  - When invoking a pattern method reflectively, the carrier is boxed to 
an Object[].



More information about the amber-spec-experts mailing list