Declared patterns -- translation and reflection
Brian Goetz
brian.goetz at oracle.com
Tue Mar 29 21:01:18 UTC 2022
Time to take a peek ahead at _declared patterns_. Declared patterns
come in three varieties -- deconstruction patterns, static patterns, and
instance patterns (corresponding to constructors, static methods, and
instance methods). I'm going to start with deconstruction patterns, but
the basic game is the same for all three.
Ignoring the trivial details, a deconstruction pattern looks like a
"constructor in reverse":
```{.java}
class Point {
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    deconstructor(int x, int y) {
        x = this.x;
        y = this.y;
    }
}
```
Deconstruction patterns share the weird behaviors of constructors: they
are instance members but are not inherited, and rather than having names
of their own, they are accessed via the class name.
Deconstruction patterns differ from static/instance patterns in that
they are by definition total; they cannot fail to match. (This is a
somewhat arbitrary simplification in the object model, but a reasonable
one.) They also cannot have any input parameters, other than the receiver.
Patterns differ from their ctor/method counterparts in that they have
what appear to be _two_ argument lists; a parameter list (like ctors and
methods), and a _binding_ list. The parameter list is often empty (with
the receiver as the match target). The binding list can be thought of as
a "conditional multiple return". That they may return multiple values
(and, for partial patterns, can return no values at all when they don't
match) presents a challenge for translation to classfiles, and for the
reflection model.
#### Translation to methods
Patterns contain imperative code, so surely we want to translate them to
methods in some way. The pattern input parameters map cleanly to method
parameters.
The pattern bindings need to be tunneled, somehow, through the method
return (or some other mechanism). For our deconstructor, we might
translate as:
    PatternCarrier <dtor>()
(where the method applies the pattern, and PatternCarrier wraps and
provides access to the bindings) or
    PatternObject <dtor>()
(where PatternObject provides indirection to behavior to invoke the
pattern, which in turn returns the carrier.)
With either of these approaches, though, the pattern name is a problem,
because patterns can be overloaded on their _bindings_, but both of
these return types are insensitive to bindings.
It is useful to characterize the "shape" of a pattern with a MethodType,
where the parameters of the MethodType are the binding types. (The
return type is less constrained, but it is sometimes useful to use the
return type of the MethodType for the required type of the pattern.)
Call this the "descriptor" of the pattern.
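
As a rough illustration, here is what the descriptor for the `Point` deconstructor above might look like when built with `java.lang.invoke.MethodType`. The class name `PatternDescriptorSketch`, the nested stand-in `Point`, and the helper `pointDescriptor` are assumptions made for this sketch only.

```java
import java.lang.invoke.MethodType;

class PatternDescriptorSketch {
    // Stand-in for the Point class from the earlier example.
    static final class Point {
        int x, y;
    }

    // Parameters are the binding types (int x, int y); the return type is
    // used here to record the required type of the match target.
    static MethodType pointDescriptor() {
        return MethodType.methodType(Point.class, int.class, int.class);
    }

    public static void main(String[] args) {
        MethodType descriptor = pointDescriptor();
        System.out.println("bindings: " + descriptor.parameterList());
        System.out.println("target:   " + descriptor.returnType().getSimpleName());
    }
}
```

Two patterns on the same class with different binding lists would get different descriptors, which is what makes the descriptor usable for overload resolution.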
If we do this, we can use some name mangling to encode the descriptor in
the method name:
    PatternCarrier name$mangle()
The mangling has to be stable across compilations with respect to any
source- and binary-compatible changes to the pattern declaration. One
mangling that works quite well is to use the "symbolic-freedom encoding"
of the erasure of the pattern descriptor. Because the erasure of the
descriptor is exactly as stable as any other method signature derived
from source declarations, it will have the desired binary compatibility
properties, overriding will work as expected, etc.
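
To make the idea concrete, here is a minimal sketch of deriving a method name from a pattern name and its descriptor. The escape scheme is an invented stand-in and is NOT the symbolic-freedom encoding; it only replaces the characters the JVM forbids in method names (plus the escape character itself) with `_` followed by the hex code. The names `PatternNameMangler`, `escape`, and `mangle` are hypothetical.

```java
import java.lang.invoke.MethodType;

class PatternNameMangler {
    // Characters that may not appear in a JVM method name, plus '_' so that
    // the escape stays unambiguous. Assumption: a made-up scheme for this
    // sketch, not the symbolic-freedom encoding.
    private static final String ESCAPED = "./;[<>_";

    static String escape(String descriptor) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < descriptor.length(); i++) {
            char c = descriptor.charAt(i);
            if (ESCAPED.indexOf(c) >= 0) {
                out.append('_').append(Integer.toHexString(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // A MethodType built from Class objects is already erased, so its
    // descriptor string can be used directly.
    static String mangle(String patternName, MethodType descriptor) {
        return patternName + "$" + escape(descriptor.toMethodDescriptorString());
    }

    public static void main(String[] args) {
        MethodType twoInts = MethodType.methodType(Object.class, int.class, int.class);
        MethodType intAndString = MethodType.methodType(Object.class, int.class, String.class);
        System.out.println(mangle("Point", twoInts));
        System.out.println(mangle("Point", intAndString));
    }
}
```

The property that matters is that the result depends only on the pattern name and the erased descriptor, so overloads on bindings get distinct method names and recompilation does not change them.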
#### Return value
In an earlier design, we used a pattern object (which was a bundle of
method handles) as the return value of the pattern. This enabled clients
to invoke these via condy and bind method handles into the constant pool
for deconstruction and static patterns.
Either way, we make use of some sort of carrier object to carry the
bindings from the pattern to the client; either we return the carrier
from the pattern method, or there is a method on the pattern object that
we invoke to get a carrier. We have a few preferences about the
carrier; we'd like to be able to late-bind to the actual implementation
(i.e., we don't want to freeze the name of a carrier class in the method
descriptor), and at least for records, we'd like to let the record
instance itself be the carrier (since it is immutable and we can just
invoke the accessors to get the bindings.)
#### Carriers
As part of the work on template strings, Jim has put back some code that
was originally written for the purpose of translating patterns, called
"carriers". There are methods / bootstraps that take a MethodType and
return method handles to (a) encode values of those types into an opaque
carrier object and (b) pull individual values out of a carrier. This
means that the choice of carrier object can be deferred to runtime, as
long as both the bundling and unbundling method handles agree on the
carrier form.
The choice of carrier is largely a footprint/specificity tradeoff. One
could imagine a carrier class per shape, or a single carrier class that
wraps an Object[], or caching some number of common shapes (three ints
and two refs). This sort of tuning should be separate from the protocol
encoded in the bytecode of the pattern method and its clients.
The pattern matching runtime will provide some condy bootstraps which
wrap the Carriers behavior.
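
The following sketch shows the general shape of such a carrier factory: given a descriptor, it hands back one method handle that bundles bindings and one per binding that unbundles it. It does not use the actual Carriers code; it assumes the simplest carrier form mentioned above, a plain `Object[]`, and the names `ArrayCarrierSketch`, `constructor`, `component`, and `roundTrip` are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ArrayCarrierSketch {
    // Handle of type (bindingTypes...)Object that packs the bindings into a
    // carrier. Clients see only Object, never the Object[] behind it.
    static MethodHandle constructor(MethodType descriptor) {
        MethodHandle collect = MethodHandles.identity(Object[].class)
                .asCollector(Object[].class, descriptor.parameterCount());
        return collect.asType(descriptor.changeReturnType(Object.class));
    }

    // Handle of type (Object)bindingType that pulls binding i out of a carrier.
    static MethodHandle component(MethodType descriptor, int i) {
        MethodHandle getter = MethodHandles.arrayElementGetter(Object[].class);
        MethodHandle atIndex = MethodHandles.insertArguments(getter, 1, i);
        return atIndex.asType(
                MethodType.methodType(descriptor.parameterType(i), Object.class));
    }

    // Demo helper: bundle the values, then extract binding i again.
    static Object roundTrip(MethodType descriptor, int i, Object... values) {
        try {
            Object carrier = constructor(descriptor).invokeWithArguments(values);
            return component(descriptor, i).invoke(carrier);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodType descriptor =
                MethodType.methodType(Object.class, int.class, String.class);
        System.out.println(roundTrip(descriptor, 0, 3, "three"));
        System.out.println(roundTrip(descriptor, 1, 3, "three"));
    }
}
```

Swapping the `Object[]` for a class per shape would change only the bodies of `constructor` and `component`; callers that hold the handles would not notice.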
Since at least some patterns are conditional, we have to have a way to
encode failure into the protocol. For a partial pattern, we can use a
B2 carrier and use null to encode failure to match; for a total pattern,
we can use a B3 carrier.
#### Proposed encoding
Earlier explorations did a lot of work to preserve the optimization that
a match target can be its own carrier. But further analysis reveals
that the cost of doing so for anything other than records is pretty substantial
and works against the model of a pattern declaration being an imperative
body of code that runs at match time. So for record patterns, we can
"inline" them by using `instanceof` as the applicability test and
accessors for extraction, and for all other patterns, go through the
carrier runtime.
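
For the record case, the "inlined" translation might look roughly like the hand-written code below, which is what a compiler could emit for `if (o instanceof Point(int x, int y)) return x + y;`. The class and method names are made up for the sketch, and the actual generated code may differ.

```java
class RecordPatternInlining {
    record Point(int x, int y) {}

    // Hand-written equivalent of matching o against Point(int x, int y).
    static int sumOrMinusOne(Object o) {
        if (o instanceof Point) {   // applicability test
            Point p = (Point) o;
            int x = p.x();          // extraction through the accessors
            int y = p.y();
            return x + y;
        }
        return -1;                  // no match
    }

    public static void main(String[] args) {
        System.out.println(sumOrMinusOne(new Point(2, 3)));
        System.out.println(sumOrMinusOne("not a point"));
    }
}
```

No carrier is allocated here; the record instance itself plays that role.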
This allows us to encode pattern methods as
    Object name$mangle(ARGS)
and have the pattern method do the match and return a carrier (or null),
using the carrier object that the carrier runtime associates with the
pattern descriptor. And clients can take apart the result again using
the extraction logic that the carrier runtime associates with the
pattern descriptor.
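
A hand-translated sketch of this protocol, for a hypothetical partial pattern that splits `key=value` text into two `String` bindings, could look like the following. The `carry` and `binding` helpers stand in for the carrier runtime (an `Object[]` is assumed as the carrier), and the method name `keyValue$mangled` is a placeholder for a real mangled name.

```java
class PatternMethodSketch {
    // Stand-ins for the carrier runtime; real code would obtain method
    // handles keyed by the pattern descriptor instead.
    static Object carry(Object... bindings) {
        return bindings;
    }

    static Object binding(Object carrier, int i) {
        return ((Object[]) carrier)[i];
    }

    // Pattern method: runs the match and returns a carrier, or null when
    // the pattern does not match.
    static Object keyValue$mangled(String text) {
        int eq = text.indexOf('=');
        if (eq < 0) {
            return null;
        }
        return carry(text.substring(0, eq), text.substring(eq + 1));
    }

    // What generated client code might do with the result.
    static String describe(String text) {
        Object carrier = keyValue$mangled(text);
        if (carrier == null) {
            return "no match";
        }
        String key = (String) binding(carrier, 0);
        String value = (String) binding(carrier, 1);
        return key + " -> " + value;
    }

    public static void main(String[] args) {
        System.out.println(describe("color=red"));
        System.out.println(describe("colorless"));
    }
}
```

The client never names a carrier class; it only tests for null and asks the runtime for each binding.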
This also means that instance patterns "just work" because virtual
dispatch selects the right implementation for us automatically, and all
implementations that can be overrides will also implicitly agree on the
encoding.
Because patterns are methods, we can take advantage of all the
affordances of methods. We can use access bits to control accessibility
in the obvious way; we can use the attributes that carry annotations,
method parameter metadata, and generics signatures to carry information
about the pattern declaration and its parameters. What's missing is a
place to put metadata for the *bindings*, and to record the fact that
this is a pattern implementation and not an ordinary method. So, we add
the following attribute on pattern methods:
    Pattern {
        u2 attr_name;
        u4 attr_length;
        u2 patternFlags;   // bitmask
        u2 patternName;    // index of UTF8 constant
        u2 patternDescr;   // index of MethodType (or alternately UTF8) constant
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }
This says that "this method is a pattern", reifies the name of the
pattern (patternName), reifies the pattern descriptor (patternDescr)
which encodes the types of the bindings as a method descriptor or
MethodType, and has attributes which can carry annotations, parameter
metadata, and signature metadata for the bindings. The existing
attributes (e.g. Signature, MethodParameters, RuntimeVisibleAnnotations)
can be reused as is,
with the interpretation that this is the signature (or names, or annos)
of the *bindings*, not the input parameters. Flags can carry things
like "deconstructor pattern" or "partial pattern" as needed.
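
To show how the layout adds up, here is a small sketch that serializes a Pattern attribute with no nested attributes, in the big-endian form class files use. The flag values are invented (the text does not assign any), the constant pool indices are whatever the caller supplies, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class PatternAttributeSketch {
    // Hypothetical flag bits, chosen only for this sketch.
    static final int FLAG_DECONSTRUCTOR = 0x0001;
    static final int FLAG_PARTIAL = 0x0002;

    // Writes the attribute with an empty nested attribute table.
    static byte[] write(int attrNameIndex, int flags,
                        int patternNameIndex, int patternDescrIndex) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(attrNameIndex);     // u2 attr_name
            out.writeInt(8);                   // u4 attr_length: four u2 fields follow
            out.writeShort(flags);             // u2 patternFlags
            out.writeShort(patternNameIndex);  // u2 patternName
            out.writeShort(patternDescrIndex); // u2 patternDescr
            out.writeShort(0);                 // u2 attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] attr = write(17, FLAG_DECONSTRUCTOR, 18, 19);
        System.out.println(attr.length + " bytes");
    }
}
```

Nested attributes for the bindings would follow `attributes_count` and add to `attr_length` accordingly.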
## Reflection
We already have a sensible base class in the reflection library for
reflecting patterns: Executable. All of the methods on Executable make
sense for patterns, including Object as the return type. If the pattern
is reflectively invoked, it will return null (for no match) or an
Object[]; this Object[] can be thought of as the boxing of the carrier.
Since the method return type is Object, this is an entirely reasonable
interpretation.
We need some additional methods to describe the bindings, so we would
have a subtype of Executable for Pattern, with methods like
getBindings(), getAnnotatedBindings(), getGenericBindings(),
isDeconstructor(), isPartial(), etc.
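
Since no Pattern subtype of Executable exists today, the closest runnable approximation is to invoke a hand-translated pattern method through `java.lang.reflect.Method`. In the sketch below the method name `Point$pattern` is a placeholder for a mangled name, and the method returns an `Object[]` directly, standing in for the boxing that reflection would perform on the carrier.

```java
import java.lang.reflect.Method;

class ReflectivePatternSketch {
    static final class Point {
        final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Hand-written stand-in for the translated deconstructor.
        public Object Point$pattern() {
            return new Object[] { x, y };
        }
    }

    // Looks up the pattern method by name and invokes it reflectively.
    // A null result would mean "no match"; otherwise the bindings come
    // back as an Object[].
    static Object[] match(Object target) {
        try {
            Method m = target.getClass().getDeclaredMethod("Point$pattern");
            return (Object[]) m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] bindings = match(new Point(1, 2));
        System.out.println(bindings[0] + ", " + bindings[1]);
    }
}
```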
## Summary
This design borrows from previous rounds, but makes a number of
simplifications.
- The bindings of a pattern are captured in a MethodType, called the
_pattern descriptor_. The parameters of the pattern descriptor are the
types of the bindings; the return type is the minimal type that will
match the pattern (but is not as important as the bindings).
- Patterns are translated as methods whose names are derived,
deterministically, from the name of the pattern and the erasure of the
pattern descriptor. These are called pattern methods. Pattern methods
take as parameters the input parameters of the pattern, and return Object.
- The returned object is an opaque carrier. Null means the pattern
didn't match. A non-null value is the carrier type (from the carrier
runtime) which is derived from the pattern descriptor.
- Pattern methods are not directly invocable from the source language;
they are invoked indirectly through pattern matching, or reflection.
- Generated code invokes the pattern method and interprets the
returned value according to the protocol, using MHs from the pattern
runtime to access the bindings.
- Pattern methods have a Pattern attribute, which captures information
about the pattern as a whole (whether it is total or partial, whether it
is a deconstructor, etc.) and parameter-related attributes which describe
the bindings.
- Patterns are reflected through a new subtype of Executable, which
exposes new methods to reflect over bindings.
- When invoking a pattern method reflectively, the carrier is boxed to
an Object[].