Deconstructor (and pattern) overload selection
Brian Goetz
brian.goetz at oracle.com
Mon Apr 1 16:34:49 UTC 2024
The next big pattern matching JEP will be about deconstruction
patterns. (Static and instance patterns will likely come separately.)
Now that we've got the bikeshed painting underway, there are a few other
loose ends here, and one of them is overload selection.
We explored taking the existing overload selection algorithm and turning
it inside out, but after going down that road a bit, I think this is
both unnecessarily complex for not enough value and potentially
fraught with nasty corner cases. I think there is a much simpler answer
here which is entirely good enough.
First, let's remind ourselves, why do we have constructor overloading in
the first place? There are three main reasons:
- Concision. If a fully-general constructor takes many parameters,
but not all are essential to the use case, then the construction site
becomes a site of accidental complexity. Being able to handle common
groupings of parameters simplifies use sites.
- Flexibility. Related to the above, not only might the user not need
to specify a given constructor parameter, but they may also want the flexibility
of saying "let the implementation pick the best value". Constructors
with fewer parameters reserve more flexibility for the implementation.
- Alternative representations. Some objects may take multiple
representations as input, such as accepting a Date, a LocalDate, or a
LocalDateTime.
The first two cases are generally handled with "telescoping constructor
nests", where we have:
Foo(A a)
Foo(A a, B b)
Foo(A a, B b, C c, D d)
Sometimes the telescopes don't fold perfectly, and become "trees":
Foo(A a)
Foo(A a, B b)
Foo(A a, C c, D d)
Foo(A a, B b, C c, D d)
Which constructors to include is a subjective judgment on the part of
class authors, aimed at finding a good tradeoff between code size and
concision/flexibility.
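To make the telescoping idiom concrete, here is a small sketch (the
class and its parameter choices are invented purely for illustration)
where each smaller constructor delegates to the maximal one, filling in
defaults:

// Hypothetical class illustrating a telescoping constructor nest.
class Connection {
    private final String host;
    private final int port;
    private final int timeoutMillis;

    Connection(String host) {
        this(host, 80);              // let the implementation pick the port
    }

    Connection(String host, int port) {
        this(host, port, 30_000);    // let the implementation pick the timeout
    }

    // The "maximal" constructor carries the full representation.
    Connection(String host, int port, int timeoutMillis) {
        this.host = host;
        this.port = port;
        this.timeoutMillis = timeoutMillis;
    }
}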
We had initially assumed that each constructor overload would have a
corresponding deconstructor, but further experimentation suggests this
is not an ideal assumption.
Clue One that it is not a good assumption comes from the asymmetry
between constructors and deconstructors; if we have constructors and
deconstructors of shape C(List), then it is OK to invoke C's constructor
with List or its subtypes, but we can invoke C's deconstructor with List
or its subtypes or its supertypes.
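Record patterns already show this asymmetry today. Here is a rough
sketch (the record C is hypothetical, standing in for a class whose
constructor and deconstructor both have shape C(List); it compiles with
Java 21 record patterns):

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical record standing in for a class with a C(List) constructor/deconstructor pair.
record C(List<String> list) {}

class Asymmetry {
    static String demo(Object o, Collection<String> coll) {
        // Construction uses method invocation context: List or one of its subtypes only.
        C fromSubtype = new C(new ArrayList<>());   // ok: ArrayList is a subtype of List
        // C fromSupertype = new C(coll);           // error: Collection is a supertype of List

        // Deconstruction uses cast context: subtype and supertype patterns both apply.
        return switch (o) {
            case C(ArrayList<String> a)  -> "matched a subtype of the List component";
            case C(Collection<String> c) -> "matched a supertype of the List component";
            default                      -> "not a C";
        };
    }
}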
Clue Two is that applicability for constructors is based on method
invocation context, but applicability for deconstructors is based on
cast context, which has different rules. It seems unlikely that we will
ever get symmetry given this.
The "Flexibility" requirement does not really apply to deconstructors;
having a deconstructor that accepts additional bindings does not
constrain anything in the way that a constructor taking needlessly
specific arguments does. Imagine if ArrayList had only constructors
that take an int (for the array capacity); this would be terrible on
the construction side, because it forces a resource-management decision
onto users who are unlikely to make a good one, and one that is hard
to change later, but it is pretty much harmless for deconstructors.
The "Concision" requirement does not really apply as much to
deconstructors as constructors; matching with `Foo(var a, _, _)` is not
nearly as painful as invoking with many arguments, each of which
requires an explicit choice by the user.
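For example, with record patterns and unnamed pattern variables (Java
22 or later), a client who only cares about the first binding can write
something like the following (the record Foo here is hypothetical):

// Hypothetical three-component record used only for illustration.
record Foo(int a, String b, double c) {}

class Concision {
    // Construction: every argument must be chosen explicitly by the caller.
    static Foo make() {
        return new Foo(42, "name", 1.0);
    }

    // Deconstruction: bindings the client does not care about are simply ignored.
    static int firstOrZero(Object o) {
        return switch (o) {
            case Foo(var a, _, _) -> a;
            default               -> 0;
        };
    }
}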
So the main reason for overloading deconstructors is to match
representations with the constructor overloads -- but with a given
"representation set", there probably do not need to be as many
deconstructors as constructors. What we really need is to match the
"maximal" constructor in a telescoping nest with a corresponding
deconstructor, or for a tree-shaped set, one for each "maximal"
representation.
So for a class with constructors
Foo()
Foo(A a)
Foo(A a, B b)
Foo(X x)
Foo(X x, Y y)
we would want dtors for (A,B) and (X,Y), but we don't really need the others.
So, let's start fresh on overload selection. Deconstructors have a set
of applicability rules based on arity first (eventually, varargs, but
not yet) and then on applicability of type patterns, which is in turn
rooted in castability. Because we don't have the compatibility problem
introduced by autoboxing, we can ignore the distinction between phases 1
and 2 of overload selection (we will have this problem with varargs
later, though).
Given this, the main question we have to resolve is to what degree -- if
any -- we may deem one overload "more applicable" than others. I think
there is one rule here that is forced: an exact type match (modulo
erasure) is more applicable than an inexact type match. So given:
D(Object o)
D(String s)
then
case D(String s)
should choose the latter. This allows the client to (mostly) steer to a
specific overload just by using the right types (rather than `var` or a
subtype). It is not clear to me whether we need anything more here; in
the event of ambiguity, a client can pick the right overload with the
right type patterns. (Nested patterns may need to be manually unrolled
to subsequent clauses in some cases.)
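Since deconstructor overloads do not exist yet, the following is only a
client-side sketch over a plain record; the comments indicate which of
the two hypothetical deconstructors, D(Object) and D(String), the
proposed rule would select for each case:

// Hypothetical record; stands in for a class that would declare both
// D(Object) and D(String) deconstructors.
record D(Object o) {}

class Steering {
    static String describe(Object subject) {
        return switch (subject) {
            // String is an exact match for the hypothetical D(String),
            // so the proposed rule would select that overload here.
            case D(String s) -> "string payload: " + s;
            // Object is an exact match for D(Object); it also serves as
            // the catch-all for non-String payloads.
            case D(Object o) -> "other payload: " + o;
            default          -> "not a D";
        };
    }
}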
So basically (on a per-binding basis): an exact match is more applicable
than an inexact match, and ... that's it. Users can steer towards a
particular overload by selecting exact matches on enough bindings.
Libraries can provide their own "joins" if they want to disambiguate
problematic overloads like:
D(Object o, String s)
D(String s, Object o)