Deconstructor (and pattern) overload selection

Brian Goetz brian.goetz at oracle.com
Mon Apr 1 16:34:49 UTC 2024


The next big pattern matching JEP will be about deconstruction 
patterns.  (Static and instance patterns will likely come separately.)  
Now that we've got the bikeshed painting underway, there are a few other 
loose ends here, and one of them is overload selection.

We explored taking the existing overload selection algorithm and turning 
it inside out, but after going down that road a bit, I think this is both 
unnecessary complexity for not enough value, and also potentially 
fraught with nasty corner cases.  I think there is a much simpler answer 
here which is entirely good enough.

First, let's remind ourselves, why do we have constructor overloading in 
the first place?  There are three main reasons:

  - Concision.  If a fully-general constructor takes many parameters, 
but not all are essential to the use case, then the construction site 
becomes a site of accidental complexity.  Being able to handle common 
groupings of parameters simplifies use sites.

  - Flexibility.  Related to the above, not only might the user not need 
to specify a given constructor parameter, but they may also want the 
flexibility of saying "let the implementation pick the best value".  Constructors 
with fewer parameters reserve more flexibility for the implementation.

  - Alternative representations.  Some objects may take multiple 
representations as input, such as accepting a Date, a LocalDate, or a 
LocalDateTime, as in the sketch just after this list.

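For the third case, such a constructor set might look like this 
(hypothetical class, purely for illustration; each alternative 
representation is normalized to a single internal one):

     import java.time.LocalDate;
     import java.time.LocalDateTime;
     import java.time.ZoneId;
     import java.util.Date;

     class Appointment {
         private final LocalDateTime when;

         // Alternative input representations for the same state.
         Appointment(LocalDateTime when) { this.when = when; }
         Appointment(LocalDate when)     { this(when.atStartOfDay()); }
         Appointment(Date when) {
             this(LocalDateTime.ofInstant(when.toInstant(), ZoneId.systemDefault()));
         }
     }
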
The first two cases are generally handled with "telescoping constructor 
nests", where we have:

     Foo(A a)
     Foo(A a, B b)
     Foo(A a, B b, C c, D d)

Sometimes the telescopes don't fold perfectly, and become "trees":

     Foo(A a)
     Foo(A a, B b)
     Foo(A a, C c, D d)
     Foo(A a, B b, C c, D d)
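
To make this concrete, a hypothetical telescoping nest might look like 
this (the class, parameter names, and defaults are made up purely for 
illustration); each smaller constructor delegates to a larger one and 
fills in defaults:

     public class Connection {
         private final String host;
         private final int port;
         private final int timeoutMillis;

         // Maximal constructor: the full representation.
         public Connection(String host, int port, int timeoutMillis) {
             this.host = host;
             this.port = port;
             this.timeoutMillis = timeoutMillis;
         }

         // Smaller overloads delegate inward, letting the implementation
         // pick defaults for whatever the caller omitted.
         public Connection(String host, int port) { this(host, port, 30_000); }
         public Connection(String host)           { this(host, 80); }
     }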

Which constructors to include is a subjective judgment on the part of 
class authors, trying to find a good tradeoff between code size and 
concision/flexibility.

We had initially assumed that each constructor overload would have a 
corresponding deconstructor, but further experimentation suggests this 
is not an ideal assumption.

Clue One that it is not a good assumption comes from the asymmetry 
between constructors and deconstructors; if we have constructors and 
deconstructors of shape C(List), then it is OK to invoke C's constructor 
with List or its subtypes, but we can invoke C's deconstructor with List 
or its subtypes or its supertypes.
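
We can already see this shape with record patterns, since nested type 
patterns are checked against cast context.  A small sketch (the Box 
record here is purely illustrative):

     import java.util.ArrayList;
     import java.util.Collection;
     import java.util.List;

     // The component type is List<String>.  Construction accepts a List or
     // a subtype; a nested type pattern may name List, a subtype (with a
     // runtime test), or a supertype (which always matches).
     record Box(List<String> items) {}

     class Demo {
         static String describe(Box b) {
             return switch (b) {
                 case Box(ArrayList<String> a)  -> "subtype of the component type";
                 case Box(Collection<String> c) -> "supertype of the component type";
             };
         }
     }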

Clue Two is that applicability for constructors is based on method 
invocation context, but applicability for deconstructors is based on 
cast context, which has different rules.  It seems unlikely that we will 
ever get symmetry given this.

The "Flexibility" requirement does not really apply to deconstructors; 
having a deconstructor that accepts additional bindings does not 
constrain anything in the way that a constructor taking needlessly 
specific arguments does.  Imagine if ArrayList had only 
constructors that take int (for array capacity); this is terrible for 
the constructor, because it forces a resource management decision onto 
users who will not likely make a very good decision, and one that is 
hard to change later, but pretty much harmless for deconstructors.
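
To see the contrast on the construction side (ArrayList of course offers 
both forms):

     import java.util.ArrayList;
     import java.util.List;

     class Capacity {
         // The no-arg constructor leaves the sizing decision to the
         // implementation; the int constructor forces the caller to commit
         // to a resource-management guess up front.
         List<String> flexible  = new ArrayList<>();
         List<String> committed = new ArrayList<>(1_000);
     }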

The "Concision" requirement does not really apply as much to 
deconstructors as constructors; matching with `Foo(var a, _, _)` is not 
nearly as painful as invoking with lots of parameters, each of which 
require an explicit choice by the user.
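
With `var` and the unnamed pattern, matching against a large 
deconstruction stays cheap for clients; a small illustrative sketch (the 
Point3D record is made up):

     // A client that only cares about one binding can match against the
     // maximal deconstruction and discard the rest.
     record Point3D(double x, double y, double z) {}

     class Planes {
         static boolean onYZPlane(Object o) {
             return o instanceof Point3D(var x, _, _) && x == 0.0;
         }
     }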

So the main reason for overloading deconstructors is to match 
representations with the constructor overloads -- but with a given 
"representation set", there probably do not need to be as many 
deconstructors as constructors.  What we really need is to match the 
"maximal" constructor in a telescoping nest with a corresponding 
deconstructor, or for a tree-shaped set, one for each "maximal" 
representation.

So for a class with constructors

     Foo()
     Foo(A a)
     Foo(A a, B b)
     Foo(X x)
     Foo(X x, Y y)

we would want deconstructors for (A,B) and (X,Y), but don't really need the others.


So, let's start fresh on overload selection. Deconstructors have a set 
of applicability rules based on arity first (eventually, varargs, but 
not yet) and then on applicability of type patterns, which is in turn 
rooted in castability.  Because we don't have the compatibility problem 
introduced by autoboxing, we can ignore the distinction between phase 1 
and 2 of overload selection (we will have this problem with varargs 
later, though.)

Given this, the main question we have to resolve is to what degree -- if 
any -- we may deem one overload "more applicable" than others.  I think 
there is one rule here that is forced: an exact type match (modulo 
erasure) is more applicable than an inexact type match.  So given:

     D(Object o)
     D(String s)

then

     case D(String s)

should choose the latter.  This allows the client to (mostly) steer to a 
specific overload just by using the right types (rather than `var` or a 
subtype.)  It is not clear to me whether we need anything more here; in 
the event of ambiguity, a client can pick the right overload with the 
right type patterns.  (Nested patterns may need to be manually unrolled 
to subsequent clauses in some cases.)
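
The spirit of the rule mirrors how clients already steer ordinary method 
overload selection with static types; a runnable analogy (this is 
invocation-side code, not deconstructor code):

     class Steering {
         static String m(Object o) { return "Object overload"; }
         static String m(String s) { return "String overload"; }

         public static void main(String[] args) {
             System.out.println(m("hi"));           // most specific: String overload
             System.out.println(m((Object) "hi"));  // steered to the Object overload
         }
     }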

So basically (on a per-binding basis): an exact match is more applicable 
than an inexact match, and ... that's it. Users can steer towards a 
particular overload by selecting exact matches on enough bindings.  
Libraries can provide their own "joins" if they want to disambiguate 
problematic overloads like:

     D(Object o, String s)
     D(String s, Object o)

(for example, by also declaring a D(String s, String t) overload, which 
would be the unambiguous choice for a case like D(String a, String b).)


