Exploring inference for sealed types
Brian Goetz
brian.goetz at oracle.com
Tue Sep 24 18:34:34 UTC 2019
Thanks Gavin. We had an internal discussion on this today, which I will
summarize here to help illuminate the issue.
> As Brian mentioned in an earlier email, sealed types address two related, but
> distinct, issues: (1) declaring a sum type, whereby the compiler can exploit
> exhaustiveness in various places (e.g. in a switch); and (2) defining a type
> that for clients behaves as if it is final (it cannot be extended), but for
> the class author actually has a fixed, known collection of implementations.
It would only be a moderate exaggeration to say that this is really two
features -- which brings us to the classic lump/split decision -- should
we expose this as one feature, or two. (In the past, Alan M proposed a
splitting where the sum-hierarchy case looked more like an enum
declaration, for example.) We have been leaning to "one", but there is
a risk that attempting to cover both cases makes each of them harder to
understand.
> The second use dictates a subtle design constraint: a type that directly
> extends/implements a `sealed` type must be either `sealed`, `final` or
> `non-sealed`. If not, it will be too easy to create a security hole where the
> class author intended a class hierarchy to be closed, but by forgetting a
> modifier at a leaf type, inadvertently renders the hierarchy open.
This is our line in the sand; it would not be OK to have an arbitrary
subtype `class X implements I { }` for some sealed I, and have X end up
being open for extension.
> One valid design point is to stop here. All `sealed`/`non-sealed`/`final`
> modifiers and `permits` clauses have to be given explicitly. The compiler then
> just checks that what has been declared is correct.
Indeed, we could stop here; let's call this our baseline. If we stopped
here, we'd get the desired safety benefits, explicitness, and a
reasonable lumping. Let's be more explicit about why we might do more.
If we declare a sum hierarchy, there are three unfortunate bits of O(n)
repetition:
sealed interface X permits A, B, C {
final class A implements X {}
final class B implements X {}
final class C implements X {}
}
These three bits are: listing the subtypes twice (once at the
declaration, once at the permits clause); saying "implements X"
repeatedly; and saying "final" repeatedly. (In the event the subtypes
are records, the last is automatically taken care of.)
> We have been exploring some alternative design points, all supporting some
> sort of inference.
The inference scheme Gavin proposed addresses the first and last of
these, at some cost (both to perceived complexity and implementation.)
Let's drill into why we proposed this in the first place.
The case I have in mind -- which I believe will be quite common -- is a
flat hierarchy (one sealed supertype, N direct subtypes) with a
relatively high degree of fan-out. (This shows up in all sorts of
document tree representations.) And of the three repetitions, my claim
is the most irritating is the permits clause. Imagine the above
hierarchy fanned out to A-Z; there's a 26-way permits clause that is
both annoying to write and not particularly enlightening to read (and as
a bonus, error-prone to update.) If we're going to infer anything, we
should start here.
So one sensible increment atop the baseline is: (A) if a top-level type
is explicitly declared sealed, and has no permits clause, we can infer
the permits clause from the subtypes in the compilation unit (or more
narrowly, the _nested_ subtypes). This is simpler than the full scheme
outlined, while addressing the biggest case that concerns me -- the
high-fanout case.
The above scheme could be incrementally extended to (B) any class
explicitly marked `sealed` -- if you say sealed and leave off `permits`,
the permits list is inferred from the current compilation unit. This
seems defensible.
Where we end up with confusing action-at-a-distance is to infer finality
/ sealed-ness for subtypes. We could back off from this completely, or
we could take a simpler projection, which is (C) to say any direct
subtype of a sealed type is implicitly final, unless it explicitly says
`sealed` or `non-sealed`.
So, if we're lumping, we could choose "baseline", or "baseline + A", or
"baseline + A + B", or "baseline + A + B + C". All seem defensible,
though baseline-only seems likely to provide some ongoing irritation for
the permits clause.
If we go the split direction, we have more choices, but having chased a
few of these down, they all seem to arrive at muddy places. Alan's
`enum class` approach looks clean when the components of the sum are
simple records, but when they are more complex classes, the expression
starts to get ugly.
Another direction is to borrow the terminology (but not the semantics)
from `case` classes in Scala, where we explicitly mark the subtypes,
which has the effect of turning on all the complex inference, but at
least making it less magic:
sealed interface I {
case class A { ... }
case class B { ... }
case class C { ... }
}
where we'd infer the proper sealed-ness/finality, permits clauses, and
implements clauses. But, I think when we get beyond toy examples, this
approach feels unlikely to offer sufficient benefit to justify itself,
and the toy examples work reasonably well already (since records are
already final.)
So my suggestion is to start with Baseline + (A | A&B), limiting
inference to permits clauses, and see if that is enough.
More information about the amber-spec-experts
mailing list