Exploring inference for sealed types

Tue Sep 24 18:34:34 UTC 2019

Thanks Gavin.  We had an internal discussion on this today, which I will 
summarize here to help illuminate the issue.

> As Brian mentioned in an earlier email, sealed types address two related, but
> distinct, issues: (1) declaring a sum type, whereby the compiler can exploit
> exhaustiveness in various places (e.g. in a switch); and (2) defining a type
> that for clients behaves as if it is final (it cannot be extended), but for
> the class author actually has a fixed, known collection of implementations.

It would only be a moderate exaggeration to say that this is really two 
features -- which brings us to the classic lump/split decision -- should 
we expose this as one feature, or two.  (In the past, Alan M proposed a 
splitting where the sum-hierarchy case looked more like an enum 
declaration, for example.)  We have been leaning to "one", but there is 
a risk that attempting to cover both cases makes each of them harder to 
understand.

> The second use dictates a subtle design constraint: a type that directly
> extends/implements a `sealed` type must be either `sealed`, `final` or
> `non-sealed`. If not, it will be too easy to create a security hole where the
> class author intended a class hierarchy to be closed, but by forgetting a
> modifier at a leaf type, inadvertently renders the hierarchy open.

This is our line in the sand; it would not be OK to have an arbitrary 
subtype `class X implements I { }` for some sealed I, and have X end up 
being open for extension.

> One valid design point is to stop here. All `sealed`/`non-sealed`/`final`
> modifiers and `permits` clauses have to be given explicitly. The compiler then
> just checks that what has been declared is correct.

Indeed, we could stop here; let's call this our baseline.  If we stopped 
here, we'd get the desired safety benefits, explicitness, and a 
reasonable lumping.  Let's be more explicit about why we might do more.

If we declare a sum hierarchy, there are three unfortunate bits of O(n) 
repetition:

     sealed interface X permits A, B, C {
         final class A implements X {}
         final class B implements X {}
         final class C implements X {}
}

These three bits are: listing the subtypes twice (once at the 
declaration, once at the permits clause); saying "implements X" 
repeatedly; and saying "final" repeatedly.  (In the event the subtypes 
are records, the last is automatically taken care of.)

> We have been exploring some alternative design points, all supporting some
> sort of inference.

The inference scheme Gavin proposed addresses the first and last of 
these, at some cost (both to perceived complexity and implementation.)  
Let's drill into why we proposed this in the first place.

The case I have in mind -- which I believe will be quite common -- is a 
flat hierarchy (one sealed supertype, N direct subtypes) with a 
relatively high degree of fan-out.  (This shows up in all sorts of 
document tree representations.)  And of the three repetitions, my claim 
is the most irritating is the permits clause.  Imagine the above 
hierarchy fanned out to A-Z; there's a 26-way permits clause that is 
both annoying to write and not particularly enlightening to read (and as 
a bonus, error-prone to update.)  If we're going to infer anything, we 
should start here.

So one sensible increment atop the baseline is: (A) if a top-level type 
is explicitly declared sealed, and has no permits clause, we can infer 
the permits clause from the subtypes in the compilation unit (or more 
narrowly, the _nested_ subtypes).  This is simpler than the full scheme 
outlined, while addressing the biggest case that concerns me -- the 
high-fanout case.

The above scheme could be incrementally extended to (B) any class 
explicitly marked `sealed` -- if you say sealed and leave off `permits`, 
the permits list is inferred from the current compilation unit.  This 
seems defensible.

Where we end up with confusing action-at-a-distance is to infer finality 
/ sealed-ness for subtypes.  We could back off from this completely, or 
we could take a simpler projection, which is (C) to say any direct 
subtype of a sealed type is implicitly final, unless it explicitly says 
`sealed` or `non-sealed`.

So, if we're lumping, we could choose "baseline", or "baseline + A", or 
"baseline + A + B", or "baseline + A + B + C".  All seem defensible, 
though baseline-only seems likely to provide some ongoing irritation for 
the permits clause.

If we go the split direction, we have more choices, but having chased a 
few of these down, they all seem to arrive at muddy places.  Alan's 
`enum class` approach looks clean when the components of the sum are 
simple records, but when they are more complex classes, the expression 
starts to get ugly.

Another direction is to borrow the terminology (but not the semantics) 
from `case` classes in Scala, where we explicitly mark the subtypes, 
which has the effect of turning on all the complex inference, but at 
least making it less magic:

     sealed interface I {
         case class A { ... }
         case class B { ... }
         case class C { ... }
     }

where we'd infer the proper sealed-ness/finality, permits clauses, and 
implements clauses.  But, I think when we get beyond toy examples, this 
approach feels unlikely to offer sufficient benefit to justify itself, 
and the toy examples work reasonably well already (since records are 
already final.)

So my suggestion is to start with Baseline + (A | A&B), limiting 
inference to permits clauses, and see if that is enough.