Relaxed assignment conversions for sealed types

Sat Jan 23 04:54:52 UTC 2021

On Jan 22, 2021, at 9:27 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> I want to pick up on this thread, now that we may have some more fire to add to it, and we've all had some time to think about it.  
> 
>  - The main technical debt that we piled up when doing switch expressions is that switch expressions must be total, whereas switch statements are not.  There's nothing wrong with partial switch statements, but there's a middle case where we have a switch statement that we _intend to be_ total, and it would surely be nice to get the compiler to type-check that assumption for us.  

Where is the current working definition of when a series of patterns is a candidate for being marked as total?  (“Totalized”?)

I suppose it depends on the type T of the match target X, maybe the nullity kind of X, and then whether the series of patterns P somehow “covers all possibilities” of T, in some way that depends on T, and also requires a static approximation to T’s eventual runtime definition.

If T is an enum, then the patterns P must exhaust T’s members.

If T is a sealed hierarchy of some sort (perhaps only a one-class hierarchy) then the patterns P must exhaust T’s subtype(s). 

Separately, a nullable target X can always be totalized to a null-hostile group of patterns.  (Is this the same operation as the other totalizations?)

In all of this I’d like to say “pattern”, not “set of patterns”, is the right place to consider totalization, because I think the interesting primitive notion here is a single pattern P, viewed in the context of its target X of type T, *and optionally* some set of previous patterns Q from X’s value is excluded before P is reached.

The benefit of totalizing a pattern P is this:  If the static compiler decides that P is total, then it can exclude “side paths” near P.  By side path I mean any control flow that might occur subsequent to P’s failure.  Such paths, possible in a program where P is not total, are impossible when P is total.  They are dead; they don’t contribute to DA/DU rules, etc.  Removing a side path statically might require some quiet housekeeping:  If the runtime conditions somehow contrive for a side path to be taken after all, the compiler needs to arrange for a RuntimeException or Error.  But it’s a worthy cause, since manual managing of such errors can make code unreadable.  Imagine, for a moment, if Java had only “X instanceof T” and not also “(T)X”.  You could have the same programs, but they would have to have explicit side paths for all failures of X to be T.

Under this view maybe we could have a common story for:

switch (X) { case Q: … case P: … }
switch (foo) { case Foo(Q): … case Foo(P): … } //X is Foo component
   // note:  there might also be Bar(Q), Bar(P), Bar(Baz) mixed in here
if (X instanceof Q) … else if (X instanceof P) … // dead after if #2???
if (X instanceof Q) … else { let P y = X; }
if (X instanceof Foo(Q)) … else { let Foo(P) y = foo; }

// cases without Q:
switch (X) { case P: … }
switch (foo) { case Foo(P): … }
if (X instanceof P) ... // dead after if???
let P = X;
let Foo(P) = foo;

In all of these cases, it seems to me that there are three different things that might happen with the pattern P:

(a) The compiler can determine automatically that there P (after any supporting Qs) is total (“covers all of T”).

(b) The compiler must provide for code paths where some values of X that do not match P (or any Q).

(c) The user provides a mark on P which asks the compiler to discard any code paths that might arise from case (b), and thus treat the code as if P were total as in case (a).

When there are Q’s in the picture, I think they should be viewed as secondary to the final decision about P; declaring P total is equivalent to declaring all of the Q’s plus P as total.  That’s why I think the decisive mark (in case (c)) is on P, and not on some other contextual bit of syntax.

For switch, this is unpleasantly asymmetrical, so maybe moving the totalization marker (“total”?) up before “switch” is a win.  Still, it seems worth thinking of “total-switch (X) { … case P: ...}” as optional sugar for “switch (X) { … case total P: …}”, rather than making different rules for different syntactic contexts of patterns.

Revisiting the above examples, using a silly hyphenated keyword “assert-true”:

switch (X) { case Q: … case assert-true P: … }
switch (foo) { case Foo(Q): … case Foo(assert-true P): …} //X is Foo component
if (X instanceof Q) … else if (X instanceof assert-true P) … // dead after if #2???
if (X instanceof Q) … else { let assert-true P y = X; }
if (X instanceof Foo(Q)) … else { let Foo(assert-true P) y = foo; }

// cases without Q:
switch (X) { case assert-true P: … }
switch (foo) { case Foo(assert-true P): … }
if (X instanceof assert-true P) ... // dead after if???
let assert-true P = X;
let Foo(assert-true P) = foo;

> 
>  - Separately, we have cases where we know in our hearts that a blind cast will work, but have no way to tell that to the compiler.

Don’t we do that today, by simply issuing the blind cast?  In other words, “(P)X” implies the intention to cut off all side paths, as if “(assert-true P)X” were given, with the programmer intention more clear.

Actually, “(P)X” feeds through the null case (X==null) to the main path, where sometimes we wish to treat
 it as a side path to be cut off.  Possibly “(assert-true P)X” is a reasonable way to ask the compiler to cut off the side path for null.  One possible rationale:  “(T)X” is a legacy cast which is null-friendly.  “(P)X”, where P is surely a pattern, is null-hostile (if and only if P is, separate question of when that is).  Since “(assert-true P)” is the assertedly-totalizing use of P, it follows that null is a side path to be excluded.  (I suppose I’m just rediscovering one of the nuggets you have proposed here, Brian.)

But, there are lots of places where we “know in our hearts” things that the compiler doesn’t know.  The present set of features is a very small subset:  We “know in our hearts” that some pattern P “covers all possibilities” of the type of some nearby X, according to a very short list of what “covers all possibilities” might mean.

The normal syntax for telling a program something that we “know in our hearts” is “assert p”.  Since this is not control flow, it cannot affect liveness, and it certainly doesn’t affect the types of things.  And in exchange for having no impact on the static model of the program, it can propose and enforce arbitrarily complex runtime conditions.

I wonder if there is a useful middle ground between “assert” (no static effect but arbitrary testing) and the proposed set of pattern totalizations, which affect liveness but can test only a small number of ad hoc edge conditions.  If there were, I suppose an assert-like syntax could require that a pattern must always match, and/or some other condition always hold, with two consequences:

1. The compiler arranges that any “side paths” around the assert are treated as dead.

2. The compiler arranges a runtime test with a standard exception if those side paths are taken at runtime.

A non-pattern example could be:

if ((assert-true) some(hairy,computation)) … // dead after if!!

And in that case we could unify the following two manifestations of totalized patterns:

if (X instanceof assert-true P) … // dead after if!!
if ((assert-true)( X instanceof P )) … // dead after if!!

If all that could work, then perhaps the totalizing patterns would “fall out” more easily from the design.

OTOH, I could be barking up the wrong tree here, because this feature I just sketched is arguably too strong:  If it can force the compiler to create dead paths (say) for unsealed hierarchies, then we would lose some of the advantages of the present proposals.  The user says “I’m going to declare this total, watch my back”, and expects the compiler to say, “whoops, that can’t possibly be total, because you added another subtype”.  Under my “stronger totality” thought experiment here, the compiler would say “yes, master” and the user would never get a warning or error.  But maybe there’s a way to square up those conflicting requirements?

> For example:
> 
>     sealed interface A permits B { }
>     sealed class B implements A { }
>     ...
>     A a = ...
>     B b = (B) a;
> 
> In this code, our A cannot be anything but a B; there is only one concrete class in the hierarchy.  We have talked about allowing this without a cast (which I think is OK, and DanH agrees with below, but is not my main point here), but the cast here to B is a lot like a switch that I intend to be total, but for which the compiler does not type check this assumption.  It would be nice to get some type checking behind it, in case, say, someone else adds another subtype of A.  

I guess I’m suggesting that the “real primitive” here is for a programmer to be able to say, “this operation completes with no side paths”, and requests “please arrange to clean up my mess if a side path is taken at runtime”.

A cast-free assignment has the advantage of mentioning just one type.  A cast-free parameter pass has the advantage of mentioning no types at all.  They can be expressed as a marker which says, “I know the target type here, and I know the operand is a little different, but I’m telling you to cut off the side path.”

Object x = …;  //I know in my heart this is a String!
assert-true String s = x;  //bad keyword
String s = (assert-converts) x;  //another bad keyword

// String::concat(String) is a method taking String not Object
System.out.println("hello ".concat( (assert-converts) x ));

> 
>  - Separately separately, in the discussion about nulls in patterns, there is some concern that it is not always obvious when a nested pattern is total.  Again, we have a case where there might have been some design intent, but it is (a) not type-checked and (b) not obvious to readers of the code, and it might be nice to capture this intent.  
> 
> I think these things are the same feature.  So what if:
> 
>     total-switch (x) { ... }  // or switch-total
> 
> meant "I assert this switch is total, please error if it is not."  And
> 
>     b = (total B) a;
> 
> meant "I assert this case is total, please error if not."  And
> 
>     case Foo(total Bar x)
> 
> meant "I assert that this nested pattern is total, please error if not."  

(See above; I took these cases into account in my response.)

I hope this helps.  I’m doing my favorite move here, called “overgeneralize and see what pops out”.  Hopefully there is a right-sized generalization here, with fewer and simpler primitives.