Relaxed assignment conversions for sealed types

Sun Oct 25 14:06:42 UTC 2020

You are right to be a little scared.  I'm a little scared too.

Many language features are proposed on the basis of "wouldn't it be cool 
if we could write this!"   This is not one of those cases.  The 
motivation here is purely: wouldn't people write _safer_ code with this.

The motivation came out of observing how sealed classes are likely to be 
used.  I was working on a library, which intends to make heavy use of 
sealing, where it exposes public interfaces and has a single private 
implementation for almost all of them.  The implementations are strongly 
coupled; they want to freely access each other.

What I found was that it was _overwhelmingly_ tempting -- and I'm pretty 
disciplined -- to do things like:

     void foo(Bar b) {
         BarImpl bi = (BarImpl) b;
         bi.internalGoop();
     }

instead of:

     void foo(Bar b) {
         if (b instanceof BarImpl bi) {
             bi.internalGoop();
         }
         else
             throw new IllegalStateException(blah blah);
     }

This is just like the do-nothing default clause, if not worse.

My concern is NOT that I don't want to write the cast in the first 
example.  It's that the correctness of the cast relies on an assumption 
that the compiler can't type check.  So we leave users with some bad 
choices:

  - Write the full-blown if-else version.  This will annoy users when 
they do it, but more likely, they just won't do it.
  - Write the cast version.  Now our code has a hidden assumption 
embedded in it, which could later be broken, and the compiler can't type 
check it for us.
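
To make the second choice concrete, here is a minimal sketch of how the hidden assumption breaks.  All names are hypothetical (OtherBarImpl stands in for a subtype added later by someone unaware of the cast), and it assumes a JDK with sealed types (Java 17 or later).  The cast compiles both before and after the second implementation appears; the failure only shows up at run time.

```java
// Hypothetical names throughout; assumes Java 17+ for sealed types.
class BlindCastDemo {
    sealed interface Bar permits BarImpl, OtherBarImpl {}

    static final class BarImpl implements Bar {
        String internalGoop() { return "goop"; }
    }

    // Added later; nothing forces the author to revisit existing casts.
    static final class OtherBarImpl implements Bar {}

    // The cast version: still compiles after OtherBarImpl is permitted.
    static String foo(Bar b) {
        BarImpl bi = (BarImpl) b;
        return bi.internalGoop();
    }

    public static void main(String[] args) {
        System.out.println(foo(new BarImpl()));
        try {
            foo(new OtherBarImpl());
        } catch (ClassCastException e) {
            // The broken assumption is only discovered here, at run time.
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```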

To make it clear that I'm not talking about the annoyance of typing the 
cast, let's pretend I'm suggesting to write it like this:

     BarImpl bi = (__static BarImpl) b;

which means "if this cast can't be statically proven true, give me a 
compile error."  I think we can agree that this version is strictly 
better than the first version (same runtime semantics, but more static 
type checking), and probably better than the second version too.  We can 
debate the syntax, but this is the point I want to discuss -- that we 
can make the language _safer_ by exploiting information the compiler 
has, as long as we don't sweep it under the rug.

We want people to write APIs like this -- where the public face is 
abstract interfaces, but the private face can be efficient, encapsulated 
code.  And we also want to provide the maximum type checking we can for 
that code.

(Digression: In the future, people might have a slightly better option 
than the if-else:

     switch (b) {
         case BarImpl bi -> { rest of method }
     }

which will benefit from the exhaustive type checking.  But this will 
require patterns in switch (coming) and, for statement switches, some 
sort of totality opt in.  End digression.)
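
A sketch of what the switch form could look like, assuming a JDK where sealed types and patterns in switch are both available (Java 21 or later).  The names are hypothetical, and it uses the expression form of switch, which is checked for exhaustiveness without any opt in.

```java
// Hypothetical names; assumes Java 21+ (sealed types plus patterns in switch).
class ExhaustiveSwitchDemo {
    sealed interface Bar permits BarImpl {}

    static final class BarImpl implements Bar {
        String internalGoop() { return "goop"; }
    }

    static String foo(Bar b) {
        // No default clause: the compiler checks that the cases cover Bar.
        // Permitting a second subtype of Bar makes this a compile error
        // until a case for it is added.
        return switch (b) {
            case BarImpl bi -> bi.internalGoop();
        };
    }

    public static void main(String[] args) {
        System.out.println(foo(new BarImpl()));
    }
}
```

Unlike the blind cast, the assumption "BarImpl is the only implementation" is now something the compiler verifies each time the code is built.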

Let's try and separate the "oooh, scary and different" reactions here 
from the problem.  I think this is a problem that we should try to 
solve, and we _have the information in hand_ to solve it.  What remains 
is mostly a user-psychology exercise of "what is the right level of 
implicit/explicit here."

Some comments inline.

> I'm a little bit scared of introducing this kind of implicit cast into
> the assignment operation. This looks like the right operand in
> MyFooImpl mfi = f; becomes a poly-expression, as the type of the f
> expression now depends on the mfi type. How will it interact with
> other constructs? E.g. can we write MyFooImpl getFooImpl(MyFoo f)
> {return f;}? Or aMethodAcceptingFooImpl(f)?

As I sketched it, yes; this would be an assignability thing, where Foo 
is assignable to FooImpl by virtue of sealing.  (Again, if sealed types 
were true union types, this would already be true by virtue of the 
definition of subtyping for a union type.)

> What about something that
> requires more inference, like Function<MyFoo, MyFooImpl> = f -> f; ?

Is this any different from:

     Function<Integer, Number> f = n -> n;

We rewrite with inference variables:

     Function<Integer, Number> f = (\alpha n) -> n;

and gather constraints that say we can assign Integer to \alpha and 
assign \alpha to Number, and inference says "OK, Integer."
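
For reference, the Integer/Number case is ordinary Java today.  A minimal sketch, with a hypothetical class and field name, showing the lambda parameter inferred as Integer and its value returned where a Number is expected:

```java
import java.util.function.Function;

// Hypothetical class and field names, for illustration only.
class InferenceDemo {
    // n is inferred as Integer from the target type; returning it as a
    // Number is allowed because Integer is assignable to Number.
    static final Function<Integer, Number> WIDEN = n -> n;

    public static void main(String[] args) {
        Number result = WIDEN.apply(42);
        System.out.println(result + " " + result.getClass().getSimpleName());
    }
}
```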

> If we want a compile-time check that 'this cast is safe as there's
> exactly one implementor', I think this should be done in a more
> explicit way. Btw we already designed such a way:
> MyFooImpl mfi = switch(f) {case MyFooImpl _mfi -> _mfi;};
> This would do the thing, though may look somewhat ugly. We can add
> some sugar if anybody doesn't like this syntax. Probably like
> MyFooImpl mfi = (safe-cast-to MyFooImpl)f;
> But my point is that this kind of downcast should be explicit.

As mentioned above, I'm open to that.  I think the switch route is a 
little too indirect; it either won't occur to users, or it will still be 
too many characters to type.  My concern is that the path of least 
resistance here will lead to the blind cast; if we make it slightly more 
fussy, people may accept that, but if it's too fussy, we'll just get more 
unsafe code.

> I'm also not sure that this construct will be necessary so often.

Well, I've got a 25,000 line prototype in front of me that wants to do 
this _all over the place_.  I want this code to be maintainable when I'm 
done with it.  I may know which casts are safe and which are not, but I 
don't want to rely on that.  So I think it will be an issue.

> First, it's for internal use only, so the surface is quite limited:
> the clients don't need it. Second, even in internal uses, it's likely
> that in most of the cases the public API of interface Foo will be
> enough, so the downcast won't be necessary. Third, it would not be so
> hard to create a private one-liner method for this:
>
> private class MyFooImpl {
>    private static MyFooImpl asImpl(Foo f) {
>      return switch (f) {case MyFooImpl mfi -> mfi;};
>    }
> }

(ObSnark: not so hard, times 25,000.)

More seriously, I don't object to this as a possibility, but we're at 
least two language features away from being able to write this: patterns 
in switch and total statement switches.

On to Remi's comments:

> If we have a rule that allows seamless conversions between the super reference type and primitive object type,
> the pair super reference type/primitive object type  will behave more or less like the auto-boxing rules between an Integer and an int.

That's exactly how I think it's going to work; we generalize 
boxing/unboxing to "value widening/narrowing" conversions, with the 
exact same semantics, and apply to all value/ref projections of 
inlines.  This neatly flows into rules like overload selection, where 
candidates without conversion are preferred to candidates with 
conversion.  No new rules to learn; just more uniform boxing. (Reminder: 
this is the Amber list, so take disagreements with this statement to the 
Valhalla list.)
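
As a reminder of how the existing int/Integer rule plays out in overload selection, here is a small sketch with hypothetical method names.  It only shows today's boxing behavior; the value/ref projections discussed above are not something this code can demonstrate.

```java
// Hypothetical names; illustrates the existing int/Integer rule only.
class OverloadDemo {
    static String m(long x)    { return "long"; }
    static String m(Integer x) { return "Integer"; }

    public static void main(String[] args) {
        int i = 1;
        // int -> long needs no boxing, so m(long) is found in the first
        // phase; m(Integer) would need a boxing conversion.
        System.out.println(m(i));
        // An Integer argument matches m(Integer) by subtyping alone;
        // m(long) would need unboxing.
        System.out.println(m(Integer.valueOf(1)));
    }
}
```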

> One issue I see with this proposal, and the original one by Brian, the relation is asymmetric,
> MyFooImpl to MyFoo is subtyping but MyFoo to MyFooImpl is unboxing (so not the same pass when finding the most specific method), so it doesn't work fully like Integer and int.
> And I've used unboxing because it's already an existing rule, but maybe it's a new kind of conversion that will require us to carefully think about

Mucking up the declaration site for this seems unnecessary, and also 
limits the usefulness of it.  (As proposed, this can work when you have 
multiple subtypes that implement a common interface.)  But, the part of 
this that is worth discussing is that yes, this could be a _conversion_ 
rather than assignability.  The effect of this would mostly show up in 
overload selection (where the strongly applicable options are preferred 
over the loosely applicable ones) and type inference. It's a possibility, 
but I'm not sure it really buys much, and doesn't really address Tagir's "too 
magic" concerns.  (If anything, conversions are more magic than 
assignability.)