Relaxed assignment conversions for sealed types
Brian Goetz
brian.goetz at oracle.com
Sun Oct 25 14:06:42 UTC 2020
You are right to be a little scared. I'm a little scared too.
Many language features are proposed on the basis of "wouldn't it be cool
if we could write this!" This is not one of those cases. The
motivation here is purely: wouldn't people write _safer_ code with this.
The motivation came out of observing how sealed classes are likely to be
used. I was working on a library, which intends to make heavy use of
sealing, where it exposes public interfaces and has a single private
implementation for almost all of them. The implementations are strongly
coupled; they want to freely access each other.
What I found was that it was _overwhelmingly_ tempting -- and I'm pretty
disciplined -- to do things like:
    void foo(Bar b) {
        BarImpl bi = (BarImpl) b;
        bi.internalGoop();
    }
instead of:
    void foo(Bar b) {
        if (b instanceof BarImpl bi) {
            bi.internalGoop();
        }
        else
            throw new IllegalStateException(blah blah);
    }
This is just like the do-nothing default clause, if not worse.
My concern is NOT that I don't want to write the cast in the first
example. It's that the correctness of the cast relies on an assumption
that the compiler can't type check. So we leave users with some bad
choices:
- Write the full-blown if-else version. This will annoy users when
they do it; more likely, they just won't do it.
- Write the cast version. Now our code is embedded with a hidden
assumption, which could be later broken, and the compiler can't type
check it for us.
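
To make the hidden assumption concrete, here is a minimal sketch (Java 17 or later). All names are hypothetical; LaterBarImpl stands in for a second implementation added long after the cast was written. The blind cast still compiles and only fails at run time, while the if-else version at least fails with a deliberate exception.

```java
// Sketch only: every name here is illustrative, not from a real library.
final class BlindCastSketch {
    // Bar originally permitted only BarImpl; LaterBarImpl was "added later".
    sealed interface Bar permits BarImpl, LaterBarImpl {}

    static final class BarImpl implements Bar {
        String internalGoop() { return "goop"; }
    }

    static final class LaterBarImpl implements Bar {}

    // The tempting version: compiles no matter how many implementations exist.
    static String blindCast(Bar b) {
        BarImpl bi = (BarImpl) b;   // ClassCastException if b is a LaterBarImpl
        return bi.internalGoop();
    }

    // The disciplined version: same assumption, but checked explicitly at run time.
    static String checked(Bar b) {
        if (b instanceof BarImpl bi) {
            return bi.internalGoop();
        }
        throw new IllegalStateException("unexpected Bar: " + b.getClass().getName());
    }
}
```

Neither version gets any help from the compiler when the permits list grows, which is the gap under discussion.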
To make it clear that I'm not talking about the annoyance of typing the
cast, let's pretend I'm suggesting to write it like this:
BarImpl bi = (__static BarImpl) b;
which means "if this cast can't be statically proven true, give me a
compile error." I think we can agree that this version is strictly
better than the first version (same runtime semantics, but more static
type checking), and probably better than the second version too. We can
debate the syntax, but this is the point I want to discuss -- that we
can make the language _safer_ by exploiting information the compiler
has, as long as we don't sweep it under the rug.
We want people to write APIs like this -- where the public face is
abstract interfaces, but the private face can be efficient, encapsulated
code. And we also want to provide the maximum type checking we can for
that code.
(Digression: In the future, people might have a slightly better option
than the if-else:
    switch (b) {
        case BarImpl bi -> { rest of method }
    }
which will benefit from the exhaustive type checking. But this will
require patterns in switch (coming) and, for statement switches, some
sort of totality opt in. End digression.)
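
For readers on a release where patterns in switch have shipped (Java 21, where a statement switch that uses patterns must be exhaustive), the digression can be sketched roughly as follows. The names are hypothetical, and the StringBuilder is only there so the sketch has an observable result.

```java
// Sketch only: assumes Java 21 pattern matching for switch; names are illustrative.
final class ExhaustiveSwitchSketch {
    sealed interface Bar permits BarImpl {}

    static final class BarImpl implements Bar {
        String internalGoop() { return "goop"; }
    }

    static String foo(Bar b) {
        StringBuilder out = new StringBuilder();
        // No default clause: the compiler checks that the cases cover every
        // permitted subtype of Bar. Adding a second permitted subtype turns
        // this switch into a compile error instead of a run-time failure.
        switch (b) {
            case BarImpl bi -> out.append(bi.internalGoop());
        }
        return out.toString();
    }
}
```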
Let's try and separate the "oooh, scary and different" reactions here
from the problem. I think this is a problem that we should try to
solve, and we _have the information in hand_ to solve it. What remains
is mostly a user-psychology exercise of "what is the right level of
implicit/explicit here."
Some comments inline.
> I'm a little bit scared of introducing this kind of implicit cast into
> the assignment operation. This looks like the right operand in
> MyFooImpl mfi = f; becomes a poly-expression, as the type of the f
> expression now depends on the mfi type. How will it interact with
> other constructs? E.g. can we write MyFooImpl getFooImpl(MyFoo f)
> {return f;}? Or aMethodAcceptingFooImpl(f)?
As I sketched it, yes; this would be an assignability thing, where Foo
is assignable to FooImpl by virtue of sealing. (Again, if sealed types
were true union types, this would already be true by virtue of the
definition of subtyping for a union type.)
> What about something that
> requires more inference, like Function<MyFoo, MyFooImpl> = f -> f; ?
Is this any different from:
Function<Integer, Number> = n -> n;
We rewrite with inference variables:
Function<Integer, Number> = (\alpha n) -> n;
and gather constraints that say we can assign Integer to \alpha and
assign \alpha to Number, and inference says "OK, Integer."
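
The Integer/Number analogue compiles today, which is the point of the comparison. A minimal sketch, with the field name WIDEN being my own invention:

```java
import java.util.function.Function;

// Sketch only: shows the existing behavior the analogy relies on.
final class InferenceSketch {
    // The lambda parameter n is typed as Integer, and the body n is
    // assignable to the return type Number, so this is accepted.
    static final Function<Integer, Number> WIDEN = n -> n;
}
```

Under the sketched proposal, the MyFoo/MyFooImpl lambda would be accepted by the same kind of reasoning, with "assignable by virtue of sealing" taking the place of ordinary subtyping for the body.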
> If we want a compile-time check that 'this cast is safe as there's
> exactly one implementor', I think this should be done in a more
> explicit way. Btw we already designed such a way:
> MyFooImpl mfi = switch(f) {case MyFooImpl _mfi -> _mfi;};
> This would do the thing, though may look somewhat ugly. We can add
> some sugar if anybody doesn't like this syntax. Probably like
> MyFooImpl mfi = (safe-cast-to MyFooImpl)f;
> But my point is that this kind of downcast should be explicit.
As mentioned above, I'm open to that. I think the switch route is a
little too indirect; it either won't occur to users, or it will still be
too many characters to type. My concern is that the path of least
resistance here will lead to the blind cast; if we make it slightly more
fussy, people may accept that, but if it's too fussy, we'll just get more
unsafe code.
> I'm also not sure that this construct will be necessary so often.
Well, I've got a 25,000 line prototype in front of me that wants to do
this _all over the place_. I want this code to be maintainable when I'm
done with it. I may know which casts are safe and which are not, but I
don't want to rely on that. So I think it will be an issue.
> First, it's for internal use only, so the surface is quite limited:
> the clients don't need it. Second, even in internal uses, it's likely
> that in most of the cases the public API of interface Foo will be
> enough, so the downcast won't be necessary. Third, it would not be so
> hard to create a private one-liner method for this:
>
> private class MyFooImpl {
> private static MyFooImpl asImpl(Foo f) {
> return switch (f) {case MyFooImpl mfi -> mfi;};
> }
> }
(ObSnark: not so hard, times 25,000.)
More seriously, I don't object to this as a possibility, but we're at
least two language features away from being able to write this: patterns
in switch and total statement switches.
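
For reference, here is roughly what the quoted helper looks like once patterns in switch are available (they were finalized in Java 21). This is a sketch under that assumption; Foo, MyFooImpl and asImpl follow the quoted message, while the enclosing class is hypothetical.

```java
// Sketch only: assumes Java 21; the enclosing class name is illustrative.
final class SafeDowncastSketch {
    sealed interface Foo permits MyFooImpl {}

    static final class MyFooImpl implements Foo {}

    // A switch expression must be exhaustive, so this compiles only while
    // MyFooImpl is the sole permitted implementation of Foo.
    static MyFooImpl asImpl(Foo f) {
        return switch (f) {
            case MyFooImpl mfi -> mfi;
        };
    }
}
```

Every call site still has to go through asImpl explicitly, so this addresses the type checking but not the "times 25,000" part.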
On to Remi's comments:
> If we have a rule that allows seamless conversions between the super reference type and primitive object type,
> the pair super reference type/primitive object type will behave more or less like the auto-boxing rules between an Integer and an int.
That's exactly how I think it's going to work; we generalize
boxing/unboxing to "value widening/narrowing" conversions, with the
exact same semantics, and apply to all value/ref projections of
inlines. This neatly flows into rules like overload selection, where
candidates without conversion are preferred to candidates with
conversion. No new rules to learn; just more uniform boxing. (Reminder:
this is the Amber list, so take disagreements with this statement to the
Valhalla list.)
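
The existing rule being generalized can be seen with plain int/Integer today. A minimal sketch with hypothetical method names; it shows only current boxing behavior and assumes nothing about how value/ref projections will finally be specified.

```java
// Sketch only: demonstrates today's overload selection with boxing.
final class OverloadSketch {
    static String m(long x)    { return "long"; }
    static String m(Integer x) { return "Integer"; }

    static String call() {
        int i = 1;
        // m(long) is applicable by primitive widening alone, without boxing,
        // so it is chosen before m(Integer) is even considered.
        return m(i);
    }
}
```

Here call() returns "long": the candidate that needs no boxing conversion wins.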
> One issue I see with this proposal, and the original one by Brian, the relation is asymmetric,
> MyFooImpl to MyFoo is subtyping but MyFoo to MyFooImpl is unboxing (so not the same pass when finding the most specific method), so it doesn't work fully like Integer and int.
> And i've used unboxing because it's already an existing rule, but maybe it's a new kind of conversion that will require us to carefully think about
Mucking up the declaration site for this seems unnecessary, and also
limits the usefulness of it. (As proposed, this can work when you have
multiple subtypes that implement a common interface.) But, the part of
this that is worth discussing is that yes, this could be a _conversion_
rather than assignability. The effect of this would mostly show up in
overload selection (where the strongly applicable options are preferred
over the loosely applicable ones) and type inference. It's a possibility, but I'm
not sure it really buys much, and doesn't really address Tagir's "too
magic" concerns. (If anything, conversions are more magic than
assignability.)