Bang, question, ref, and val (was: User model stacking: current status)
Brian Goetz
brian.goetz at oracle.com
Tue Jun 28 19:25:42 UTC 2022
Some further thoughts on the nature of bang, question, ref, and val.
The model outlined in my mail from yesterday accounted for the
distinction between class and type, but left something important out:
carriers. Adding these into the mix, I think this clarifies why `.val`
and `!` are different, and why `!` and `?` are not pure inverses.
The user declares _classes_, which includes identity and value classes.
Ignoring generics for the moment, we derive _types_ from classes.
Identity classes give rise to a single principal type (whose name is the
written the same as the class, but let's call this `C.ref` for clarity);
value classes give rise to two principal types, `C.ref` and `C.val`.
So `val` and `ref` are functions from Class to Type (val is partial):
val :: ValueClass -> Type
ref :: Class -> Type
What's missing is Carrier. Ignoring the legacy primitive carriers (I,
J, F, D), we have two carriers, L and Q. Every type has a carrier. For
the "ref" types, the carrier is L; for the "val" types, the carrier is Q:
carrier ref T = L
carrier val T = Q
Now, bang and question. These are operators on types. Bang restricts
the value set; question (potentially) augments the value set to include
null. Question is best describe as yielding a union type: `T? ===
T|Null`. (Note that for all reference types T, T|Null == T, because
Null <: T.)
What are the carriers for bang and question types? We define the
carrier on union types by taking the stronger of the two carriers:
carrier T|U = max (carrier T) (carrier U)
which means that
carrier question T = L
since we need an L carrier to represent null. But for "bang", we can
preserve the carrier, since we're representing fewer values:
carrier bang T = carrier T
(Why wouldn't we downgrade the carrier of `Point!` to Q? Because the
carrier means more than nullity; it affects atomicity, layout,
initialization strategy, etc.)
What this means is that `question` is always information-losing, and that:
carrier bang question T = L
carrier question bang T = L
So, the ugly fact here is that "bang" and "question" are not inverses;
`T!?` is not always T, nor is `T?!`.
But what I want to know is this: how do we want to denote "T or null",
when T is a type variable? This turns out to be the only place we
currently have to utter `.ref`. And uttering `.ref` here feels like
asking the user to do the language's job; what the user wants is to
describe the union type "T|Null". (Since the only sensible
representation for this is a reference type, the language will translate
it as such anyway, but that's the language's job.)
This is related to how we ask people to describe "nullable int". There
are three choices: `int?`, `int.ref`, and `Integer`. I would argue that
the first is closest to what the user wants: a statement about value
sets. `int.ref` brings in carriers, which is unrelated to what the user
really wants here; `Integer` is even worse because the relationship
between int and Integer is ad-hoc. Of course, they will all translate
the same way (the L carrier), but that's the compiler's job.
For the only remaining use of `.ref` (returning V.ref from Map::get and
friends), I think we want the same; Map::get wants to return "V or
null". Again, ref-ness is a dependent thing, not the essence; the
essence is "T|Null". (Also there's a connection with type patterns,
where we may want to expand a null-rejecting type pattern to a
null-including one.)
The problem, of course, is that once people see `?`, they will think it
is "obvious" that we left out "!" by mistake, because of course they go
together. But they don't, really; they're different things. But let's
set bang aside, and turn to Kevin's next question, which is: if `?` is a
union type with the null type, what does that say about `String?`? This
seems to be on a collision course, in that null-analysis efforts would
want to treat `String?` as "String, with explicit nullness", but the
union interpretation will collapse to just `String`.
Which points the way towards what seems the proper role for bang and
question in the surface syntax, if any: to *modify* types with respect
to their inclusion of null. So `String?` and `int!` should probably be
errors, since String is already nullable and int is already non-nullable.
Bottom line: as we've discovered half a dozen times already in this
project, nearly every time we think that nullity is perfectly correlated
to something, we discover it is not. Bang/question are not val/ref; we
might be able to get away with using `int.ref` to describe nullable
ints, but that doesn't help us at all with nullable or non-nullable type
patterns; and none of these are the same as "known vs unknown nullity"
(or known vs unknown initialization status.)
On 6/27/2022 2:48 PM, Brian Goetz wrote:
> I've been bothered by an uncomfortable feeling that .val and ! are
> somehow different in nature, but haven't been able to put my finger on
> it. Let me make another attempt.
>
> The "bang" and "question" operators operate on types. In the
> strictest form, the bang operator takes a type that has null in its
> value set, and returns a type whose value set is the same, except for
> null. But observe that if the value set contains null, then the type
> has to be a reference type. And the resulting type also has to be a
> reference type (except maybe for weird classes like Void) because
> we're preserving the remaining values, which are references. So we
> could say:
>
> bang :: RefType -> RefType
>
> Bang doesn't change the ref-ness, or id-ness, of a type, it just
> excludes a specific value from the value set.
>
> Now, what do ref and val do? They don't operate on types, they
> operates on _classes_, to produce a type. Val can only be applied to
> value classes, and produces a value type. In the strictest
> interpretation (for consistency with bang), ref also only operates on
> value classes. So:
>
> val :: ValClass -> ValType
> ref :: ValClass -> RefType
>
> Now, we've been strict with bang and ref to say they only work when
> they have a nontrivial effect, and could totalize them in the obvious
> way (ref is a no-op on an id class; bang is a no-op on a value type.)
> Which would give us:
>
> bang :: Type -> Type
> val :: ValClass -> ValType
> ref :: Class -> RefType
>
> with the added invariant that bang preserves id-ness/val-ness/ref-ness
> of types.
>
> But still, bang and ref operate on different things, and and produce
> different things; one takes a type and yields a slightly refined type
> with similar characteristics, the other takes a class and yields a
> type with highly specific characteristics. We can conclude a lot from
> `val` (its a value type, which already says a lot), but we cannot
> conclude anything other than non-nullity from `bang`; it might be a
> ref or a val type, it might come from an identity or value class.
>
> What this says to me is "val is a subtype of bang"; all vals are
> bangs, but not all bangs are vals.
>
> A harder problem is what to do about `question`. The strict
> interpretation says we can only apply `question` to a type that is
> already non-null. In our world, that's ValType.
>
> question :: ValType -> Type
>
> Or we could totalize as we did with bang, and we get an invariant that
> question preserves id-ness, val-ness, ref-ness. But, what does
> `question` really mean? Null is a reference. So there are two
> interpretations: that question always yields a reference type (which
> means non-references need to be lifted/boxed), or that question yields
> a union type.
>
> It turns out that the latter is super-useful on the stack but kind of
> sucks in the heap. The return value of `Map::get`, which we've been
> calling `T.ref`, really wants a union type (T or Null); similarly,
> many difficult questions in pattern matching might be made less
> difficult with a `T or Null` Type. But there is no efficient
> heap-based representation for such a union type; we could use tagged
> unions (blech) or just fall back to boxing. Which leaves us with the
> asymmetry that bang is representation-preserving (as well as other
> things), but question is not. (Which makes sense in that one is
> subtractive and the other is additive.)
>
> So, to your question: is this permanently gross? I think if we adopt
> the strictest intepretations:
>
> - bang is only allowed on types that are already nullable
> - question is only allowed on types that are not nullable (or on type
> variables)
> - val is only allowed on value classes
> - ref is only allowed on value classes (or on type variables)
>
> (And we can possibly boil away the last one, since if we can say `T?`,
> there is no need for `T.ref` anywhere.)
>
> What this means is that you can say `String!`, but not `Optional!`,
> because Optional is already null-free. Which means there is never any
> question whether you say `X.val` or `X!` or `X.val!` (or `X.ref!` if
> we exclude ref entirely). So then, rather than two ways to say the
> same thing, there are two ways to say two different things, which have
> different absolute strengths.
>
> This is somewhat unfortunate, but not "permanently gross."
>
> If we drop `ref` in favor of `?` (not necessarily a slam-dunk), we can
> consider finding another way to spell `.val` which is less intrusive,
> though there are not too many options that don't look like line noise.
>
>
>
>
>
> On 6/15/2022 12:41 PM, Kevin Bourrillion wrote:
>>
>> * I still am saddled with the deep feeling that ultimate victory here
>> looks like "we don't need a val type, because by capturing the
>> nullness bit and tearability info alone we will make /enough/ usage
>> patterns always-optimizable, and we can live with the downsides". To
>> me the upsides of this simplification are enormous, so if we really
>> must reject it, I may need some help understanding why. It's been
>> stated that a non-null value type means something slightly different
>> from a non-null reference type, but I'm not convinced of this; it's
>> just that sometimes you have the technical ability to conjure a
>> "default" instance and sometimes you don't, but nullness of the type
>> means what it means either way.
>>
>> * I think if we plan to go this way (.val), and then we one day
>> have a nullable types feature, some things will then be
>> permanently gross that I would hope we can avoid. For example,
>> nullness *also* demands the concept of bidirectional projection
>> of type variables, and for very overlapping reasons. This puts
>> things in a super weird place.
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/valhalla-spec-observers/attachments/20220628/b88448c2/attachment-0001.htm>
More information about the valhalla-spec-observers
mailing list