Bang, question, ref, and val (was: User model stacking: current status)

Brian Goetz brian.goetz at oracle.com
Tue Jun 28 19:25:42 UTC 2022


Some further thoughts on the nature of bang, question, ref, and val.

The model outlined in my mail from yesterday accounted for the 
distinction between class and type, but left something important out: 
carriers.  Adding these into the mix, I think this clarifies why `.val` 
and `!` are different, and why `!` and `?` are not pure inverses.

The user declares _classes_, which include identity and value classes.  
Ignoring generics for the moment, we derive _types_ from classes.  
Identity classes give rise to a single principal type (whose name is 
written the same as the class, but let's call this `C.ref` for clarity); 
value classes give rise to two principal types, `C.ref` and `C.val`.

So `val` and `ref` are functions from Class to Type (val is partial):

     val :: ValueClass -> Type
     ref :: Class -> Type
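
To make the partiality concrete, here is a minimal sketch in Java.  All 
of the names (`ClassKind`, `ClassDecl`, `PrincipalType`, `Projections`) 
are hypothetical and exist only for illustration; this is a toy model of 
the two functions above, not a description of how the compiler or VM 
represents anything.

```java
// Hypothetical model; none of these names come from the JDK or any prototype.
enum ClassKind { IDENTITY, VALUE }

record ClassDecl(String name, ClassKind kind) {}

// A principal type derived from a class: its ref or its val projection.
record PrincipalType(ClassDecl decl, boolean isVal) {
    @Override
    public String toString() {
        return decl.name() + (isVal ? ".val" : ".ref");
    }
}

final class Projections {
    private Projections() {}

    // ref :: Class -> Type, total: every class has a reference type.
    static PrincipalType ref(ClassDecl c) {
        return new PrincipalType(c, false);
    }

    // val :: ValueClass -> Type, partial: rejected for identity classes.
    static PrincipalType val(ClassDecl c) {
        if (c.kind() != ClassKind.VALUE) {
            throw new IllegalArgumentException(c.name() + " is not a value class");
        }
        return new PrincipalType(c, true);
    }
}
```

In this sketch, `Projections.val` on a `ClassDecl` of kind `IDENTITY` 
throws, which is the toy equivalent of "val is partial".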

What's missing is Carrier.  Ignoring the legacy primitive carriers (I, 
J, F, D), we have two carriers, L and Q.  Every type has a carrier.  For 
the "ref" types, the carrier is L; for the "val" types, the carrier is Q:

     carrier ref C = L
     carrier val C = Q

Now, bang and question.  These are operators on types.  Bang restricts 
the value set; question (potentially) augments the value set to include 
null.  Question is best described as yielding a union type: `T? === 
T|Null`.  (Note that for all reference types T, T|Null == T, because 
Null <: T.)

What are the carriers for bang and question types?  We define the 
carrier on union types by taking the stronger of the two carriers:

     carrier T|U = max (carrier T) (carrier U)

which means that

     carrier question T = L

since we need an L carrier to represent null.  But for "bang", we can 
preserve the carrier, since we're representing fewer values:

     carrier bang T = carrier T

(Why wouldn't we downgrade the carrier of `Point!` to Q? Because the 
carrier means more than nullity; it affects atomicity, layout, 
initialization strategy, etc.)
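
The carrier rules so far can be sketched as a small Java model.  This is 
only an illustration under my own assumptions: `Carrier`, `TypeExpr`, 
`Ref`, `Val`, `Null`, `Union`, `Bang`, and `TypeExprs` are invented 
names, and "stronger" is modeled simply as enum order with Q below L.

```java
// Hypothetical model of the carrier rules; the names are illustrative only.
enum Carrier {
    Q, L;  // declared weakest first, so compareTo orders Q below L

    static Carrier max(Carrier a, Carrier b) {
        return a.compareTo(b) >= 0 ? a : b;
    }
}

interface TypeExpr {
    Carrier carrier();
}

// C.ref: the carrier is L.
record Ref(String className) implements TypeExpr {
    public Carrier carrier() { return Carrier.L; }
}

// C.val: the carrier is Q.
record Val(String className) implements TypeExpr {
    public Carrier carrier() { return Carrier.Q; }
}

// The null type needs an L carrier to represent null.
record Null() implements TypeExpr {
    public Carrier carrier() { return Carrier.L; }
}

// T|U takes the stronger of the two carriers.
record Union(TypeExpr left, TypeExpr right) implements TypeExpr {
    public Carrier carrier() { return Carrier.max(left.carrier(), right.carrier()); }
}

// T! restricts the value set, so the carrier is preserved.
record Bang(TypeExpr operand) implements TypeExpr {
    public Carrier carrier() { return operand.carrier(); }
}

final class TypeExprs {
    private TypeExprs() {}

    // T? is modeled as the union T|Null.
    static TypeExpr question(TypeExpr t) { return new Union(t, new Null()); }
}
```

In this model, `TypeExprs.question(new Val("Point")).carrier()` is `L`, 
while `new Bang(new Ref("Point")).carrier()` stays `L` rather than 
dropping to `Q`.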

What this means is that `question` is always information-losing, and that:

     carrier bang question T = L
     carrier question bang T = L

So, the ugly fact here is that "bang" and "question" are not inverses; 
`T!?` is not always T, nor is `T?!`.
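
A quick way to see the non-inverse property is to reduce a type to just 
two facts: its carrier and whether null is in its value set.  The sketch 
below does that in Java; `TypeInfo` and its methods are hypothetical 
names, and collapsing a type to these two facts is a simplifying 
assumption made only for this example.

```java
// Hypothetical two-carrier universe, ignoring the legacy primitive carriers.
enum Carrier { Q, L }

// A type summarized as its carrier plus whether null is in its value set.
record TypeInfo(Carrier carrier, boolean admitsNull) {
    // bang removes null from the value set and preserves the carrier.
    TypeInfo bang() {
        return new TypeInfo(carrier, false);
    }

    // question adds null to the value set, which forces the L carrier.
    TypeInfo question() {
        return new TypeInfo(Carrier.L, true);
    }
}
```

Taking `Point.val` to be `new TypeInfo(Carrier.Q, false)`, both 
`question().bang()` and `bang().question()` yield something with an L 
carrier, so neither equals the starting type.  Taking `String` to be 
`new TypeInfo(Carrier.L, true)`, `bang().question()` does get back to 
`String`, but `question().bang()` does not.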

But what I want to know is this: how do we want to denote "T or null", 
when T is a type variable?  This turns out to be the only place we 
currently have to utter `.ref`.  And uttering `.ref` here feels like 
asking the user to do the language's job; what the user wants is to 
describe the union type "T|Null".  (Since the only sensible 
representation for this is a reference type, the language will translate 
it as such anyway, but that's the language's job.)

This is related to how we ask people to describe "nullable int".  There 
are three choices: `int?`, `int.ref`, and `Integer`.  I would argue that 
the first is closest to what the user wants: a statement about value 
sets.  `int.ref` brings in carriers, which is unrelated to what the user 
really wants here; `Integer` is even worse because the relationship 
between int and Integer is ad-hoc.  Of course, they will all translate 
the same way (the L carrier), but that's the compiler's job.

For the only remaining use of `.ref` (returning V.ref from Map::get and 
friends), I think we want the same; Map::get wants to return "V or 
null".  Again, ref-ness is a dependent thing, not the essence; the 
essence is "T|Null".  (Also there's a connection with type patterns, 
where we may want to expand a null-rejecting type pattern to a 
null-including one.)

The problem, of course, is that once people see `?`, they will think it 
is "obvious" that we left out "!" by mistake, because of course they go 
together.  But they don't, really; they're different things.  But let's 
set bang aside, and turn to Kevin's next question, which is: if `?` is a 
union type with the null type, what does that say about `String?`?  This 
seems to be on a collision course, in that null-analysis efforts would 
want to treat `String?` as "String, with explicit nullness", but the 
union interpretation will collapse to just `String`.

Which points the way towards what seems the proper role for bang and 
question in the surface syntax, if any: to *modify* types with respect 
to their inclusion of null.  So `String?` and `int!` should probably be 
errors, since String is already nullable and int is already non-nullable.
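
As a rough sketch of what "should probably be errors" could look like, 
here is a toy checker in Java.  `TypeUse` is a made-up name, the error 
messages are invented, and whether a real compiler would reject these 
forms (rather than, say, warn) is exactly the open question; the code 
only illustrates the "modifier must have an effect" reading.

```java
// Hypothetical checker for the strict reading; names are illustrative only.
record TypeUse(String text, boolean admitsNull) {
    // T! is only accepted when T currently admits null.
    TypeUse bang() {
        if (!admitsNull) {
            throw new IllegalArgumentException(
                    text + "! is an error: " + text + " is already non-nullable");
        }
        return new TypeUse(text + "!", false);
    }

    // T? is only accepted when T currently excludes null.
    TypeUse question() {
        if (admitsNull) {
            throw new IllegalArgumentException(
                    text + "? is an error: " + text + " is already nullable");
        }
        return new TypeUse(text + "?", true);
    }
}
```

With `new TypeUse("String", true)` and `new TypeUse("int", false)`, the 
calls `bang()` on String and `question()` on int succeed, while 
`question()` on String and `bang()` on int throw.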

Bottom line: as we've discovered half a dozen times already in this 
project, nearly every time we think that nullity is perfectly correlated 
to something, we discover it is not. Bang/question are not val/ref; we 
might be able to get away with using `int.ref` to describe nullable 
ints, but that doesn't help us at all with nullable or non-nullable type 
patterns; and none of these are the same as "known vs unknown nullity" 
(or known vs unknown initialization status.)





On 6/27/2022 2:48 PM, Brian Goetz wrote:
> I've been bothered by an uncomfortable feeling that .val and ! are 
> somehow different in nature, but haven't been able to put my finger on 
> it.  Let me make another attempt.
>
> The "bang" and "question" operators operate on types.  In the 
> strictest form, the bang operator takes a type that has null in its 
> value set, and returns a type whose value set is the same, except for 
> null.   But observe that if the value set contains null, then the type 
> has to be a reference type.  And the resulting type also has to be a 
> reference type (except maybe for weird classes like Void) because 
> we're preserving the remaining values, which are references.  So we 
> could say:
>
>     bang :: RefType -> RefType
>
> Bang doesn't change the ref-ness, or id-ness, of a type, it just 
> excludes a specific value from the value set.
>
> Now, what do ref and val do?  They don't operate on types, they 
> operate on _classes_, to produce a type.  Val can only be applied to 
> value classes, and produces a value type.  In the strictest 
> interpretation (for consistency with bang), ref also only operates on 
> value classes.  So:
>
>     val :: ValClass -> ValType
>     ref :: ValClass -> RefType
>
> Now, we've been strict with bang and ref to say they only work when 
> they have a nontrivial effect, and could totalize them in the obvious 
> way (ref is a no-op on an id class; bang is a no-op on a value type.)  
> Which would give us:
>
>     bang :: Type -> Type
>     val :: ValClass -> ValType
>     ref :: Class -> RefType
>
> with the added invariant that bang preserves id-ness/val-ness/ref-ness 
> of types.
>
> But still, bang and ref operate on different things, and produce 
> different things; one takes a type and yields a slightly refined type 
> with similar characteristics, the other takes a class and yields a 
> type with highly specific characteristics.  We can conclude a lot from 
> `val` (it's a value type, which already says a lot), but we cannot 
> conclude anything other than non-nullity from `bang`; it might be a 
> ref or a val type, it might come from an identity or value class.
>
> What this says to me is "val is a subtype of bang"; all vals are 
> bangs, but not all bangs are vals.
>
> A harder problem is what to do about `question`.  The strict 
> interpretation says we can only apply `question` to a type that is 
> already non-null.  In our world, that's ValType.
>
>     question :: ValType -> Type
>
> Or we could totalize as we did with bang, and we get an invariant that 
> question preserves id-ness, val-ness, ref-ness.  But, what does 
> `question` really mean?  Null is a reference.  So there are two 
> interpretations: that question always yields a reference type (which 
> means non-references need to be lifted/boxed), or that question yields 
> a union type.
>
> It turns out that the latter is super-useful on the stack but kind of 
> sucks in the heap.  The return value of `Map::get`, which we've been 
> calling `T.ref`, really wants a union type (T or Null); similarly, 
> many difficult questions in pattern matching might be made less 
> difficult with a `T or Null` Type.  But there is no efficient 
> heap-based representation for such a union type; we could use tagged 
> unions (blech) or just fall back to boxing. Which leaves us with the 
> asymmetry that bang is representation-preserving (as well as other 
> things), but question is not.  (Which makes sense in that one is 
> subtractive and the other is additive.)
>
> So, to your question: is this permanently gross?  I think if we adopt 
> the strictest interpretations:
>
>  - bang is only allowed on types that are already nullable
>  - question is only allowed on types that are not nullable (or on type 
> variables)
>  - val is only allowed on value classes
>  - ref is only allowed on value classes (or on type variables)
>
> (And we can possibly boil away the last one, since if we can say `T?`, 
> there is no need for `T.ref` anywhere.)
>
> What this means is that you can say `String!`, but not `Optional!`, 
> because Optional is already null-free.  Which means there is never any 
> question whether you say `X.val` or `X!` or `X.val!` (or `X.ref!` if 
> we exclude ref entirely).  So then, rather than two ways to say the 
> same thing, there are two ways to say two different things, which have 
> different absolute strengths.
>
> This is somewhat unfortunate, but not "permanently gross."
>
> If we drop `ref` in favor of `?` (not necessarily a slam-dunk), we can 
> consider finding another way to spell `.val` which is less intrusive, 
> though there are not too many options that don't look like line noise.
>
>
>
>
>
> On 6/15/2022 12:41 PM, Kevin Bourrillion wrote:
>>
>> * I still am saddled with the deep feeling that ultimate victory here 
>> looks like "we don't need a val type, because by capturing the 
>> nullness bit and tearability info alone we will make /enough/ usage 
>> patterns always-optimizable, and we can live with the downsides". To 
>> me the upsides of this simplification are enormous, so if we really 
>> must reject it, I may need some help understanding why. It's been 
>> stated that a non-null value type means something slightly different 
>> from a non-null reference type, but I'm not convinced of this; it's 
>> just that sometimes you have the technical ability to conjure a 
>> "default" instance and sometimes you don't, but nullness of the type 
>> means what it means either way.
>>
>>     * I think if we plan to go this way (.val), and then we one day
>>     have a nullable types feature, some things will then be
>>     permanently gross that I would hope we can avoid. For example,
>>     nullness *also* demands the concept of bidirectional projection
>>     of type variables, and for very overlapping reasons. This puts
>>     things in a super weird place.
>>
>


More information about the valhalla-spec-observers mailing list