Nullity (was: User model stacking: current status)

Mariell Hoversholm mariell.hoversholm at paf.com
Wed May 11 12:47:52 UTC 2022


Hi,

Would nullable types be `T?`? This is what I've inferred, but would
appreciate it being made explicit. I will continue with this assumption
in the rest of my answer.

I personally very much enjoy Kotlin's and Rust's enforced nullity, and I
believe a clear majority of the users of these languages do too. Because of
this, I would absolutely encourage you/another team to go down the path of
designing and implementing nullness as part of the type system. Regarding
changing the types to be `.ref` by default: I think this would be a
beneficial change with regard to the current behaviour of types in Java
(i.e. `.ref` being the only option, with primitives as the exceptions to
this rule).

This could, however, lead to some smaller mess-ups in the future: I would
_imagine_ most developers would like the benefits of `.val` types in most
instances, but they may forget to ask for them. To put into perspective how
easy the `.val` case could be to forget: if you have a game where you need
positions, you could have e.g. `Pos3d` (let's model this as
`primitive record Pos3d(double x, double y, double z)` for completeness'
sake). This is a light type that would, under a `.ref` default, be stored
on the heap. Larger games will require you to write `.val` everywhere,
which may easily be forgotten in hot code.
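
A minimal sketch of how that forgetfulness could look, written in the
speculative syntax discussed in this thread (`primitive record` and the
`.val`/`.ref` projections do not compile in any shipping Java today):

```java
// Speculative Valhalla syntax; nothing here compiles in current Java.
primitive record Pos3d(double x, double y, double z) { }

class Entity {
    // With `.ref` as the default, this is Pos3d.ref: a nullable heap
    // reference, costing an allocation and an indirection per position.
    Pos3d position;

    // Getting flattened, zero-default storage requires remembering `.val`:
    Pos3d.val flatPosition;  // three inlined doubles, no indirection
}
```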

Given the possibility of `Optional.val`, we could potentially be missing a
good practice here. It would(/could) be much cheaper to allocate an
`Optional.val<Pos3d.val!>!` than it would be to allocate a `Pos3d.ref?`.
Please also note that I have entirely avoided the topic of binary- and
source-compatibility here; they may very much be important aspects to
consider, given the defaults would change existing code, even in the JDK.
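
The comparison above, spelled out in the same speculative syntax (the `!`
and `?` spellings are exactly the "emotional types" under discussion in
this thread; none of this is real Java yet):

```java
// Speculative syntax; `Optional.val`, `!` and `?` types do not exist today.
// A flat, non-nullable optional wrapping a flat, non-nullable position --
// a plausible candidate for allocation-free, fully flattened storage:
Optional.val<Pos3d.val!>! maybePos;

// versus a nullable reference type, paying a heap allocation per value:
Pos3d.ref? boxedPos;
```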

I'm not sure how much help I am to your gauging of interest, but I hope
this can, at the very least, be a small indication of how users of other
languages may find the ideas brought up.

Kind regards,
Mariell Hoversholm (she/her)

On Mon, 9 May 2022 at 23:14, Brian Goetz <brian.goetz at oracle.com> wrote:

> Assuming the stacking here is satisfactory, let's talk about .ref and .val.
>
> Kevin made a strong argument for .ref as default, so let's pull on that
> string for a bit.
>
> Universal generics need a way to express .ref at least for type
> variables, so if we're going to make .ref the default, we still need a
> way to denote it.  Calling the types Foo.ref and Foo.val, where Foo is
> an alias for Foo.ref, is one way to achieve this.
>
> <wild-speculation>
>
> Now, let's step out onto some controversial territory: how do we spell
> .ref and .val?  Specifically, how good a fit is `!` and `?` (henceforth,
> emotional types), now that the B3 election is _solely_ about the
> existence of a zero-default .val?  (Before, it was a poor fit, but now
> it might be credible.  Yet another reason I wanted to tease apart what
> "primitive" meant into independent axes.)
>
> Pro: users think they really want emotional types.
> Pro: to the extent we eventually acquire full emotional types, and to
> the extent these align cleanly with primitive type projections, it
> avoids weirdnesses like `Foo.val?`, where there are two ways to talk
> about nullity.
>
> Con: These will surely not initially be the full emotional types users
> think they want, and so may well be met by "you idiots, these are not
> the emotional types we want"
> Con: To the extent full emotional types do not align clearly with
> primitive type projections, we might be painted into a corner and it
> might be harder to do emotional types.
>
> Risk: the language treatment of emotional types is one thing, but the
> real cost in introducing them into the language is annotating the
> libraries.  Having them in the language but not annotating the libraries
> on a timely basis may well be a step backwards.
>
>
> If we had full emotional types, some would have their non-nullity erased
> (`String!` erases to the same type descriptor as ordinary `String`) and
> some would have it reified (Integer! translates to a separate type, the
> `I` carrier.)  This means that migrating `String` to `String!` might be
> binary-compatible, but `Integer` to `Integer!` would not be.  (This is
> probably an acceptable asymmetry.)
>
> But a bigger question is whether an erased `String!` should be backed up
> by a synthetic null check at the boundary between checked and unchecked
> code, such as method entry points (just as unpacking a T from a generic
> is backed up by a synthetic cast at the boundary between generic and
> explicit code.)  This is reasonable (and cheap enough), but may be on a
> collision course with some interpretations of `String!`.
>
> Initially, we probably would restrict the use of `!` to val-projections
> of primitive classes, but the pressure to extend it would always be just
> around the corner (e.g., having them in type patterns would likely
> address many people's initial discomfort about null handling in patterns).
>
> </wild-speculation>
>
> My goal here is not to dive into the details of "let's design nullable
> types", as that would be a distraction at this point, as much as to
> gauge sentiment on whether this is worth exploring further, and gather
> considerations I may have missed in this brief summary.
>
>
> On 5/8/2022 12:32 PM, Brian Goetz wrote:
> > To track the progress of the spiral:
> >
> >  - We originally came up with the B2/B3 division to carve off B2 as
> > the "safe subset", where you get less flattening but nulls and more
> > integrity.  This provided a safe migration target for existing VBCs,
> > as well as a reasonable target for creating new VBCs that want to be
> > mostly class-like but enjoy some additional optimization (and shed
> > accidental identity for safety reasons.)
> >
> >  - When we put all the flesh on the bones of B2/B3, there were some
> > undesirable consequences, such as (a) tearing was too subtle, and (b)
> > both the semantics and cost model differences between B2/B3 were going
> > to be hard to explain (and in some cases, users have bad choices
> > between semantics and performance.)
> >
> >  - A few weeks ago, we decided to more seriously consider separating
> > atomicity out as an explicit thing on its own. This had the benefit of
> > putting semantics first, and offered a clearer cost model: you could
> > give up identity but keep null-default and integrity (B2), further
> > give up nulls to get some more density (B3.val), and further further
> > give up atomicity to get more flatness (non-atomic B3.)  This was
> > honest, but led people to complain "great, now there are four buckets."
> >
> >  - We explored making non-atomicity a cross-cutting concern, so there
> > are two new buckets (VBC and primitive-like), either of which can
> > choose their atomicity constraints, and then within the primitive-like
> > bucket, the .val and .ref projections differ only with respect to the
> > consequences of nullity.  This felt cleaner (more orthogonal), but the
> > notion of a non-atomic B2 itself is kind of weird.
> >
> > So where this brings us is back to something that might feel like the
> > four-bucket approach in the third bullet above, but with two big
> > differences: atomicity is an explicit property of a class, rather than
> > a property of reference-ness, and a B3.ref is not necessarily the same
> > as a B2.  This recognizes that the main distinction between B2 or B3
> > is *whether a class can tolerate its zero value.*
> >
> > More explicitly:
> >
> >  - B1 remains unchanged
> >
> >  - B2 is for "ordinary" value-based classes.  Always atomic, always
> > nullable, always reference; the only difference with B1 is that it has
> > shed its identity, enabling routine stack-based flattening, and
> > perhaps some heap flattening depending on VM sophistication and
> > heroics.  B2 is a good target for migrating many existing value-based
> > classes.
> >
> >  - B3 means that a class can tolerate its zero (uninitialized) value,
> > and therefore gives rise to two types, which we'll call B3.ref and
> > B3.val.  The former is a reference type and is therefore nullable and
> > null-default; the latter is a direct/immediate/value type whose
> > default is zero.
> >
> >  - B3 classes can further be marked non-atomic; this unlocks greater
> > flattening in the heap at the cost of tearing under race, and is
> > suitable for classes without cross-field invariants.  Non-atomicity
> > accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears
> > (and therefore might expose its zero under race, as per Friday's
> > discussions.)
> >
> > Syntactically (reminder: NOT an invitation to discuss syntax at this
> > point), this might look like:
> >
> >     class B1 { }                // identity, reference, atomic
> >
> >     value-based class B2 { }    // non-identity, reference, atomic
> >
> >     value class B3 { }          // non-identity, .ref and .val, both
> > atomic
> >
> >     non-atomic value class B3 { }  // similar to B3, but both are
> > non-atomic
> >
> > So, two new (but related) class modifiers, of which one has an
> > additional modifier.  (The spelling of all of these can be discussed
> > after the user model is entirely nailed down.)
> >
> > So, there's a monotonic sequence of "give stuff up, get other stuff":
> >
> >  - B2 gives up identity relative to B1, gains some flattening
> >  - B3 optionally gives up null-defaultness relative to B2, yielding
> > two types, one of which sheds some footprint
> >  - non-atomic B3 gives up atomicity relative to B3, gaining more
> > flatness, for both type projections
> >
> >
> >
> >
> >
> >
> > On 5/6/2022 10:04 AM, Brian Goetz wrote:
> >> Thinking more about Dan's concerns here ...
> >>
> >> On 5/5/2022 6:00 PM, Dan Smith wrote:
> >>> This is significant because the primary reason to declare a B2
> >>> rather than a B3 is to guarantee that the all-zeros value cannot be
> >>> created.
> >>
> >> This is a little bit of a circular argument; it takes a property that
> >> an atomic B2 has, but a non-atomic B2 lacks, and declares that to be
> >> "the whole point" of B2.  It may be that exposure of the zero is so
> >> bad we may eventually want to back away from the idea, but let's come
> >> up with a fair picture of what a non-atomic B2 means, and ask if
> >> that's sufficiently useful.
> >>
> >>> This leads me to conclude that if you're declaring a non-atomic B2,
> >>> you might as well just declare a non-atomic B3.
> >>
> >> Fair point, but let's pull on this string for a moment.  Suppose I
> >> want a null-default, flattenable value, and I'm willing to take the
> >> tearing to get there.  So you're saying "then declare a B3 and use
> >> B3.ref".  But B3.ref was supposed to have the same semantics as an
> >> equivalent B2!  (I realize I'm doing the same thing I just accused
> >> you of above -- taking an old invariant and positioning it as "the
> >> point".  Stay tuned.)  Which means either that we lose flattening,
> >> again, or we create yet another asymmetry between B3.ref and B2.
> >> Maybe you're saying that the combination of nullable and full-flat is
> >> just too much to ask, but I am not sure it is; in any case, let's
> >> convince ourselves of this before we rule it out.
> >>
> >> Or maybe, what you're saying is that my claim that B3.ref and B2 are
> >> the same thing is the stale thing here, and we can let it go and get
> >> it back in another form.  In which case you're positing a model where:
> >>
> >>  - B1 is unchanged
> >>  - B2 is always atomic, reference, nullable
> >>  - B3 really means "the zero is OK", comes with .ref and .val, and
> >> (non-atomic B3).ref is still tearable?
> >>
> >> In this model, (non-atomic B3).ref takes the place of (non-atomic B2)
> >> in the stacking I've been discussing.  Is that what you're saying?
> >>
> >>     class B1 { }  // ref, identity, atomic
> >>     value-based class B2 { }  // ref, non-identity, atomic
> >>     [ non-atomic ] value class B3 { }  // ref or val, zero is ok,
> >> both projections share atomicity
> >>
> >> If we go with ref-default, then this is a small leap from yesterday's
> >> stacking, because "B3" and "B2" are both reference types, so if you
> >> want a tearable, non-atomic reference type, saying `non-atomic value
> >> class B3` and then just using B3 gets you that. Then:
> >>
> >>  - B2 is like B1, minus identity
> >>  - B3 means "uninitialized values are OK, you get two types, a
> >> zero-default and a non-default"
> >>  - Non-atomicity is an extra property we can add to B3, to get more
> >> flattening in exchange for less integrity
> >>  - The use cases for non-atomic B2 are served by non-atomic B3 (when
> >> .ref is the default)
> >>
> >> I think this still has the properties I want; I can freely choose the
> >> reasonable subsets of { identity, has-zero, nullable, atomicity }
> >> that I want; the orthogonality of non-atomic across buckets becomes
> >> orthogonality of non-atomic with nullity, and the "B3.ref is just
> >> like B2" is shown to be the "false friend."
> >>
> >>
> >
>


-- 

*Mariell Hoversholm *(she/her)

Software Developer

Integrations (Slack #integration-team-public)


Paf

Mobile: +46 73 329 40 18

Bråddgatan 11 SE602 22, Norrköping

Sweden

*Working remote from Uppsala*





