Value types, encapsulation, and uninitialized values
Brian Goetz
brian.goetz at oracle.com
Wed Feb 20 20:09:43 UTC 2019
Closing the loop on this story....
To summarize what's been said on this thread:
- Everyone agrees that there are at least some value types that don't
have a natural (non-null) default, and that making up a default with
zeros for these types is, at best, surprising. As Kevin put it:
> Most value types, I think, don't have a zero, and I believe it will be
> quite damaging to treat them as if they do. If Java doesn't either
> provide or simulate nullability, then users will be left jumping
> through a lot of hoops to simulate nullability themselves (`implements
> IsValid`??).
- John made an impassioned plea for not inventing new,
null-like-but-not-null mechanisms, which can be summarized as "no new
nulls":
http://cr.openjdk.java.net/~jrose/values/nullable-values.html
- The motivation for supporting "nullable" values is not that we
think values should have null in their value set; that is better handled
by Optional or type combinators like `Foo?`. This is really about what
happens when someone stumbles across a value that has not been
initialized by a constructor (the most common case here being array
elements).
- From a user-model perspective, there are a few options. Several
folks were bullish on letting the user provide an initial value (say,
via a no-arg constructor), but I think this idea runs off the road,
since there are some types that _simply have no reasonable (non-null)
default value_. These include domain types like
value record PersonName(String first, String last);
For such a type, a default name of ("", "") is only slightly less stupid
than (null, null). These types also include inner class instances; if
there's no enclosing instance available, what are we going to do?
Separately, we have explored a number of ways we might implement this in
the VM, and I think we have a sensible story. Some value types are
_zero intolerant_ -- this means that the all-zero value is not a member
of their value set. The key observation is:
nullability, zero-tolerance, flattenability -- pick two
That is, you can have nullable, zero-tolerant values (think `Point?`),
but they don't get flattened; or you can have zero-tolerant, flattenable
values, but they can't be null. The third combination (thanks
Frederic!) is that it is possible to have nullable, flattenable values,
if we make the all-zero representation illegal and then use the
all-zero representation in the heap to represent `null`; `getfield`
and `aaload` then check for zero on fetch and, if they find it, push a
null onto the stack. (There's a much bigger write-up on this coming; this is the
executive summary.) And because values are monomorphic, different value
types can make different choices.
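A rough way to picture Frederic's trick in today's Java is to simulate a flattened array of a zero-intolerant value by storing each field in its own array, and to treat the all-zero (here, all-null) field pattern as `null` on load. This is only a simulation under stated assumptions: `FlatPersonNameArray`, `load`, and `store` are hypothetical names standing in for what `aaload`/`aastore` would do inside the VM, and `PersonName` is written as an ordinary record whose constructor rejects null fields, so (null, null) can never be a real instance and is free to encode `null`.

```java
import java.util.Objects;

public class NullDefaultArrayDemo {
    // Hypothetical zero-intolerant value: the constructor rejects nulls, so the
    // all-zero (null, null) field pattern is never a legitimate instance.
    record PersonName(String first, String last) {
        PersonName {
            Objects.requireNonNull(first);
            Objects.requireNonNull(last);
        }
    }

    // Simulates a flattened PersonName[] by storing each field in its own array.
    static final class FlatPersonNameArray {
        private final String[] firsts;
        private final String[] lasts;

        FlatPersonNameArray(int length) {
            // Fresh arrays are all-zero (all null), so every element reads as null.
            firsts = new String[length];
            lasts = new String[length];
        }

        // Analogue of aaload: an all-zero element is surfaced as null.
        PersonName load(int i) {
            if (firsts[i] == null && lasts[i] == null) {
                return null;
            }
            return new PersonName(firsts[i], lasts[i]);
        }

        // Analogue of aastore: null is written as the all-zero pattern.
        void store(int i, PersonName p) {
            if (p == null) {
                firsts[i] = null;
                lasts[i] = null;
            } else {
                firsts[i] = p.first();
                lasts[i] = p.last();
            }
        }
    }

    public static void main(String[] args) {
        FlatPersonNameArray names = new FlatPersonNameArray(2);
        names.store(0, new PersonName("Ada", "Lovelace"));
        System.out.println(names.load(0)); // PersonName[first=Ada, last=Lovelace]
        System.out.println(names.load(1)); // null: element was never initialized
    }
}
```

The cost the message mentions shows up as the extra comparison in `load`; the benefit is that the fields still live inline in the arrays rather than behind a per-element pointer.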
Further, a key use case is _migrating_ value-based classes
(LocalDateTime, Optional) to value types. The key impediment so far
has been nullability; we can represent them as nullable + flattenable
if we're willing to give up the all-zero value. Since a class's internal
representation is a pure implementation detail, a class that wants to
migrate can always find a representation in which every instance has at
least one non-zero bit.
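One way a migrating class could guarantee that non-zero bit is a biased field encoding. The sketch uses a hypothetical `EpochInstant` (not the real `java.time.Instant`) whose natural state (0 seconds, 0 nanos) is a legitimate value; storing `nanos + 1` keeps the stored field in [1, 1,000,000,000], so the constructor can never produce the all-zero pattern. A synthetic "constructed" boolean field would work equally well; the bias is just one possible choice, assumed here for illustration.

```java
public class BiasedEncodingDemo {
    // Hypothetical value whose natural fields admit (0, 0), the epoch instant.
    // Storing nanos biased by +1 makes the stored field always non-zero.
    static final class EpochInstant {
        private final long seconds;
        private final int biasedNanos; // nanos + 1, in [1, 1_000_000_000]

        EpochInstant(long seconds, int nanos) {
            if (nanos < 0 || nanos > 999_999_999) {
                throw new IllegalArgumentException("nanos out of range");
            }
            this.seconds = seconds;
            this.biasedNanos = nanos + 1;
        }

        long seconds() { return seconds; }

        int nanos() { return biasedNanos - 1; }

        // True only for the all-zero field pattern, which no constructor call
        // can produce; a VM could therefore reserve that pattern for null.
        boolean hasAllZeroRepresentation() {
            return seconds == 0 && biasedNanos == 0;
        }
    }

    public static void main(String[] args) {
        EpochInstant epoch = new EpochInstant(0, 0);
        System.out.println(epoch.nanos());                    // 0
        System.out.println(epoch.hasAllZeroRepresentation()); // false
    }
}
```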
So, the sweet spot seems to be:
- Values, by default, are non-nullable and flattenable. The compiler
translates value `Point` as `QPoint;`.
- Users can denote the union of the value set and { null } using an
emotional type: `Point?`, which the compiler translates as `LPoint;`.
If a user wants a nullable `Point`, they ask for it; what they give up
is flattenability / scalarization. (I resisted the emotional types as
long as I could, but the alignment with the VM implementation was too
strong to resist, and this yields significant dividends when we get to
the generics story.) Let's not harp on the details of these types just
yet; that's a separate shed to paint.
- Values that need to defend against uninitialized data, or that are
migrated from references, can declare themselves to be
"null-default"; the cost is that they must be intolerant of the
all-zero value. These are always translated with `L` carriers, since
they are nullable. Users of these classes pay the extra penalty of
checking for zeros when values move between heap and stack, so they are
slightly slower, but they are still flattened and scalarized, which is
the big benefit. (Again, I resisted John's point about nulls, but
the big benefit. (Again, I resisted John's point about nulls, but
eventually the gravity was too strong; if we don't use null here, we'll
reinvent a worse null.)
These correspond to the three choose-two combinations that follow from
the observation above.
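As a compact restatement of the three options, here is a small Java enum recording which two of the three properties each flavor keeps, together with the descriptor carrier (`Q` or `L`) the message says the compiler would use. The enum and its names are purely illustrative; nothing like it exists in the VM or the compiler.

```java
public class PickTwoDemo {
    // Each constant records which properties a flavor keeps, plus its carrier.
    enum Flavor {
        // Plain value type `Point`: zero-default, flattened, translated as QPoint;
        ZERO_DEFAULT(false, true, true, 'Q'),
        // Nullable projection `Point?`: translated as LPoint;, not flattened
        NULLABLE_PROJECTION(true, true, false, 'L'),
        // Opt-in null-default value: L carrier, flattened, all-zero reserved for null
        NULL_DEFAULT(true, false, true, 'L');

        final boolean nullable;
        final boolean zeroTolerant;
        final boolean flattenable;
        final char carrier;

        Flavor(boolean nullable, boolean zeroTolerant, boolean flattenable, char carrier) {
            this.nullable = nullable;
            this.zeroTolerant = zeroTolerant;
            this.flattenable = flattenable;
            this.carrier = carrier;
        }

        // Number of the three properties this flavor keeps.
        int propertyCount() {
            return (nullable ? 1 : 0) + (zeroTolerant ? 1 : 0) + (flattenable ? 1 : 0);
        }
    }

    public static void main(String[] args) {
        for (Flavor f : Flavor.values()) {
            System.out.println(f + " carrier=" + f.carrier + " keeps " + f.propertyCount() + " of 3");
        }
    }
}
```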
From a user model perspective, users choose between zero-default values
(the default) and null-default values (opt in), as the semantics
demand. This is easy to understand (in fact, the biggest risk might be
that users will like it _too much_ and reach for null-default value
classes more often than they should). And if you want to represent
"maybe Point", you use `Point?` or `Optional<Point>` as needed.
From a VM perspective, we need to support null-default values; while
we've not implemented this yet, it seems pretty reasonable.
The bonus is that we have cleared the last blocker to migrating
value-based classes to value types; for migrated values, we implicitly
make them null-default (also: same treatment for inner value classes),
and then migrating Optional and LocalDateTime becomes a completely
compatible, in-place move.
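Why nullability was the blocker is easy to see in ordinary client code. The following is plain current Java in which `null` means "no entry"; the point is that if a migrated `LocalDateTime` is null-default, code like this would keep compiling and behaving the same. The migration itself is not shown, since (per the message) the VM support is not implemented yet; `NullableClientDemo` and `describe` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;

public class NullableClientDemo {
    // Typical existing client code: null signals "absent", and callers test for it.
    static String describe(Map<String, LocalDateTime> lastLogin, String user) {
        LocalDateTime t = lastLogin.get(user); // Map.get returns null for a missing key
        return (t == null) ? user + ": never" : user + ": " + t;
    }

    public static void main(String[] args) {
        Map<String, LocalDateTime> lastLogin = new HashMap<>();
        lastLogin.put("ada", LocalDateTime.of(2019, 2, 20, 20, 9, 43));
        System.out.println(describe(lastLogin, "ada"));   // ada: 2019-02-20T20:09:43
        System.out.println(describe(lastLogin, "grace")); // grace: never
    }
}
```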
On 10/11/2018 10:14 AM, Brian Goetz wrote:
> Our story is "Codes like a class, works like an int". A key part of
> this is that value types support the same lifecycle as objects, and
> the same ability to hide their internals.
>
> Except, the current story falls down here, because authors must
> contend with the special all-zero default value since, unlike
> classes, we cannot guarantee that instances are the result of a
> constructor. For some classes (e.g., Complex, Point), this
> forced-on-you default is just fine, but for others (e.g., wrappers for
> native resources), this is not unlike the regrettable situation with
> serialization, where class implementations may confront instances that
> could not have resulted from a constructor.
>
> Classes guard against this through the magic of null; an instance
> method will never have to contend with a null receiver, because by the
> time we transfer control to the method, we'd already have gotten an
> NPE. Values do not have this protection. While there are many things
> for which we can say "users will learn", I do not think this is one of
> them; if a class has a constructor, it will be assumed that the
> receiver in a method invocation will be on an instance that has
> resulted from construction. I do not think we can expose the
> programming model as-is; it claims to be like classes, but in this
> aspect is more like structs.
>
> So, some values (but not all) will want some sort of protection
> against uninitialized values. One approach here would be to try to
> emulate null, by, say, injecting checks for the default value prior to
> dereferences. Another would be to take the route C# did, and allow
> users to specify a no-arg constructor, which would customize the
> default value. (Since both are opt-ins, we can educate users about
> the costs of selecting these tools, and users can get the benefits of
> flatness and density even if these have additional runtime costs.)
> The latter route is less rich, but probably workable. Both eliminate
> the (likely perennial) surprise over uninitialized values for
> zero-sensitive classes.
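The receiver-protection argument in the quoted message can be made concrete in today's Java. `NativeHandle` is a hypothetical resource wrapper whose constructor promises `fd > 0`; because plain Java cannot produce an unconstructed instance, a private constructor exposed through `zeroDefault()` stands in for the all-zero default a flattened value array would contain. `invokeWithInjectedCheck` sketches the "inject checks for the default value prior to dereferences" option. None of these names are proposed APIs.

```java
public class ReceiverProtectionDemo {
    // Hypothetical wrapper for a native resource; the public path guarantees fd > 0.
    static final class NativeHandle {
        final int fd;

        NativeHandle(int fd) {
            if (fd <= 0) throw new IllegalArgumentException("invalid fd");
            this.fd = fd;
        }

        // Stand-in for the VM-supplied all-zero default of a value type;
        // it skips the validation above, which real Java callers cannot do.
        private NativeHandle() { this.fd = 0; }

        static NativeHandle zeroDefault() { return new NativeHandle(); }

        // Written assuming fd > 0, as the constructor promised.
        int describe() { return fd; }
    }

    // Emulates a compiler-injected "is this the default?" check before dereference.
    static int invokeWithInjectedCheck(NativeHandle h) {
        if (h.fd == 0) throw new NullPointerException("uninitialized value");
        return h.describe();
    }

    public static void main(String[] args) {
        NativeHandle[] refs = new NativeHandle[1]; // reference array: elements start null
        try {
            refs[0].describe(); // NPE is thrown before describe() ever runs
        } catch (NullPointerException e) {
            System.out.println("reference array: NPE before the method body");
        }
        NativeHandle v = NativeHandle.zeroDefault(); // what a flattened array would hold
        System.out.println("value-like default: fd = " + v.describe()); // runs on unconstructed state
    }
}
```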
More information about the valhalla-spec-experts
mailing list