Value types, encapsulation, and uninitialized values

Wed Feb 20 20:09:43 UTC 2019

Closing the loop on this story....

To summarize what's been said on this thread:

  - Everyone agrees that there are at least some value types that don't 
have a natural (non-null) default, and that making up a default with 
zeros for these types is, at best, surprising.  As Kevin put it:

> Most value types, I think, don't have a zero, and I believe it will be 
> quite damaging to treat them as if they do. If Java doesn't either 
> provide or simulate nullability, then users will be left jumping 
> through a lot of hoops to simulate nullability themselves (`implements 
> IsValid`??). 

  - John made an impassioned plea for not inventing new, 
null-like-but-not-null mechanisms, which can be summarized as "no new 
nulls":

     http://cr.openjdk.java.net/~jrose/values/nullable-values.html

  - The motivation for supporting "nullable" values is not that we 
think values should have null in their value set; this is better handled 
by Optional or type combinators like `Foo?`.  This is really about what 
happens when someone stumbles across a value that has not been 
initialized by a constructor (the most common case being array 
elements).

  - From a user-model perspective, there are a few options. Several 
folks were bullish on letting the user provide an initial value (say, 
via a no-arg constructor), but I think this idea runs off the road, 
since there are some types that _simply have no reasonable (non-null) 
default value_.  These include domain types like

     value record PersonName(String first, String last);

A default name of ("", "") is only slightly less stupid a default than 
(null, null).  These also include inner class instances; if there's no 
enclosing instance available, what are we going to do?
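
To make the array-element case concrete, here is a minimal sketch in
today's Java, using an ordinary record as a stand-in for the value
record above.  The `display` method and the `UninitializedElementDemo`
class are made up for illustration.  With references, a never-assigned
element is null and a method call on it fails immediately; an all-zero
default would instead hand the method a (null, null) name that no
constructor ever produced.

```java
import java.util.Objects;

// Ordinary reference record standing in for the value record above.
record PersonName(String first, String last) {
    PersonName {
        // The constructor enforces the invariant an all-zero default would bypass.
        Objects.requireNonNull(first, "first");
        Objects.requireNonNull(last, "last");
    }

    // Hypothetical method, only here so there is something to invoke.
    String display() {
        return first + " " + last;
    }
}

class UninitializedElementDemo {
    // Returns true if invoking a method on a never-assigned element fails fast.
    static boolean uninitializedElementFailsFast() {
        PersonName[] names = new PersonName[2];   // elements start out as null
        try {
            names[0].display();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(uninitializedElementFailsFast());
    }
}
```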

Separately, we have explored a number of ways we might implement this in 
the VM, and I think we have a sensible story.  Some value types are 
_zero intolerant_ -- this means that the all-zero value is not a member 
of their value set.  The key observation is:

     nullability, zero-tolerance, flattenability -- pick two

That is, you can have nullable, zero-tolerant values (think `Point?`), 
but they don't get flattened; or you can have zero-tolerant, flattenable 
values, but they can't be null.  The third combination (thanks 
Frederic!) is nullable, flattenable values: we make the all-zero 
representation illegal as a value and use it in the heap to represent 
`null`, so that `getfield` / `aaload` check for zero on fetch and, if 
zero, put a null on the stack.  (There's a much bigger writeup on this 
coming; this is the executive summary.)  And because values are 
monomorphic, different value types can make different choices.
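
The third combination can be simulated in plain Java to show the
shape of the idea.  This is only a sketch under stated assumptions, not
the VM design: `Rational`, `FlatRationalArray`, and the 64-bit packing
are invented for illustration.  `Rational` is zero intolerant because
its denominator may not be zero, so an all-zero cell can never be a
legal value and is free to mean null.

```java
// Hypothetical zero-intolerant type: den != 0, so all-zero bits are never valid.
final class Rational {
    final int num;
    final int den;

    Rational(int num, int den) {
        if (den == 0) {
            throw new IllegalArgumentException("den must be non-zero");
        }
        this.num = num;
        this.den = den;
    }
}

// Simulated flattened storage: one 64-bit cell per element, no per-element object.
final class FlatRationalArray {
    private final long[] cells;

    FlatRationalArray(int length) {
        cells = new long[length];   // freshly allocated cells are all zero, i.e. all null
    }

    // Store path: null is written as the all-zero cell.
    void store(int index, Rational r) {
        cells[index] = (r == null)
                ? 0L
                : ((long) r.num << 32) | (r.den & 0xFFFFFFFFL);
    }

    // Load path: check for zero on fetch and surface it as null.
    Rational load(int index) {
        long raw = cells[index];
        if (raw == 0L) {
            return null;
        }
        return new Rational((int) (raw >> 32), (int) raw);
    }
}
```

The zero check in `load` stands in for the extra work on the way from
heap to stack; the cells themselves stay dense.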

Further, a key use case is _migrating_ value-based classes 
(LocalDateTime, Optional) to value types.  The key impediment so far 
here has been nullability; we can represent them as nullable + 
flattenable if we're willing to give up zeros.  Since the all-zero bit 
pattern is a pure implementation detail, a class that wants to migrate 
can always find a representation with at least one non-zero bit.
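
As a sketch of what "find a representation with a non-zero bit" could
look like, consider a date whose natural encoding (days since the epoch)
includes zero.  The class name and the tag-bit scheme are assumptions
made up for this example; a real migration could pick any encoding that
avoids all-zero.

```java
import java.time.LocalDate;

// Hypothetical encoding: shift the epoch day left and always set the low bit,
// so no legal date is ever stored as all-zero bits.
final class ZeroFreeDateEncoding {
    static long encode(LocalDate date) {
        return (date.toEpochDay() << 1) | 1L;
    }

    // All-zero is reserved for "no value"; everything else decodes to a date.
    static LocalDate decode(long raw) {
        if (raw == 0L) {
            return null;
        }
        return LocalDate.ofEpochDay(raw >> 1);   // arithmetic shift keeps the sign
    }
}
```

Even 1970-01-01, whose epoch day is zero, gets a non-zero encoding
here, which is the property a null-default class needs.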

So, the sweet spot seems to be:

  - Values, by default, are non-nullable and flattenable.  The compiler 
translates value `Point` as `QPoint;`.

  - Users can denote the union of the value set and { null } using an 
emotional type: `Point?`, which the compiler translates as `LPoint;`.  
If a user wants a nullable `Point`, they ask for it; what they give up 
is flattenability / scalarization.  (I resisted the emotional types as 
long as I could, but the alignment with the VM implementation was too 
strong to resist, and this yields significant dividends when we get to 
the generics story.)  Let's not harp on the details of these types just 
yet; that's a separate shed to paint.

  - Values that need to defend against uninitialized data, or that are 
migrated from references, can declare themselves to be "null-default"; 
the cost is that they must be intolerant of the all-zero value.  These 
are always translated with `L` carriers, since they are nullable.  Users 
of these classes pay the extra penalty of checking for zeros when we go 
between heap and stack, so they are slightly slower, but they are still 
flattened and scalarized, which is the big benefit.  (Again, I resisted 
John's point about nulls, but eventually the gravity was too strong; if 
we don't use null here, we'll reinvent a worse null.)

These three choices correspond to the 3-choose-2 combinations deriving 
from the observation above.

From a user model perspective, users choose between zero-default values 
(the default) and null-default values (opt in), as the semantics 
demand.  This is easy to understand; in fact, the biggest risk might be 
that users will like it _too much_ and reach for null-default value 
classes more often than they should.  And if you want to represent 
"maybe Point", you use `Point?` or `Optional<Point>` as needed.

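For the "maybe Point" case, the `Optional<Point>` spelling already works
in today's Java.  A small sketch, with an ordinary record standing in
for a zero-default value type; `firstOnXAxis` and `MaybePointDemo` are
hypothetical names used only here:

```java
import java.util.Optional;

// Ordinary record standing in for a zero-default value type.
record Point(int x, int y) {}

class MaybePointDemo {
    // Absence is expressed in the return type, not by a null or a made-up Point.
    static Optional<Point> firstOnXAxis(Point[] points) {
        for (Point p : points) {
            if (p.y() == 0) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }
}
```
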
From a VM perspective, we need to support null-default values; while 
we've not implemented this yet, it seems pretty reasonable.

The bonus is that we have cleared the last blocker to migrating 
value-based classes to value types; for migrated values, we implicitly 
make them null-default (also: same treatment for inner value classes), 
and then migrating Optional and LocalDateTime becomes a completely 
compatible, in-place move.

On 10/11/2018 10:14 AM, Brian Goetz wrote:
> Our story is "Codes like a class, works like an int".  A key part of 
> this is that value types support the same lifecycle as objects, and 
> the same ability to hide their internals.
>
> Except, the current story falls down here, because authors must 
> contend with the special all-zero default value, because, unlike 
> classes, we cannot guarantee that instances are the result of a 
> constructor.  For some classes (e.g., Complex, Point, etc), this 
> forced-on-you default is just fine, but for others (e.g., wrappers for 
> native resources), this is not unlike the regrettable situation with 
> serialization, where class implementations may confront instances that 
> could not have resulted from a constructor.
>
> Classes guard against this through the magic of null; an instance 
> method will never have to contend with a null receiver, because by the 
> time we transfer control to the method, we'd already have gotten an 
> NPE.  Values do not have this protection.  While there are many things 
> for which we can say "users will learn", I do not think this is one of 
> them; if a class has a constructor, it will be assumed that the 
> receiver in a method invocation will be on an instance that has 
> resulted from construction.  I do not think we can expose the 
> programming model as-is; it claims to be like classes, but in this 
> aspect is more like structs.
>
> So, some values (but not all) will want some sort of protection 
> against uninitialized values.  One approach here would be to try to 
> emulate null, by, say, injecting checks for the default value prior to 
> dereferences.  Another would be to take the route C# did, and allow 
> users to specify a no-arg constructor, which would customize the 
> default value.  (Since both are opt-ins, we can educate users about 
> the costs of selecting these tools, and users can get the benefits of 
> flatness and density even if these have additional runtime costs.)  
> The latter route is less rich, but probably workable.  Both eliminate 
> the (likely perennial) surprise over uninitialized values for 
> zero-sensitive classes.