Value types, encapsulation, and uninitialized values
Brian Goetz
brian.goetz at oracle.com
Sat Oct 27 14:38:14 UTC 2018
> It is true that there exist value types which have a "zero", which I
> would define as some *natural* choice of default value that will be
> *blindingly obvious* to any consumer. Of course, Point/Complex are the
> same two examples that had leapt to my mind. These are the best case,
> the only value types that will /really/ (almost) join the same
> category as `int` and `double`.*
I think the truth is somewhere in the middle. Consider the common use
cases for values. Most common would be:
- Numerics
- Tuples / product types
- Mediating wrappers (e.g., Optional)
- Structured domain entities -- timestamps, distances, etc.
Of these categories, the first two fall into the zeroable bucket; the
latter two are in the "usually not" bucket, but even then sometimes the
zero may well be a reasonable default (e.g., Optional). So let's just
say that both categories are populated, and neither should be ignored as
negligible.
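To make the two buckets concrete, here is a sketch in today's Java, with plain final classes standing in for value types (the class names and fields are hypothetical, and this is not Valhalla syntax): for a numeric type the all-zeros default is the natural mathematical zero, while for a domain entity the all-zeros default is a real value that almost no consumer actually means.

```java
// Sketch (hypothetical classes, not Valhalla syntax): the all-zeros
// default is natural for a numeric type, misleading for a domain entity.
final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
    // The all-zeros instance is the additive identity -- a sensible "zero".
    static final Complex DEFAULT = new Complex(0.0, 0.0);
}

final class Timestamp {
    final long epochMillis;
    Timestamp(long epochMillis) { this.epochMillis = epochMillis; }
    // The all-zeros instance is 1970-01-01T00:00:00Z -- a valid value,
    // but almost never the timestamp a consumer intended.
    static final Timestamp DEFAULT = new Timestamp(0L);
}
```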
To be clear, there _are_ going to be costs for types that declare
themselves nullable. They'll get the benefit of "flat and dense", but
will lose out on some other benefits, such as scalarization. Overall I
think this is a fair tradeoff; users can opt into this treatment, which
signals a willingness to pay the costs. (One of the reasons we haven't
addressed this issue until now is that we were less confident in our
ability to estimate the costs.)
> Unfortunately I suspect these are very rare in practice. Most code is
> business logic
Well, most of your code, anyway. The scientific programming / ML folks
might have a different opinion!
> These zero-less types we could divide into two groups: those that at
> least have some combination of field values that is /invalid/, and
> those that don't, and are unwilling to add a field now in order to
> achieve that. The latter have no possible way to be nullable, and the
> former have a possible way to be nullable that is just plain
> /weird/ (user provides a sample value that is not allowed to actually
> exist, so that that can be used as the internal representation of what
> is surfaced in the language as null.). As weird as that is, at least
> it would provide actual nullability instead of awkward attempts to
> simulate what nullability already is, so I think it actually does a
> better job of confining the weirdness?
Agree that leaning on the existing behavior of nullability is the
winning move here, rather than creating a new notion of null-like for
values ('vull', NullValueException, etc.)
The game here is to give the user enough control to say "this is what a
null is", without punishing all other users for that flexibility. (It's
tempting to say "just use the no-arg constructor to dispense an unused
bit pattern", as C# does, but this has serious costs throughout the rest
of the system, including on value types that don't take advantage of
this.) So we're focusing on how we can optimize _detection_ of
uninitialized values. Some interesting buckets:
- Types like `int`, where all bit patterns are valid. These guys have
a stark choice: add an extra boolean (which could come with a severe
density price tag), or forgo uninitialized-value detection.
- Types that have naturally unused bits, like LocalDateTime (96 bits)
or better, FourteenBitShort. The normal alignment requirements will
hand us these unused bits.
- Types that have fields that are never zero/null in a properly
initialized class. I suspect this is a common case, and this could be
used to considerably reduce the cost of the "are you uninitialized" check.
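The third bucket can be sketched in ordinary Java (hypothetical class, with flattened storage simulated by plain fields): any field whose valid range excludes zero lets the VM-written default of all-zero bits stand in for "uninitialized", at no extra storage cost.

```java
// Sketch of the "never-zero field" bucket: month is 1..12 in every
// properly initialized instance, so a 0 in that field can only mean
// the instance was never constructed -- the check is one compare.
final class MonthDay {
    final int month; // always 1..12 once constructed
    final int day;

    MonthDay(int month, int day) {
        if (month < 1 || month > 12)
            throw new IllegalArgumentException("month: " + month);
        this.month = month;
        this.day = day;
    }

    // What a VM-injected check might test, conceptually: no added
    // boolean, just the existing field compared against zero.
    static boolean isUninitialized(int monthField) {
        return monthField == 0;
    }
}
```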
In the first case, even checking all the bits for zero isn't enough; you
have to inject a new field. But then you are reduced to the case that
is shared with the other buckets -- that there's always a
no-bigger-than-a-word bitfield whose zeroes can be used as a proxy for
uninitialized. This is starting to look attractive, as we can bring the
cost of a null-check down to a single-word mask-and-test.
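The shared shape of all three buckets can be sketched as follows (mask value and names are hypothetical): whether the proxy bits come from an injected boolean, spare alignment bits, or a never-zero field, detection reduces to loading one word of the flattened value, masking, and comparing to zero.

```java
// Sketch: a single-word mask-and-test stand-in for the null check.
// The mask and layout are hypothetical; in reality the VM would pick
// them per-type from the injected boolean / spare bits / never-zero field.
final class NullProxyCheck {
    // Suppose the proxy bits live in the low byte of the first word
    // of the flattened layout.
    static final long NULL_PROXY_MASK = 0xFFL;

    static void checkInitialized(long firstWord) {
        if ((firstWord & NULL_PROXY_MASK) == 0L) {
            // Uninitialized "nullable" value used as a receiver.
            throw new NullPointerException("uninitialized value");
        }
    }
}
```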
There's a certain degree of VM gymnastics we would need to do to move
the costs to where they don't punish everyone else. The functional
requirements would be:
- field access / method invocation when the receiver is an uninitialized
instance of a "nullable" value throws an NPE;
- "boxing" a "null" Foo.val to Foo.box yields a true null;
- "unboxing" a null Foo.box yields a "null" Foo.val (rather than an NPE).
I can think of at least three or four ways to get there, with different
cost models.
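The required boxing behavior can be sketched with an ordinary reference standing in for Foo.box and a sentinel field pattern standing in for a "null" Foo.val (all names here are hypothetical; the real translation would live in the VM, not user code):

```java
// Sketch of the box/unbox requirements: the all-zeros sentinel value
// round-trips through true reference null, in both directions.
final class FooVal {
    final int neverZeroField; // 0 encodes the "null" Foo.val
    FooVal(int f) { this.neverZeroField = f; }
    boolean isNullVal() { return neverZeroField == 0; }
}

final class Boxing {
    // "boxing" a "null" Foo.val to Foo.box yields a true null reference
    static FooVal boxToRef(FooVal v) {
        return v.isNullVal() ? null : v;
    }
    // "unboxing" a null Foo.box yields the "null" Foo.val, not an NPE
    static FooVal unboxFromRef(FooVal ref) {
        return ref == null ? new FooVal(0) : ref;
    }
}
```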
More information about the valhalla-spec-observers
mailing list