Value types, encapsulation, and uninitialized values
Brian Goetz
brian.goetz at oracle.com
Sat Oct 27 14:38:14 UTC 2018
> It is true that there exist value types which have a "zero", which I
> would define as some *natural* choice of default value that will be
> *blindingly obvious* to any consumer. Of course, Point/Complex are the
> same two examples that had leapt to my mind. These are the best case,
> the only value types that will /really/ (almost) join the same
> category as `int` and `double`.*
I think the truth is somewhere in the middle. Consider the common use
cases for values. Most common would be:
- Numerics
- Tuples / product types
- Mediating wrappers (e.g., Optional)
- Structured domain entities -- timestamps, distances, etc.
Of these categories, the first two fall into the zeroable bucket; the
latter two are in the "usually not" bucket, but even then sometimes the
zero may well be a reasonable default (e.g., Optional). So let's just
say that both categories are populated, and neither should be ignored as
negligible.
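To make the two buckets concrete, here is a sketch in today's Java, with plain final classes standing in for value types (the class names and fields are hypothetical, and this is not Valhalla syntax): for a numeric type the all-zeros default is the natural mathematical zero, while for a domain entity the all-zeros default is a real value that almost no consumer actually means.

```java
// Sketch (hypothetical classes, not Valhalla syntax): the all-zeros
// default is natural for a numeric type, misleading for a domain entity.
final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
    // The all-zeros instance is the additive identity -- a sensible "zero".
    static final Complex DEFAULT = new Complex(0.0, 0.0);
}

final class Timestamp {
    final long epochMillis;
    Timestamp(long epochMillis) { this.epochMillis = epochMillis; }
    // The all-zeros instance is 1970-01-01T00:00:00Z -- a valid value,
    // but almost never the timestamp a consumer intended.
    static final Timestamp DEFAULT = new Timestamp(0L);
}
```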
To be clear, there _are_ going to be costs for types that declare
themselves nullable. They'll get the benefit of "flat and dense", but
will lose out on some other benefits, such as scalarization. Overall I
think this is a fair tradeoff; users can opt into this treatment, which
signals a willingness to pay the costs. (One of the reasons we haven't
addressed this issue until now is that we were less confident in our
ability to estimate the costs.)
> Unfortunately I suspect these are very rare in practice. Most code is
> business logic
Well, most of your code, anyway. The scientific programming / ML folks
might have a different opinion!
> These zero-less types we could divide into two groups: those that at
> least have some combination of field values that is /invalid/, and
> those that don't, and are unwilling to add a field now in order to
> achieve that. The latter have no possible way to be nullable, and the
> former have a possible way to be nullable that is just plain
> /weird/ (user provides a sample value that is not allowed to actually
> exist, so that that can be used as the internal representation of what
> is surfaced in the language as null.). As weird as that is, at least
> it would provide actual nullability instead of awkward attempts to
> simulate what nullability already is, so I think it actually does a
> better job of confining the weirdness?
Agree that leaning on the existing behavior of nullability is the
winning move here, rather than creating a new notion of null-like for
values ('vull', NullValueException, etc.)
The game here is to give the user enough control to say "this is what a
null is", without punishing all other users for that flexibility. (It's
tempting to say "just use the no-arg constructor to dispense an unused
bit pattern", as C# does, but this has serious costs throughout the rest
of the system, including on value types that don't take advantage of
this.) So we're focusing on how we can optimize _detection_ of
uninitialized values. Some interesting buckets:
- Types like `int`, where all bit patterns are valid. These guys have
a stark choice: add an extra boolean (which could come with a severe
density price tag), or forgo uninitialized-value detection.
- Types that have naturally unused bits, like LocalDateTime (96 bits)
or better, FourteenBitShort. The normal alignment requirements will
hand us these unused bits.
- Types that have fields that are never zero/null in a properly
initialized class. I suspect this is a common case, and this could be
used to considerably reduce the cost of the "are you uninitialized" check.
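The third bucket can be sketched in ordinary Java (hypothetical class, with flattened storage simulated by plain fields): any field whose valid range excludes zero lets the VM-written default of all-zero bits stand in for "uninitialized", at no extra storage cost.

```java
// Sketch of the "never-zero field" bucket: month is 1..12 in every
// properly initialized instance, so a 0 in that field can only mean
// the instance was never constructed -- the check is one compare.
final class MonthDay {
    final int month; // always 1..12 once constructed
    final int day;

    MonthDay(int month, int day) {
        if (month < 1 || month > 12)
            throw new IllegalArgumentException("month: " + month);
        this.month = month;
        this.day = day;
    }

    // What a VM-injected check might test, conceptually: no added
    // boolean, just the existing field compared against zero.
    static boolean isUninitialized(int monthField) {
        return monthField == 0;
    }
}
```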
In the first case, even checking all the bits for zero isn't enough; you
have to inject a new field. But then you are reduced to the case that
is shared with the other buckets -- that there's always a
no-bigger-than-a-word bitfield whose zeroes can be used as a proxy for
uninitialized. This is starting to look attractive, as we can bring the
cost of a null-check down to a single-word mask-and-test.
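The shared shape of all three buckets can be sketched as follows (mask value and names are hypothetical): whether the proxy bits come from an injected boolean, spare alignment bits, or a never-zero field, detection reduces to loading one word of the flattened value, masking, and comparing to zero.

```java
// Sketch: a single-word mask-and-test stand-in for the null check.
// The mask and layout are hypothetical; in reality the VM would pick
// them per-type from the injected boolean / spare bits / never-zero field.
final class NullProxyCheck {
    // Suppose the proxy bits live in the low byte of the first word
    // of the flattened layout.
    static final long NULL_PROXY_MASK = 0xFFL;

    static void checkInitialized(long firstWord) {
        if ((firstWord & NULL_PROXY_MASK) == 0L) {
            // Uninitialized "nullable" value used as a receiver.
            throw new NullPointerException("uninitialized value");
        }
    }
}
```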
There's a certain degree of VM gymnastics we would need to do to move
the costs to where they don't punish everyone else. The functional
requirements would be:
- field access / method invocation when the receiver is an uninitialized
instance of a "nullable" value throws an NPE;
- "boxing" a "null" Foo.val to Foo.box yields a true null;
- "unboxing" a null Foo.box yields a "null" Foo.val (rather than an NPE).
I can think of at least three or four ways to get there, with different
cost models.
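The required boxing behavior can be sketched with an ordinary reference standing in for Foo.box and a sentinel field pattern standing in for a "null" Foo.val (all names here are hypothetical; the real translation would live in the VM, not user code):

```java
// Sketch of the box/unbox requirements: the all-zeros sentinel value
// round-trips through true reference null, in both directions.
final class FooVal {
    final int neverZeroField; // 0 encodes the "null" Foo.val
    FooVal(int f) { this.neverZeroField = f; }
    boolean isNullVal() { return neverZeroField == 0; }
}

final class Boxing {
    // "boxing" a "null" Foo.val to Foo.box yields a true null reference
    static FooVal boxToRef(FooVal v) {
        return v.isNullVal() ? null : v;
    }
    // "unboxing" a null Foo.box yields the "null" Foo.val, not an NPE
    static FooVal unboxFromRef(FooVal ref) {
        return ref == null ? new FooVal(0) : ref;
    }
}
```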
More information about the valhalla-spec-observers
mailing list