value type hygiene
John Rose
john.r.rose at oracle.com
Thu May 10 20:36:08 UTC 2018
On May 10, 2018, at 11:53 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> Objection, Your Honor, assumes facts not in evidence! The letters "n-u-l-l" do not appear in the definition of value-based linked above. Users have no reason to believe any of the following are bad for a VBC, among others:
Objection sustained. You are right; null-hostility is not documented in VBCs.
It *is* documented for the VBC Optional:
> A variable whose type is {@code Optional} should
> never itself be {@code null}; it should always point to an {@code Optional}
> instance.
…but not for the VBC LocalDate. So some VBCs that are null-friendly will
require a more nuanced migration story, since we don't want value types
per se to be nullable.
Options:
- Make *some* value types accept null according to some ad hoc opt-in API (yuck).
- Make a nullable type constructor available, like int?, Optional<T>?, or Optional<T>.BOX.
- Limit nullability to particular uses of the same type.
- Create ValueRef<T> as a workaround interface (no VM or language changes!).
The last two are my preferences.
The nullable type constructor is fairly principled and arguably useful,
but *not necessary* for value types per se, unless it is the cheapest
way to migrate. Which IMO it isn't. (N.B. Assuming migration is a goal.)
My high-order concern in all of this is to reduce degrees of freedom
in classfiles, from "every descriptor makes its own choice about nullity"
down to "each classfile makes its decision about each type's valueness"
backed up by "and if they don't agree, the JVM expects the user to fix
it by recompilation or special workarounds". This is why I'm not jumping
at the shiny possibility of int? and String! "just by adding info to the
descriptors"; the JVM complexity costs are large and optional for
value types.
> In Q world, these uses of V were compiled to LV rather than QV, so these idioms mapped to a natural and sensible translation. Our story was that when you went to recompile, then the stricter requirements of value-ness would be enforced, and you'd have to fix your code. (That is: making V a value is binary compatible but not necessarily source compatible. This was a pretty valuable outcome.)
>
> One of the possible coping strategies in Q world for such code is to allow the box type LV to be denoted at source level in a possibly-ugly way, so that code like the above could continue to work by saying "V.BOX" instead of "V". IOW, you can opt into the old behavior, and even mix and match between int and Integer. So users who wanted to fight progress had some escape hatches.
>
> While I don't have a specific answer here, I do think we have to back up and reconsider the assumption that all uses of V were well informed about V's null-hostility, and have a more nuanced notion of the boundaries between V-as-value-world and V-as-ref-world.
That's fair. Following is a specific answer to consider, plus a
completely different one in a P.S.
I'd like to spell V.BOX as ValueRef<T>, at least just for argument.
More on ValueRef:
    @ForValueTypesOnly
    interface ValueRef<T extends ValueRef<T>> {
        @CanBeFreebieDownCast
        @SuppressWarnings("unchecked")
        default T byValue() { return (T) this; }
    }
Ignore the annotations for a moment. Here's an example use
(after LocalDate migrates, and is given ValueRef as a super):
    LocalDate ld = LocalDate.now();
    ValueRef<LocalDate> ldOrNull = ld;
    if (p) ldOrNull = null;
    ld = (LocalDate) ldOrNull;  // downcast with null check
    ld = ldOrNull.byValue();    // same thing
ValueRef<T> and T are bijective apart from null, with the usual
downcast and upcast. Differences between the companion types are:
- T is not nullable (if a VT), while ValueRef<T> is (being an interface)
- converting ValueRef<T> to T requires an explicit cast, the other way is implicit
- you can't call T's methods on ValueRef<T>
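The self-referential interface pattern itself compiles and runs in today's Java, even though value types do not exist yet. The following is a minimal simulation; MyDate is a hypothetical stand-in for a migrated value-based class, and the annotations from the sketch above are omitted since they are not real:

```java
// Simulation of the ValueRef pattern with today's Java.
// MyDate is a hypothetical stand-in for a migrated VBC like LocalDate.
interface ValueRef<T extends ValueRef<T>> {
    @SuppressWarnings("unchecked")
    default T byValue() { return (T) this; }
}

final class MyDate implements ValueRef<MyDate> {
    final int epochDay;
    MyDate(int epochDay) { this.epochDay = epochDay; }
}

public class ValueRefDemo {
    public static void main(String[] args) {
        MyDate d = new MyDate(17_897);
        ValueRef<MyDate> dOrNull = d;    // implicit upcast, no cast needed
        MyDate back = (MyDate) dOrNull;  // explicit downcast
        MyDate same = dOrNull.byValue(); // equivalent to the cast
        dOrNull = null;                  // legal: the interface type is nullable
        // int n = dOrNull.epochDay;     // would not compile: T's members
        //                               // are not visible through ValueRef<T>
        System.out.println(back == same && back == d);  // prints "true"
    }
}
```

Only the nullability asymmetry is simulated here, of course; in this simulation MyDate is itself still a nullable object type.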
The language builds in the upcast for free, but the downcast has
to be explicit. If we were to put in the downcast as a "freebie" supplied
by the language, then we'd have ourselves a wrapper type, like
Integer for int:

    ld = ldOrNull;  // freebie downcast, with null check
    ldOrNull = ld;  // normal upcast
This shows a possible way to associate a box type with each value
type, with only incremental changes to JLS and JVMS.
(Also worth considering, later on, is imputing methods of T
to ValueRef<T>, and conversely methods of Integer to int.
Save that for later when we retcon prims as vals.)
The @ForValueTypesOnly annotation means that it is
probably useless for object types to implement this interface.
The Java compiler should warn or error out if that is attempted.
The JVM could refuse ValueRef as a super on object types
at class load time, if that would add value.
Since ValueRef<T> and T are both subtypes of Object,
it is also possible to use ValueRef<T> as an ad hoc substitute
for T when T itself is a generic type parameter that erases
to Object. We might be able to pull off stunts like this:
    interface Map<K, V> {
        V.BOX get(K key);
    }
…where V.BOX erases to V's bound, and instantiates as
ValueRef<V> when V implements ValueRef<V> (tricky, huh?)
and otherwise instantiates as V itself (non-value types, as
today).
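A hand-specialized version of that stunt can be written today for a single value-candidate type, which may help make the erasure story concrete. DateTable and MyDate below are hypothetical names; the real mechanism would instantiate Map<K, V> generically rather than per type:

```java
// Hand-written approximation of "V.BOX get(K key)" for one type.
// The nullable "box" view ValueRef<MyDate> is returned where the
// instantiated generic would return V.BOX.
import java.util.HashMap;

interface ValueRef<T extends ValueRef<T>> { }

final class MyDate implements ValueRef<MyDate> {
    final int epochDay;
    MyDate(int epochDay) { this.epochDay = epochDay; }
}

final class DateTable<K> {
    private final HashMap<K, MyDate> map = new HashMap<>();

    void put(K key, MyDate value) { map.put(key, value); }

    // Returns null on a miss, as Map.get does today; callers must
    // downcast (or null-check) before using the result as a MyDate.
    ValueRef<MyDate> get(K key) { return map.get(key); }
}
```

A miss yields null through the interface type, exactly the behavior that a non-nullable V could not provide directly.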
Basically, ValueRef<T> can define the special gymnastics we
need for nullable value types, without much special pleading
to the JLS or JVMS, and then T.BOX can map to either T
or ValueRef<T> as needed. (There's also P.BOX waiting in
the wings, if P is a primitive type. Maybe we want P.UNBOX.)
Maybe there's a story here. What's important for JVM-level value
hygiene is that we seem to have our choice of several stories
for dealing with legacy nulls, and that none of our choices forces
us into fine-grained per-descriptor declarations about nullity
or value-ness.
— John
P.S. As a completely different solution, we could make a value type
nullable in an ad hoc, opt-in manner. This is the one I said "yuck"
about above. Here FTR is a simple way to do that.
Define a marker interface which says "this value type can be
converted from null".
    @ForValueTypesOnly
    interface NullConvertsToDefault { }
The main thing this would do, when applied as a super to
a value type Q, is tell the JVM not to throw NPE when casting
to Q, but rather substitute Q's default value.
If LocalDate were to implement this marker interface, then
null-initialized variables (of whatever source) would cast to
LocalDate.default rather than throw NPE. The methods on
LocalDate could then opt to DTRT, perhaps even throw NPE
for a close emulation of the legacy behavior.
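What the marker would buy can be sketched in source with a static helper standing in for the JVM's adjusted checkcast. Dateish, its DEFAULT constant, and castFromNullable are all hypothetical names invented for this sketch:

```java
// Sketch of null-converts-to-default, approximated in source.
// In the real design the JVM would perform this substitution at
// checkcast; here a static helper models it.
interface NullConvertsToDefault { }

final class Dateish implements NullConvertsToDefault {
    // Stands in for the value type's all-zeroes default, Dateish.default.
    static final Dateish DEFAULT = new Dateish(0);

    final int epochDay;
    Dateish(int epochDay) { this.epochDay = epochDay; }

    // What "checkcast Dateish" would do for a marked type:
    // substitute the default value instead of throwing NPE on null.
    static Dateish castFromNullable(Object x) {
        return (x == null) ? DEFAULT : (Dateish) x;
    }
}
```

Methods on Dateish could then check for DEFAULT and throw NPE themselves, emulating the legacy behavior as described above.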
Not many adjustments are needed in the JLS to make this
a workable tool for migration, but comparisons against null
would need special treatment: "ld == null" should not simply
return false, but rather translate to something like these instructions:
    checkcast LocalDate
    vdefault LocalDate                       #push LocalDate.default
    invokestatic System.substitutableValues
This translation could be made generic across the NCTD
(NullConvertsToDefault) option if we were willing for "v == null" to throw NPE:
    checkcast LocalDate
    aconst_null; checkcast LocalDate         #NPE or LocalDate.default
    invokestatic System.substitutableValues
At the language level, comparison with an explicit null
would be amended by adding the checkcast:
    //if (ld == null)            //=>
    if (ld == (LocalDate)null)   //either NPE or LocalDate.default
It begins to get messy, does it not? I think it's a long string
if you pull on it. Where are all the cut points where null
converts to default? Should the default be stored back
over the null, ever? Are null and default distinct values,
or do we try to pretend they are the same? Should
default ever convert back to null? Etc., etc.
More information about the valhalla-spec-observers
mailing list