a new contract for reference types

Mon May 13 19:59:14 UTC 2019

It is great to continue to read all these fascinating discussions on
value types and how they could play out in code. I just wanted to add
my brief thoughts on V? as a type.

As background, I'm not a fan of null in general and our codebase
studiously avoids it. But I am a fan of types like T? where T is
non-null and T? is nullable (like Fantom and Kotlin). However, Java
has a long history, and types in Java cannot have those meanings.
Given that, the use of V? seems odd to someone who should be excited
by it.

The email below helped explain my discomfort (thanks again). It seems
clear that the primary thing the type is trying to express is the
pointer (indirection), not the nullability. And as such, V? is
entirely the wrong name for it. I don't have a particularly strong
opinion on the alternative - V*, V.ptr, V.ref etc, although I don't
think that generics style Ptr<V> syntax captures the concept that
well. Similarly, I don't have a strong opinion at this point on
whether there should be a subtyping relationship or not.

I do think there might be a case for the pointer syntax to apply to
all types though, i.e. that `String.ptr` would be an equivalent type to
`String`, but `Int128.ptr` would not be the same type as `Int128`.

So, the key fact is that V? isn't really about nullability at all -
that's a secondary effect. Focussing the type name on the aspect which
is primary (pointer-ness) makes it a lot more understandable (and of
course a pointer is nullable today, so that shouldn't confuse).

thanks
Stephen

On Wed, 1 May 2019 at 23:30, John Rose <john.r.rose at oracle.com> wrote:
>
> On May 1, 2019, at 7:39 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> >
> > This is the point I’ve been failing to get across to Remi as well.  Suppose you have a big value, and you want a sparse ArrayList of that value?  You need a way to say “ArrayList of BigValue, but not flattened.”  And that’s a use-site concern.  ArrayList<BigValue?> gives you that.
> >
> > Which is to say: the claim that V? is useless once we have reified generics is simply untrue.
>
> Uses of value types (and templates, after that) have a new
> feature, not seen before on the JVM.  They pull information
> from the definition (the class-file) and employ it at the use
> site.  This fact has subtle and disruptive effects on our
> JVM and user model.  I'll take a crack at explaining…
>
> Today's types in the JVM are either hardwired primitives,
> or else references.  The references do *not* pull information
> from the definition, unless the operation specifically requests
> it (via a resolution step, as in a `new` or `checkcast` instruction).
> In many usages, our existing reference types work like forward
> declarations in C:
>
>     typedef struct Foo Foo; //"class Foo;" in C++
>     extern Foo* makeFoo(const char* name);
>     extern Foo* processFoo(Foo* arg);
>
> I'll use the acronym *FDT* for "forward declared type".
>
> Whole APIs can be defined using such FDTs in C.  Eventually
> somebody needs to make a Foo, but that can be in a small
> subset of the application which actually loads a header file
> that defines the body of `struct Foo`.
>
> In physical terms, FDTs are obviously nullable because they
> use an indirection pointer, which can assume a sentinel value.
> This allows variables of FDT type (like `Foo*`) to be initialized
> safely without the FDT's definition (such as a visible body for
> `struct Foo`).
>
> FDTs are not globally optimizable; at most they can be locally
> optimized by rewriting a region of the program.  This applies
> in general, and specifically to Java's reference types today.
> Valhalla disrupts this state of affairs, in mostly good ways.
>
> If you have an array of values for a FDT, you must have an array
> of pointers.  It's impossible to secretly inline some of these arrays
> because you are clever about knowing the source of them, except
> in very limited cases where local escape analysis lets you rewrite
> the program secretly.  (Such technology has not been invented,
> despite a half century of trying.  I'll call that "impossible" for short.)
> Every mature OOL has a user switch to let the user choose,
> non-secretly, whether to use an optimized representation or an
> abstract one.  For Java, it's `int[]` vs. `Integer[]`.
>
> It's not just arrays either.  If you want an optimized field, you
> have to use a different contract than the FDTs.  For Java, it's
> `int f` vs `Integer f`, with the definition of `int` hardwired
> for your benefit.
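
To make the `int[]` vs. `Integer[]` and `int f` vs. `Integer f` switch concrete, here is a small sketch in today's Java. The class and method names (`Contracts`, `Fields`, `flat`, `indirect`) are made up for illustration; the only point is that the optimized forms start at a value fixed by the hardwired definition of `int`, while the indirect forms start at null.

```java
import java.util.Arrays;

class Contracts {
    static class Fields {
        int f;      // optimized field: holds the value itself, default 0
        Integer g;  // indirect field: holds a reference, default null
    }

    // Optimized array: every element is an int value, default 0.
    static int[] flat(int n) { return new int[n]; }

    // Indirect array: every element is a reference, default null.
    static Integer[] indirect(int n) { return new Integer[n]; }

    public static void main(String[] args) {
        int[] a = flat(3);
        Integer[] b = indirect(3);
        b[1] = 42; // only this slot now refers to an Integer object
        System.out.println(Arrays.toString(a)); // [0, 0, 0]
        System.out.println(Arrays.toString(b)); // [null, 42, null]

        Fields x = new Fields();
        System.out.println(x.f + " " + x.g);    // 0 null
    }
}
```
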
>
> The fact that Integer is a FDT at this point is invisible to users,
> since it's built in (and in fact the JVM loads it really early).
> But if you were to make a user defined primitive `uint`,
> then suddenly you'd have to load the class-file for `uint`
> before laying out a field of that type.  Or you'd have to
> use an indirection.
>
> So forward declarations can be used, with indirections, to
> lay out fields and arrays abstractly.  Conversely, non-abstract
> layouts (without indirections) cannot be used with forward
> declarations.  (Side note:  You could include the full layout
> redundantly at every use site.  We're not doing that; it would
> break Java's core binary compatibility model.)
>
> This is a deep reason why Java has nulls.  If you declare a
> reference variable in Java, it is a FDT (although you might not
> realize it).  To initialize that variable, you need a value which
> is available before the type is defined.  Hello, null.
>
> (Java could have chosen to omit null, I suppose, but then
> every field load would potentially throw a "field not initialized"
> exception.  The larger point is that variables of forward-declared
> type need a state which does not depend on the type's definition.)
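
A rough way to see this in today's Java is sketched below. The names (`ForwardDeclared`, `Foo`, `Holder`, `fooInitialized`) are hypothetical, and the sketch observes class *initialization* (the static initializer running) as a stand-in for "the type's definition being needed"; whether a particular JVM has already loaded the class file by that point is an implementation detail this example does not try to show.

```java
class ForwardDeclared {
    // Flipped by Foo's static initializer, so we can tell when Foo is initialized.
    static boolean fooInitialized = false;

    static class Foo {
        static { fooInitialized = true; }
        final String name;
        Foo(String name) { this.name = name; }
    }

    static class Holder {
        Foo foo; // reference field: its initial state is null
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        // The field already has a well-defined state, and Foo has not been initialized.
        System.out.println(h.foo == null);   // true
        System.out.println(fooInitialized);  // false

        h.foo = new Foo("x");                // somebody finally makes a Foo
        System.out.println(fooInitialized);  // true
    }
}
```
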
>
> And it's not just arrays and fields either.  It turns out to be
> infeasible to arrange efficient calling sequences for virtual
> method hierarchies unless those hierarchies are loaded under
> a contract which allows full knowledge of the definitions of
> argument and return types in those hierarchies, *if* those
> argument and return types are to be optimized as part
> of the calling sequence.  (There are lots of 80% and 90%
> solutions to this, but the 100% solution involves asking
> the user to accept a new contract for the types that are
> to be optimized.  Spoiler:  We intend that this new contract
> will be the default for value types V, while V? makes use
> of the old contract.)
>
> For this reason, when we change the performance characteristics
> of Java types by making some "inline" (aka "value" or "immediate"),
> we also require a distinction between fully abstract occurrences
> of such optimized values (using the old contract) and normally
> optimized occurrences, using a new contract.
>
> What is the new contract?  Here are the essentials:
>
> 1. An occurrence of the optimized type in a descriptor causes
> resolution.   This applies to field, method, array, and method types.
> 1a. There may be side effects due to class-file loading.
> 1b. There may be circularity errors due to loading of ill-formed programs.
> 1c. In the case of templates, there may be side effects due
> to template expansion logic (including bootstrap methods).
>
> 2. The optimized type does not necessarily permit null values.
> 2a. After resolving the type, validity of null is determined
> unambiguously from the type's definition, not its use.
>
> 3. Uses of an optimized type aggressively employ the details of
> that type as found in its class-file (or template expansion).
> 3a. In most cases this entails aggressive inlining.
> 3b. The only reliable way to turn off aggressive inlining is
> to request the old contract, by using the old descriptor.
>
> I think what's going on behind the questions about V and V?
> is the emergence of this new contract, and the clarification
> of the old contract.  The contract differences can be hidden
> in many cases, but not all.  Users will expect the new contract
> in cases like the following:
>
>   - when making a field or (in most cases) an array
>   - when passing values to and from methods
>
> In most cases where the type is resolved, the old and new
> contracts don't differ.  These include:
>
>   - making an instance
>   - type casting or testing
>   - reflection
>
> In some cases, users may wish for the old contract.
> Such cases include:
>
>   - nullability (if the type doesn't natively support null)
>   - layout polymorphism based on indirections (old generics)
>   - sparse arrays of multi-word types
>   - descriptor equivalence (avoidance of bridges)
>   - copy avoidance (cf. Doug's note)
>
> We are presently using the type name V? to refer to the
> old contract.  This means that V? could serve any of the
> above use cases.  It seems unnatural for copy
> avoidance, but natural enough for the others.
>
> Thus, the deep difference of V and V?, or V.val and V.box,
> amounts to "V considered natively" and "V considered as
> a FDT".  This is very like the distinction in C++ between
> `const T&` vs. `T`.  And like C++ we might examine
> reasons to add further distinctions, such as `const T&&`
> and `T&` and (nullable) `T*` or `const T*`.  Each of
> those variations of `T` in C++ has a distinct contract
> in the C++ user model.  Java surely won't have so many
> distinctions; after all it differs from C++ by moving
> many representational choices from compile-time
> specifications into the JVM's runtime.
>
> I hope we only need two basic contracts for the Valhalla
> user model, the old one (necessary at least for backward
> compatibility) and one new one (sketched above).  Doug's
> note raises a possible need for a third contract, which I
> hope we can avoid.  I think V? can help in Doug's examples
> to force an indirection, which (in most cases but not all)
> can be strength-reduced to something off the heap.
>
> Doug's note has a comforting observation that such fine
> tuning is most important at large scales, and that there
> are several tactics which can be deployed there.  The
> job of choosing tactics can sometimes be deferred to
> the library code, rather than the user.  For example,
> sometimes the right answer for sorting a large flat
> array will be to first sort a temporary array of indexes
> into that array, and then permute the array.  Such a
> tactical decision doesn't need to show up in the user
> model of the sort function.
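
The index-sort tactic can be sketched in today's Java as follows. This is only an illustration under assumptions of my own: the "large flat array" is modelled as a `long[]` holding two-word records (key, payload), the names `IndexSort` and `sortFlat` are invented, and the index array is a boxed `Integer[]` purely so that a comparator can be used.

```java
import java.util.Arrays;
import java.util.Comparator;

class IndexSort {
    // 'flat' holds records of two longs each: (key, payload).
    // Returns a new flat array with the records ordered by key.
    static long[] sortFlat(long[] flat) {
        int n = flat.length / 2;

        // Step 1: sort a temporary array of record indexes by key,
        // without moving any of the multi-word records.
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingLong(i -> flat[2 * i]));

        // Step 2: permute the records once, in sorted order.
        long[] out = new long[flat.length];
        for (int i = 0; i < n; i++) {
            out[2 * i] = flat[2 * idx[i]];
            out[2 * i + 1] = flat[2 * idx[i] + 1];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] records = {3, 30, 1, 10, 2, 20};
        System.out.println(Arrays.toString(sortFlat(records)));
        // [1, 10, 2, 20, 3, 30]
    }
}
```

The caller only sees `sortFlat`; whether it sorts in place or goes through an index array is hidden inside the library method.
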
>
> Another ray of hope: Although the JIT cannot rewrite
> data structures, since it operates after data is defined,
> it locally rewrites large amounts of program structure,
> including the formats of method arguments and returns.
> (This assumes successful de-virtualization which is
> often the case, more often than successful escape
> analysis that allows data to be reformatted.)
> Indirections are not important to the JIT, since
> the JIT can always just use the original definition.
>
> Anyway, it's clear to me we need to nail down the new
> contract, and (in hindsight) elucidate the old one.
>
> It also seems to me that we are OK tying the new contract
> to V (the natural contract for the natural name) and the
> old one to V? (meaning nullable, but also indirect).
>
> Regarding subtyping, I don't see (from these considerations)
> a firm reason to declare that V? is a super of V.  The value
> set of V? *might* have one more point than that of V,
> or it *might not*.  The reason we are doing V? is not the
> value set, but the whole contract, which includes the
> value set as an obvious, but ultimately non-determinative part.
>
> I suppose if we were to spell V? as V* it would be clearer
> what we mean, relative to C and C++.  But V? is fine by
> me, as long as we use Java's tradition of "lump instead of
> split", and are firm that V? lumps together nullability
> (*guaranteed* as opposed to only *when natural*) plus
> other aspects, notably backward compatibility, and
> sparse random order (not dense linear order) in arrays.
>
> — John