Value array flattening in L-world

John Rose john.r.rose at oracle.com
Fri Feb 23 01:25:11 UTC 2018


Nice writeup; thank you!

On Feb 22, 2018, at 8:28 AM, David Simms <david.simms at oracle.com> wrote:
> 
> *** Value array flattening in L-world ***
> 
> There has been some discussion regarding JVMS changes for arrays of values, notably on how to translate ACC_FLATTENABLE used for field declarations in terms of array storage.

In short, the main mission of L-world is to find all the Q's in Q-world and
express them differently.  (So ACC_F expresses Q-ness on a field.)

In some cases where Q-ness was checked statically in Q-world,
it must be checked dynamically in L-world.  So the big question
of L-world is to see whether those dynamic checks can be cashed
(SWIDT?) or whether they will overdraw our optimization account.

> * "Flatten Value Array"
> 
> The initial Valhalla value type prototype assumed all value type arrays are flattened into 'valueArray' in the JVM (addresses like 'typeArray', but may contain oops). There were some cases where atomicity was required, so using 'objArray' for the underlying storage was always an option, as long as the semantics were the same.
> 
>     Which semantics ? Well 'aaload' from a new flat array does [not]
> return null, but the default value. This is pretty much the same debate
> as may be had over 'getfield' and 'putfield' for ACC_FLATTENABLE fields.

(Here's a good design heuristic – object : field :: array : element)

>     How flat is flat ? By flat, we mean no per-element header or state information, i.e. not even enough room to indicate a "null" element if that is something required for compatibility (more on that later). There may be methods to achieve this, but they are messy.
> 
> TL;DR: Flat arrays don't do null-ability, not even optionally when you need it.

Note:  This doesn't mean that we are *forbidden* from making non-flat
value arrays, just that it is not in the basic set of VM functionality.

To be clear:  You can make a guaranteed non-flat, nullable value array
of N values by storing them in a "new Object[N]".  You can even get static
typing by wrapping with an ArrayList or something similar.  And such
view-shifting wrappers will be radically cheaper in a world with value types.
So maybe we can get away with *not* having the JVM support non-flat
arrays of value type, since libraries can get the same effect when needed.
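
(A minimal sketch of that wrapper idea; the class and method names here
are made up, with EndPoint standing in for some value class:)

  // Nullable, guaranteed non-flat storage for values, backed by Object[].
  final class NullableSlots<T> {
      private final Object[] slots;
      NullableSlots(int length) { slots = new Object[length]; }
      @SuppressWarnings("unchecked")
      T get(int i)         { return (T) slots[i]; }   // may legitimately be null
      void set(int i, T v) { slots[i] = v; }          // null is accepted
  }

A NullableSlots<EndPoint> stays nullable no matter how the VM chooses to
lay out EndPoint[], because the backing array is Object[].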

> 
> * Can all value arrays be unconditionally flat ?
> 
> Consider legacy code for a user collection, using "<T> T[] elemData;" as storage, given T=EndPoint.class, whereby "EndPoint" is later recompiled as a value:
> 
>     void remove(int index) {
> elemData[index] = null;
>     }
> 
>     boolean isEmpty(int index) {
>         return elemData[index] == null;
>     }
> 
> This example code is fairly ordinary, yet flat array semantics will completely break it: "remove()" will always NPE, and "isEmpty()" will always return false.

This is not true, and is a good example of what I said above about
getting nullable arrays of value types, when you need them.  After
translation into a class file, "<T> T[] elemData" becomes "Object[]
elemData", and the code continues to work, even with value types.

> Flat arrays bleed into user code and need to be explicitly opt-in ?

This is a potential risk, but we need better examples.  With today's
generics, it is not a problem, because of erasure.  With tomorrow's
template-based generics, there are interesting problems, too, but
I think they are also soluble (in ways which are outside the scope of
this conversation).

> Using classfile version (as suggested for ACC_FLATTENABLE fields) doesn't work. If new code creates unconditionally flat, what happens if our legacy example has something like this:
> 
>     void setElemData(T[] elemData) {
>         this.elemData = elemData;
>     }
> 
> "this.elemData" has null-ability semantics, callers' arg "elemData" may or may not have.

Yes, this is a better example.  Arrays.asList is such an example.

It's probably helpful to think of it this way:  Value arrays have a
new kind of store-check, which excludes nulls.  The old kind
of store check excludes everything outside of some dynamically
determined type hierarchy.  Here's an example:

  List<?> x = Arrays.asList(new String[1]);
  List<?> y = Arrays.asList(new DoubleComplex[1]);

Both x and y are translated into JVM code as raw lists, whose
get and set methods work on "Ljava/lang/Object;" values.
The actual conformance to String or DoubleComplex (a fictitious
value type) is handled dynamically.  So:

  assert(x.get(0) == null);
  assert(y.get(0) instanceof DoubleComplex);
  assert(y.get(0).equals(DoubleComplex.ZERO));  // new behavior!
  x.set(0, "bar");
  x.set(0, new Object());  // CCE or ASE
  y.set(0, DoubleComplex.ONE);
  y.set(0, new Object());  // CCE or ASE
  y.set(0, null);  // NPE, new behavior!

Many uses of Arrays.asList (and varargs arrays) never update
the contents of the lists or arrays, and so the new NPE behavior
is not an issue, any more than the old CCE/ASE behavior.

(The possibility of such ASE behavior with varargs is why we have
the @SafeVarargs annotations.  It applies also to the new NPEs.)

The net of it is that Object-typed and interface-typed arrays can carry
nulls, as can arrays of ordinary object classes, while value-type arrays
will refuse nulls, in circumstances similar to those where Object and
interface arrays refuse values of the wrong type.  And since generics
(usually) use Object and interface array types, the NPE behavior will
occur in the same circumstances where ASE/CCE behavior can appear today.
Extending this risk area is not (IMO) a large cost, since we manage it today.

> Then consider interface types, replace the storage from our example (was: T[]) with an interface, e.g. Comparable, and have "EndPoint implements Comparable". Our collection example looks slightly modified:
> 
>     void setElemData(Comparable[] elemData) {
>         this.elemData = elemData;
>     }
> 
> The previously ambiguous "T" parameterized type is clearly an interface, and yes EndPoint[] is "instanceof Comparable[]" (or Object[] for that matter); array covariance still works in L-world. Null-ability will rear its ugly head if you hand it a flat array. One might fool oneself into thinking that replacing the parameterized type helps things...no difference to Object[] (i.e. current generic type erasure).

Key heuristic for L-world:  "Object" is an honorary interface, as long as
you are willing to ignore its legacy stateful ops (wait, notify, finalize).
So, adding interfaces gets the same story.  Null checks are a new kind
of array store check, which appear when you have very narrow array
types.  When you have generic array types (Object/interface) you have
on the other hand a perhaps surprising ability to store nulls.  And when
we go forward to template classes, the null storage might go away again
for some template instances, since we will probably narrow the arrays.

(For some templates we may elect to keep the generic array type,
perhaps because we might want nullability for some internal reason.)

OK, with all that said, and ignoring the future with templates, where
does that leave us with today's erased generics and their interaction
with non-nullable value types?  I think it's actually a pretty nice place.
If we design things right, the *instances* of generic types will *reject*
nulls if and only if the *type parameter* requires it.  And this will be
true even if the generic internally tolerates nulls.  It requires some
explaining...

It's because of casting:  When I use a generic type like `List<String>`,
all linkage to the erased methods like `get(int)Object` is supplemented
with check-casts that the compiler invisibly adds.  So if I try to
pull an `Integer` out of a list that's typed as `List<String>`, I will
get a runtime error, even if somebody has used unsafe casts
to try to fool me with a list full of integers.
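
For instance, all of this is today's behavior, nothing value-specific yet;
the failure comes from a cast javac quietly inserted at the use site:

  List<String> strings = new ArrayList<>();
  List<Object> alias = (List) strings;   // unchecked cast, no error yet
  alias.add(42);                         // heap pollution
  String s = strings.get(0);             // invisible checkcast throws CCE here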

Remember the saying: "codes like a class, works like an int".  If you
use a generic like List to hold primitives, you will (today) use boxes.
A type like `List<Integer>` (the closest we can get to list-of-int)
will be protected by casts so that you can only push and pull
`Integer`s through it.  And you can pretend that it is `List<int>`
to the extent of using auto-box/unbox:

   void frob(List<Integer> ints, int n) {
      int a = ints.get(n);  // cast + unbox
      int b = ints.get(n+1);  // cast + unbox
      ints.set(n+2, a+b);  // box
      ints.set(n+3, null);  // OK, but not a box
   }

When I load a or b above, if the List contains a stray null, I will
get an NPE.  If it contains a stray String, I will get
a CCE.  This should look very familiar; it is almost the same
as the behavior we are looking at for flat arrays, except in
reverse.

Now replace Integer with ComplexDouble and observe that
similar code could do the same thing, as long as we ensure
that the cast of a and b (from `List::get(int)Object`) performs
not only a class cast to ComplexDouble but *also* performs
a null check.  Point #1 to remember:  Either checkcast must
throw NPE for nulls on value-type casts, or else the javac
translation strategy must precede the checkcast by a null
check.
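
In other words, a use site would need one of these two shapes (a sketch,
where `list` is a List<ComplexDouble> and ComplexDouble is still the
fictitious value type):

  // option 1: checkcast itself rejects null for value-type targets
  ComplexDouble a = (ComplexDouble) list.get(n);
  // option 2: javac emits an explicit null check before the checkcast
  ComplexDouble b = (ComplexDouble) Objects.requireNonNull(list.get(n));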

What happens if you try to store a null into a list?  Well,
in the case of Integer, it will work, and later when you pull
out the null, you'll get an NPE (assuming you are working
with ints).  That hurts, but we don't do that very much.
The main time this crops up is with `Map::get` used on ints.
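
For example, with plain boxed ints today:

  Map<String, Integer> counts = new HashMap<>();
  int n = counts.get("missing");   // get returns null; auto-unboxing throws NPE here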

With value types something similar can happen:  A type
like `ArrayList<ComplexDouble>` can internally store nulls.
If you extract a null from it, you should get an NPE, just as
if you were playing the unboxing game with ints.  (Remember,
"works like an int".)

A more interesting question is:  Should `ints.set(0,null)`
throw NPE immediately?  The alternative is allowing the list
to be polluted with nulls, which will cause an NPE later on
when an unboxed int or DoubleComplex is pulled out.

If we allow nulls even at this point, then the NPE happens
later when we store it into a typed heap variable, so there's
going to be an NPE; we just decide whether it's earlier or
later.  The best practice is to get earlier errors, if we can
tolerate them.  And in this case I think we can.

I think the translation strategy for `List<ComplexDouble>::set`
should perform a null check on its ComplexDouble operand.
This could also be a checkcast, equipped with a null check,
or it could be separately generated as a call to something
like `Objects::requireNonNull`.
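
So the store side of the translation might look roughly like this (again
just a sketch, not a settled strategy; `value` is the ComplexDouble operand):

  // List<ComplexDouble>::set call site, as javac might translate it:
  list.set(n, Objects.requireNonNull(value));   // timely NPE, before null can pollute the list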

The next place this leads me is to think that checkcast should
be used on all type conversion points where a type-instance
of a generic API is being used.  At least, this should be done
if the compiler statically detects that the relevant type is
a value type.  (I think today javac omits such casts if the
erased type is a super of the type-instance parameter type.)

Also, this leads to the question of whether the null check
should be built into checkcast or not.  I'm not sure about
that one.  I suspect we might want vcheckcast which does
a checkcast *and* a null check.  Or maybe a "wide" prefix
version of checkcast which accepts mode bits to control
the exact details of the checking (this could scale to other
types of conversions also).

> * Optional array flattening
> 
> The idea of allowing the user to specify a flattened array is a design option. Perhaps the user wishes to build sparse arrays; are we saying you shouldn't use values ? Conversely, why does a user want a flat array, and what are the benefits:
> 
>     * Improved memory density: only if mostly occupied, and the size is power-of-two aligned (implementation detail)
>     * Improved data locality for sequential processing.
> 
> For the price of null-ability, i.e. legacy code breaks.

See above; I think this overstates the problem.  If legacy code
today makes arrays of VBCs (like Optional or LocalDate),
then it will see a change of behavior.  But there are very few
VBCs in the world today, and most arrays are of more generic
types, like Object, or non-VBCs, like String.  And legacy code
which happens to work with Optional has already been instructed
to be wary of nulls.

(BTW, one semi-lame trick we could play is to make anewarray build
nullable arrays in legacy code while the same bytecode builds flat
arrays in new code.  The bytecode just has to check the classfile
version.  Same story maybe with checkcast.  This scheme could
carry us past a problem or two, although it goes wrong when you
start to ask about reflective operations.)

> One option is to give up on flat arrays as specified by the user, and relegate the concept to runtime optimization (i.e. convert after array marking), but the concurrency issues are not trivial, and we'll invalidate all JIT code which needs to change to the new storage.

In-heap conversion is a research problem which won't be solved
any time soon.  Check that one off.

> Give up completely then ? I don't think we are done yet; previous benchmarks have shown some worth in flat arrays.

I'm rooting solidly for flat arrays.  Courage!

> BTW, speaking of JIT code, type speculation had better work, for flat vs non-flat value arrays. If we want to use legacy code we can't go and introduce new type descriptors for flattened arrays. So given the method "foo(EndPoint[] arrayOfValues)", the incoming arg needs different JIT compilation to match the underlying storage addressing. Feels bad. Assume in practice it's not too much of an issue (mixing flat and non-flat arrays in the same method) ?

We'll have to keep an eye on it, but I don't think this "check" is going to
drain our "balance".  Most arrays have non-varying types, even in generic
code.  Type profiles can extract accurate info for the JIT.  In a few places
like `Arrays.asList` there will be varying types; most of those cases will
fold up after profiling and inlining, since the caller of the method will
have a non-varying type for the array.  A bad case is a complex
Stream algorithm or Arrays.sort.  This is already a bad case, and
needs a different solution[1][2][3], so I think it does not have a special
bearing on the issue of value types.

[1]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2012-September/008377.html
[2]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-March/017278.html
[3]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001953.html

> 
> * Optional flat arrays impact on the "L-world" spec draft and prototype
> 
> How would one create a flat value array ? Well, field declaration doesn't seem to be an option; the interpreter uses 'anewarray' followed by 'putfield', and there is no way in anewarray to determine the ultimate destination, the JVMS only knows "top-of-stack".

I'm not clear on what problem you are solving here.  There are two ways to
organize arrays of value types as separate objects in the heap, as flat and
not-flat-but-nullable.  If we want both (but see above why we might not need
both) then we need two factories.  The factories *might* be two bytecodes,
but one or both might be library methods or even (gulp) method handles.

In fact, since a factory API point is probably the right answer, I think the question
really boils down to, "what should the existing anewarray bytecode do"?
It could logically generate either flat or non-flat (legacy) arrays, and we get
to choose, based on which is most convenient.  I suspect the right answer
is "flat", because that is what "works like an int" does.  When you say
"new int[3]" you don't expect nullability, and the same is true for "new
ComplexDouble[3]".

> So we probably need a new byte-code, along with a few other bits and pieces:

*If* we need non-flat arrays, *then* we need a new factory, which is probably
*not* a new bytecode.

(In the above setup, there's something about storing an array into a field.
I hope you don't mean a field which is a short but flat array of several sub-fields,
because that is not something we are doing in the short term:  As you probably
noticed, that is problematic, since you would have to make it stand-alone and
then store it down into an object, and how did the object know how long the
array was in the first place?  I'm pretty sure we can address this long term,
but not in the current round of problems.)

>     * anewflatarray: new bytecode, like anewarray, only flatter
Or, "anewnonflatarray".  Better yet, "Arrays.newReferenceArray".

>     * aaload: JVMS would need to note if array is flat, the result is never null (defaultvalue)

Yes.  This is a "check" which will cost us something.  The interpreter
needs to detect a value-array, and the JIT needs help from profiling
and inlining.

>     * aastore: draft already throws NPE, previous comment on this list might suggest NPE only if classfile version is old ? Yet flat arrays don't have a choice

If flat arrays don't have a choice, then NPE is the humane thing to do.

>     * "System.arraycopy()": is going to need more work to be able to copy say "ComparableValue[]" into "Comparable[]", and this will be an expensive operation if mixing flat and non-flat, i.e. to convert storage: allocate and copy in each element.

Yes, this is tricky in the presence of flat arrays.  So, for that matter, is GC,
if the arrays are a mix of refs and non-refs.

>     * Classic JDK collections like ArrayList use "Object[] elementData", can't currently benefit from flat arrays (see "ArrayList.remove()"), needs null-ability.

See analysis above.  Summary: generics that make their own arrays are
unaffected; generics which operate on user-supplied arrays have a new
store check; the translation strategy for generics over value types
should arrange for timely NPEs even if the erased type supports nulls.

Another place where the "flat bit" must be checked is in the reflective
machinery that touches arrays, notably `java.lang.reflect.Array`, its
factory, getter, and setter.
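
For example (a sketch against the existing java.lang.reflect.Array API;
the default-value and NPE behaviors are the proposed new ones, and
ComplexDouble is still the fictitious value type):

  Object arr = Array.newInstance(ComplexDouble.class, 3);   // must decide: flat or non-flat?
  Object v   = Array.get(arr, 0);    // default value rather than null, if the array is flat
  Array.set(arr, 0, null);           // would need to throw NPE for a flat array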

> Comments, alternatives, corrections are welcome, there are a number of unanswered questions...
> 
> Note there are a number of problems here closely related to the concept of "Frozen Arrays", which would like to use the same type descriptor but follow a slightly different rule book...so some conclusions may help there.

Yes.  A frozen array has a store check which is particularly draconian:  It always
throws.  (We'll have a separate email chain for that one.)

One more point:  In order to fulfill the promise of "works like an int", we must
drive towards a future where even the primitives are ret-conned as value types.
This is another reason to appeal to primitive array types like "int[]" when
reasoning about the required behavior of value-type arrays like "DoubleComplex[]".
Eventually "int[]" will be a subtype of "Object[]", and the properties of flatness
and non-nullability of "int[]" and "DoubleComplex[]" should be virtually identical,
at that point.  Let's aim for it.

— John

