Value array flattening in L-world

Thu Feb 22 16:28:39 UTC 2018

*** Value array flattening in L-world ***

There has been some discussion regarding JVMS changes for arrays of 
values, notably on how to translate ACC_FLATTENABLE used for field 
declarations in terms of array storage.

* "Flatten Value Array"

The initial Valhalla value type prototype assumed all value type arrays 
are flattened, into 'valueArray' in the JVM (addresses like 'typeArray', 
but may contain oops). There were some cases where atomicity was 
required so using 'objArray' for the underlying storage always an 
option, as long as the semantics were the same.

     Which semantics ? Well 'aaload' from a new flat array does return 
null, but the default value. This is pretty much the same debate over 
'getfield' and 'putfield' may be had for ACC_FLATTENABLE for fields.

     How flat is flat ? By flat, this means no per element header or 
state information, i.e. not even enough room to indicate "null" element 
if that is something required for compatibility (more on that later). 
There may be methods to achieve this, but they are messy.

TLDR; Flat arrays don't do null-ability, not even if you optionally 
needed it.

* Can all value arrays be unconditionally flat ?

Consider legacy code for a user collection, using "<T> T[] elemData;" as 
storage, given T=EndPoint.class, whereby "EndPoint" is later recompiled 
as a value:

     void remove(int index) {
elemData[index] = null;
     }

     boolean isEmpty(int index) {
         return elemData[index] == null;
     }

This example code is fairly ordinary, yet flat array semantics will 
completely break them, "remove()" will always NPE, and "isEmpty()" 
always return false. Flat arrays bleed into user code and need to be 
explicitly opt-in ?

Using classfile version (as suggested for ACC_FLATTENABLE fields) 
doesn't work. If new code creates unconditionally flat, what happens if 
our legacy example has something like this:

     void setElemData(T[] elemData) {
         this.elemData = elemData;
     }

"this.elemData" has null-ability semantics, callers' arg "elemData" may 
or may not have.

Then consider interface types, replace the storage from our example 
(was: T[]) with an interface, e.g. Comparable, and have "EndPoint 
implements Comparable". Our collection example looks slightly modified:

     void setElemData(Comparable[] elemData) {
         this.elemData = elemData;
     }

The previously ambiguous "T" parameterized type is clearly an interface, 
and yes EndPoint[] is "instanceof Comparable[]" (or Object[] for that 
matter), array covariance still works in L-world. Null-ability will rear 
its ugly head if you hand it a flat array. One might fool one's self 
into thinking replacing the parameterized type help things...no 
difference to Object[] (i.e. current generic type erasure).

* Optional array flattening

The idea of allowing the user to specify flatten array is an design 
option. Perhaps the user wishes to build sparse arrays, are we saying 
you shouldn't use values ? Conversely, why does a user want a flat 
array, what are the benefits:

     * Improved memory density: only if mostly occupied, and size is 
power of two align (implementation detail)
     * Improved data locality for sequential processing.

For the price of null-ability, i.e. legacy code breaks.

One option, is to give up on flat arrays as specified by the user, and 
relegate the concept to runtime optimization (i.e. convert after array 
marking), but the concurrency issues are not trival, and we'll 
invalidate all JIT code which needs to change to the new storage.Give up 
completely then ? I don't think we are done yet, previous benchmarks 
have shown some worth in flat arrays.

BTW, speaking of JIT code, type speculation had better work, for flat vs 
non-flat value arrays. If we want to use legacy code we can't go and 
introduce new type descriptors for flatten arrays. So given the method 
"foo(EndPoint[] arrayOfValues)", the incoming arg needs different JIT 
compilation to match the underlying storage addressing. Feels bad. 
Assume in practice it's not too much of an issue (mixing flat and 
non-flat arrays in the same method) ?

* Optional flat arrays impact on the "L-world" spec draft and prototype

How would one create a flat value array ? Well field declaration doesn't 
seem to be an option, the interpreter use 'anewarray' followed by 
'putfield', there is no way in anewarray to determine the ultimate 
destination, JVMS only knows "top-of-stack". So we probably need a new 
byte-code, along with a few other bits and pieces:

     * anewflatarray: new bytecode, like anewarray, only flatter
     * aaload: JVMS would need to note if array is flat, the result is 
never null (defaultvalue)
     * aastore: draft already throws NPE, previous comment on this list 
might suggest NPE only if classfile version is old ? Yet flat arrays 
don't have a choice
     * "System.arraycopy()": is going to need more work to be able to 
copy say "ComparableValue[]" into "Comparable[]", and this will be an 
expensive operation if mixing flat and non-flat, i.e. to convert 
storage: allocate and copy in each element.
     * Classic JDK collections like ArrayList use "Object[] 
elementData", can't currently benefit from flat arrays (see 
"ArrayList.remove()"), needs null-ability.

Comments, alternatives, corrections are welcome, there are a number of 
unanswered questions...

Note there are a number of problems here closely related to the concept 
of "Frozen Arrays" which would like use the same type descriptor, but 
follow a slightly different rule book...so some conclusions may help there.

Cheers
/David Simms

PS: vacation next week, if you don't hear from me, that's why.