Value array flattening in L-world

David Simms david.simms at oracle.com
Fri Feb 23 14:44:56 UTC 2018



On 23/02/2018 2:25 a.m., John Rose wrote:
> Nice writeup; thank you!
>
> On Feb 22, 2018, at 8:28 AM, David Simms <david.simms at oracle.com 
> <mailto:david.simms at oracle.com>> wrote:
>>
>> *** Value array flattening in L-world ***
>>
>> There has been some discussion regarding JVMS changes for arrays of 
>> values, notably on how to translate ACC_FLATTENABLE used for field 
>> declarations in terms of array storage.
>
> In short, the main mission of L-world is to find all the Q's in 
> Q-world and
> express them differently.  (So ACC_F expresses Q-ness on a field.)
>
> In some cases where Q-ness was checked statically in Q-world,
> it must be checked dynamically in L-world.  So the big question
> of L-world is to see whether those dynamic checks can be cashed
> (SWIDT?) or whether they will overdraw our optimization account.
>

SWIDT? At first, no, because I got side-tracked into a NZ hip-hop group 
(called SWIDT) whose lyrics are NSFW. So, I thank you, sir!

For array dynamic checks, I'm not too concerned about aastore, since 
the existing sub-type check already dereferences the klass; aaload will 
be somewhat protected by a fast "is_value()" check...and then JIT type 
speculation will hopefully save us...I guess we will find out soon enough.
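
For concreteness, here is roughly where those dynamic checks land, 
assuming a hypothetical value class EndPoint and the flat semantics 
discussed below (a sketch, not settled semantics):

    EndPoint[] va = new EndPoint[2];  // flat value array
    Object[] oa = va;                 // covariance still holds in L-world
    Object o = oa[0];                 // aaload: default value, never null
    oa[0] = null;                     // aastore check: NPE, flat arrays hold no nulls
    Object[] plain = new Object[2];
    plain[0] = null;                  // Object[] stays nullable, as today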

>> * "Flatten Value Array"
>>
>> The initial Valhalla value type prototype assumed all value type 
>> arrays are flattened, into 'valueArray' in the JVM (addressed like 
>> 'typeArray', but may contain oops). There were some cases where 
>> atomicity was required, so using 'objArray' for the underlying storage 
>> was always an option, as long as the semantics were the same.
>>
>>     Which semantics? Well 'aaload' from a new flat array does
> +[not]
>> return null, but the default value. This is pretty much the same 
>> debate that may be had over 'getfield' and 'putfield' for 
>> ACC_FLATTENABLE fields.
>

Thanks, shame you can't edit after pushing send: "aaload does not return 
null" in this flat case.

> (Here's a good design heuristic – object : field :: array : element)
>
>>     How flat is flat? By flat, this means no per-element header or 
>> state information, i.e. not even enough room to indicate a "null" 
>> element if that is something required for compatibility (more on that 
>> later). There may be methods to achieve this, but they are messy.
>>
>> TL;DR: Flat arrays don't do null-ability, not even if you optionally 
>> needed it.
>
> Note:  This doesn't mean that we are *forbidden* from making non-flat
> value arrays, just that it is not in the basic set of VM functionality.
>
> To be clear:  You can make a guaranteed non-flat, nullable value array
> of N values by storing them in a "new Object[N]".  You can even get static
> typing by wrapping with an ArrayList or something similar.  And such
> view-shifting wrappers will be radically cheaper in a world with value 
> types.
> So maybe we can get away with *not* having the JVM support non-flat
> arrays of value type, since libraries can get the same effect when needed.
>

Correct, the current ArrayList "Object[] elementData" works out of the 
box. It's not flat: there are no data locality or header savings in the 
array storage itself. We would expect no regressions, and possibly some 
improvements due to the lack of identity, e.g. aaload can use 
thread-local value buffering.
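
For instance (a sketch; EndPoint is a hypothetical value class, p some 
EndPoint value):

    List<EndPoint> points = new ArrayList<>(); // backed by Object[] elementData
    points.add(p);      // stored as a reference/buffer, not flattened
    points.remove(0);   // internally writes null into the Object[]: still legal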

>>
>> * Can all value arrays be unconditionally flat?
>>
>> Consider legacy code for a user collection, using "<T> T[] elemData;" 
>> as storage, given T=EndPoint.class, whereby "EndPoint" is later 
>> recompiled as a value:
>>
>>     void remove(int index) {
>>         elemData[index] = null;
>>     }
>>
>>     boolean isEmpty(int index) {
>>         return elemData[index] == null;
>>     }
>>
>> This example code is fairly ordinary, yet flat array semantics will 
>> completely break it: "remove()" will always NPE, and "isEmpty()" will 
>> always return false.
>
> This is not true, and is a good example of what I said above about
> getting nullable arrays of value types, when you need them.  After
> translation into a class file, "<T> T[] elemData" becomes "Object[]
> elemData", and the code continues to work, even with value types.
>

You're right. Then if we translate "T" to be specifically "EndPoint", 
we are back to saying: well, this is typed and now obvious at compile 
time, so we can warn the user when they add the "__VALUE__" keyword and 
compile against any existing code. Then there is the case where said 
legacy code wasn't included in the compile, but at least it is clear 
from the signature which type we are talking about. There may be user 
advice to avoid this, plus static checking of legacy code for potential 
problems (e.g. a jlink plugin). Is this good enough? Or am I giving us 
a pass on this?
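
To make the contrast concrete (a sketch, EndPoint hypothetical):

    // Erased generic survives: the backing store is Object[]
    class ErasedElems<T> {
        Object[] elemData = new Object[16];          // erasure of T[]
        void remove(int i) { elemData[i] = null; }   // fine: nullable storage
    }

    // Sharply typed storage breaks under unconditionally flat arrays
    class EndPointElems {
        EndPoint[] elemData = new EndPoint[16];      // flat in L-world
        void remove(int i) { elemData[i] = null; }   // NPE: flat rejects null
    }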

>> Flat arrays bleed into user code and need to be explicitly opt-in ?
>
> This is a potential risk, but we need better examples.  With today's
> generics, it is not a problem, because of erasure.  With tomorrow's
> template-based generics, there are interesting problems, too, but
> I think they are also soluble (in ways which are outside the scope of
> this conversation).

Agreed. New code knows the new rules...so I guess I should offer 
concrete examples of things that might be surprising to 
newcomers...will come back with that at a later date.

>
>> Using the classfile version (as suggested for ACC_FLATTENABLE fields) 
>> doesn't work. If new code creates unconditionally flat arrays, what 
>> happens if our legacy example has something like this:
>>
>>     void setElemData(T[] elemData) {
>>         this.elemData = elemData;
>>     }
>>
>> "this.elemData" has null-ability semantics, callers' arg "elemData" 
>> may or may not have.
>
> Yes, this is a better example.  Arrays.asList is such an example.
>
> It's probably helpful to think of it this way:  Value arrays have a
> new kind of store-check, which excludes nulls.  The old kind
> of store check excludes everything outside of some dynamically
> determined type hierarchy.  Here's an example:
>
>   List<?> x = Arrays.asList(new String[1]);
>   List<?> y = Arrays.asList(new DoubleComplex[1]);
>
> Both x and y are translated into JVM code as raw lists, whose
> get and set methods work on "Ljava/lang/Object;" values.
> The actual conformance to String or DoubleComplex (a fictitious
> value type) is handled dynamically.  So:
>
>   assert(x.get(0) == null);
>   assert(y.get(0) instanceof DoubleComplex);
>   assert(y.get(0).equals(DoubleComplex.ZERO));  // new behavior!
>   x.set(0, "bar");
>   x.set(0, new Object());  // CCE or ASE
>   y.set(0, DoubleComplex.ONE);
>   y.set(0, new Object());  // CCE or ASE
>   y.set(0, null);  // NPE, new behavior!
>

Looks good, conforms with my understanding at least.

> Many uses of Arrays.asList (and varargs arrays) never update
> the contents of the lists or arrays, and so the new NPE behavior
> is not an issue, any more than the old CCE/ASE behavior.
>
> (The possibility of such ASE behavior with varargs is why we have
> the @SafeVarargs annotations.  It applies also to the new NPEs.)
>
> The net of it is that while Object and interface type arrays can carry
> nulls, as can object-type arrays, value-type arrays will refuse nulls,
> in circumstances similar to when Object and interface arrays refuse
> values of the wrong type.  And since generics use Object and interface
> array types (usually) the NPE behavior will occur in the same
> circumstances that ASE/CCE behavior can appear today.  Extending
> this risk area is not (IMO) a large cost, since we manage it today.
>
>> Then consider interface types: replace the storage from our example 
>> (was: T[]) with an interface, e.g. Comparable, and have "EndPoint 
>> implements Comparable". Our collection example looks slightly modified:
>>
>>     void setElemData(Comparable[] elemData) {
>>         this.elemData = elemData;
>>     }
>>
>> The previously ambiguous "T" parameterized type is clearly an 
>> interface, and yes, EndPoint[] is "instanceof Comparable[]" (or 
>> Object[] for that matter); array covariance still works in L-world. 
>> Null-ability will rear its ugly head if you hand it a flat array. One 
>> might fool oneself into thinking replacing the parameterized type 
>> helps things...no difference to Object[] (i.e. current generic type 
>> erasure).
>
> Key heuristic for L-world:  "Object" is an honorary interface, as long as
> you are willing to ignore its legacy stateful ops (wait, notify, 
> finalize).
> So, adding interfaces gets the same story.  Null checks are a new kind
> of array store check, which appear when you have very narrow array
> types.  When you have generic array types (Object/interface) you have
> on the other hand a perhaps surprising ability to store nulls.  And when
> we go forward to template classes, the null storage might go away again
> for some template instances, since we will probably narrow the arrays.

Clearly there is a use case for sharply typed collections built with 
non-null-ability in mind. "Collections.unmodifiable*()" might be 
interesting.
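
A minimal sketch of such a wrapper, in the spirit of 
Collections.unmodifiableList (method name hypothetical):

    static <T> List<T> nonNullableList(List<T> backing) {
        return new AbstractList<T>() {
            public T get(int i)      { return backing.get(i); }
            public T set(int i, T v) { return backing.set(i, Objects.requireNonNull(v)); }
            public int size()        { return backing.size(); }
        };
    }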

>
> (For some templates we may elect to keep the generic array type,
> perhaps because we might want nullability for some internal reason.)
>
> OK, with all that said, and ignoring the future with templates, where
> does that leave us with today's erased generics and their interaction
> with non-nullable value types?  I think it's actually a pretty nice place.
> If we design things right, the *instances* of generic types will *reject*
> nulls if and only if the *type parameter* requires it.  And this will be
> true even if the generic internally tolerates nulls.  It requires some
> explaining...
>
> It's because of casting:  When I use a generic type like `List<String>`,
> all linkage to the erased methods like `get(int)Object` are supplemented
> with check-casts that the compiler invisibly adds.  So if I try to
> pull an `Integer` out of a list that's typed as `List<String>`, I will
> get a runtime error, even if somebody has used unsafe casts
> to try to fool me with a list full of integers.
>
> Remember the saying: "codes like a class, works like an int".  If you
> use a generic like List to hold primitives, you will (today) use boxes.
> A type like `List<Integer>` (the closest we can get to list-of-int)
> will be protected by casts so that you can only push and pull
> `Integer`s through it.  And you can pretend that it is `List<int>`
> to the extent of using auto-box/unbox:
>
>    void frob(List<Integer> ints, int n) {
>       int a = ints.get(n);  // cast + unbox
>       int b = ints.get(n+1);  // cast + unbox
>       ints.set(n+2, a+b);  // box
>       ints.set(n+3, null);  // OK, but not a box
>    }
>
> When I load a or b above, if the List contains a stray
> null, I will get an NPE.  If it contains a stray String, I will get
> a CCE.  This should look very familiar; it is almost the same
> as the behavior we are looking at for flat arrays, except in
> reverse.
>
> Now replace Integer with ComplexDouble and observe that
> similar code could do the same thing, as long as we ensure
> that the cast of a and b (from `List::get(int)Object`) performs
> not only a class cast to ComplexDouble but *also* performs
> a null check.  Point #1 to remember:  Either checkcast must
> throw NPE for nulls on value-type casts, or else the javac
> translation strategy must precede the checkcast by a null
> check.
>
> What happens if you try to store a null into a list?  Well,
> in the case of Integer, it will work, and later when you pull
> out the null, you'll get an NPE (assuming you are working
> with ints).  That hurts, but we don't do that very much.
> The main time this crops up is with `Map::get` used on ints.
>
> With value types something similar can happen:  A type
> like `ArrayList<ComplexDouble>` can internally store nulls.
> If you extract a null from it, you should get an NPE, just as
> if you were playing the unboxing game with ints.  (Remember,
> "works like an int".)
>
> A more interesting question is:  Should `ints.set(0,null)`
> throw NPE immediately?  The alternative is allowing the list
> to be polluted with nulls, which will cause an NPE later on
> when an unboxed int or DoubleComplex is pulled out.
>
> If we allow nulls even at this point, then the NPE happens
> later when we store it into a typed heap variable, so there's
> going to be an NPE; we just decide whether it's earlier or
> later.  The best practice is to get earlier errors, if we can
> tolerate them.  And in this case I think we can.
>

Okay, I'm sold on the NPE on store up front. Others may have different 
opinions :-)
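
In translation terms, the store side would then fail early, something 
like this sketch (not a settled translation strategy):

    // what javac might emit for List<ComplexDouble>::set when the
    // type argument is a value type:
    list.set(i, Objects.requireNonNull(v));  // NPE now, not on a later load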

> I think the translation strategy for `List<ComplexDouble>::set`
> should perform a null check on its ComplexDouble operand.
> This could also be a checkcast, equipped with a null check,
> or it could be separately generated as a call to something
> like `Objects::requireNonNull`.
>
> The next place this leads me is to think that checkcast should
> be used on all type conversion points where a type-instance
> of a generic API is being used.  At least, this should be done
> if the compiler statically detects that the relevant type is
> a value type.  (I think today javac omits such casts if the
> erased type is a super of the type-instance parameter type.)
>
> Also, this leads to the question of whether the null check
> should be built into checkcast or not.  I'm not sure about
> that one.  I suspect we might want vcheckcast which does
> a checkcast *and* a null check.  Or maybe a "wide" prefix
> version of checkcast which accepts mode bits to control
> the exact details of the checking (this could scale to other
> types of conversions also).

Checkcast performing a null check for value classes seems reasonable. 
It does break the fast path you get with nulls...maybe it's better if 
javac inserts the null check?
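
The two placements side by side (a sketch; EndPoint a hypothetical 
value class, list an erased/raw List whose get() returns Object):

    EndPoint a = (EndPoint) list.get(i);  // option 1: checkcast itself
                                          // null-checks value types ("vcheckcast")
    EndPoint b = (EndPoint) Objects.requireNonNull(list.get(i));
                                          // option 2: javac emits the check,
                                          // checkcast keeps its null fast path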

>
>> * Optional array flattening
>>
>> The idea of allowing the user to specify a flattened array is a 
>> design option. Perhaps the user wishes to build sparse arrays; are we 
>> saying you shouldn't use values? Conversely, why would a user want a 
>> flat array, what are the benefits:
>>
>>     * Improved memory density: only if mostly occupied, and if the 
>> size is power-of-two aligned (implementation detail)
>>     * Improved data locality for sequential processing.
>>
>> All for the price of null-ability, i.e. legacy code breaks.
>
> See above; I think this overstates the problem.  If legacy code
> today makes arrays of VBCs (like Optional or LocalDate),
> then it will see a change of behavior.  But there are very few
> VBCs in the world today, and most arrays are of more generic
> types, like Object, or non-VBCs, like String.  And legacy code
> which happens to work with Optional has already been instructed
> to be wary of nulls.
>
> (BTW, one semi-lame trick we could play is to make anewarray build
> nullable arrays in legacy code while the same bytecode builds flat
> arrays in new code.  The bytecode just has to check the classfile
> version.  Same story maybe with checkcast.  This scheme could
> carry us past a problem or two, although it goes wrong when you
> start to ask about reflective operations.)
>
>> One option is to give up on flat arrays as specified by the user, 
>> and relegate the concept to a runtime optimization (i.e. convert 
>> after array marking), but the concurrency issues are not trivial, and 
>> we'd invalidate all JIT code, which would need to change to the new 
>> storage.
>
> In-heap conversion is a research problem which won't be solved
> any time soon.  Check that one off.
>
>> Give up completely then? I don't think we are done yet; previous 
>> benchmarks have shown some worth in flat arrays.
>
> I'm rooting solidly for flat arrays.  Courage!

I need to look for more holes in your argument for non-null-ability. I 
encourage others to do so too :)

>
>> BTW, speaking of JIT code, type speculation had better work for flat 
>> vs non-flat value arrays. If we want to use legacy code we can't go 
>> and introduce new type descriptors for flattened arrays. So given the 
>> method "foo(EndPoint[] arrayOfValues)", the incoming arg needs 
>> different JIT compilation to match the underlying storage addressing. 
>> Feels bad. Assume in practice it's not too much of an issue (mixing 
>> flat and non-flat arrays in the same method)?
>
> We'll have to keep an eye on it, but I don't think this "check" is 
> going to
> drain our "balance".  Most arrays have non-varying types, even in generic
> code.  Type profiles can extract accurate info for the JIT.  In a few 
> places
> like `Arrays.toList` there will be varying types; most of those cases will
> fold up after profiling and inlining, since the caller of the method will
> have a non-varying type for the array.  A bad case is a complex
> Stream algorithm or Arrays.sort.  This is already a bad case, and
> needs a different solution[1][2][3], so I think it does not have a special
> bearing on the issue of value types.
>
> [1]: 
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2012-September/008377.html
> [2]: 
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-March/017278.html
> [3]: 
> http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001953.html
>
>>
>> * Optional flat arrays impact on the "L-world" spec draft and prototype
>>
>> How would one create a flat value array? Well, field declaration 
>> doesn't seem to be an option: the interpreter uses 'anewarray' 
>> followed by 'putfield', and there is no way in anewarray to determine 
>> the ultimate destination; the JVMS only knows "top-of-stack".
>
> I'm not clear on what problem you are solving here.  There are two ways to
> organize arrays of value types as separate objects in the heap, as 
> flat and
> not-flat-but-nullable.  If we want both (but see above why we might 
> not need
> both) then we need two factories.  The factories *might* be two bytecodes,
> but one or both might be library methods or even (gulp) method handles.

I was speculating about three worlds:

A) living in a world with only flat value arrays:
     I thought the compatibility problems were too great. But type 
erasure and Object[] of values weirdly save the day there.

B) living in a world without flat value arrays at all:
     Have courage; dropping them feels like project failure (i.e. see 
your later comment on primitives).

C) living in a world where flat and non-flat live side by side, 
possibly selected by the user.

However, it looks like what we have is option A), only flat value 
arrays, with the additional note that Object[] is a valid container for 
values ('cause L-world)...so we are good.
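
Option A in practice, as a sketch (EndPoint hypothetical, semantics per 
this thread):

    EndPoint[] flat = new EndPoint[8]; // unconditionally flat
    assert flat[0] != null;            // aaload yields EndPoint's default value
    Object[] nullable = new Object[8]; // the nullable container, courtesy of L-world
    nullable[0] = flat[0];             // element buffered/boxed to a reference
    nullable[0] = null;                // still allowed, as today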

>
> In fact, since a factory API point is probably the right answer, I 
> think the question
> really boils down to, "what should the existing anewarray bytecode do"?
> It could logically generate either flat or non-flat (legacy) arrays, 
> and we get
> to choose, based on which is most convenient.  I suspect the right answer
> is "flat", because that is what "works like an int" does.  When you say
> "new int[3]" you don't expect nullability, and the same is true for "new
> ComplexDouble[3]".
>
>> So we probably need a new byte-code, along with a few other bits and 
>> pieces:
>
> *If* we need non-flat arrays, *then* we need a new factory, which is 
> probably
> *not* a new bytecode.
>
> (In the above setup, there's something about storing an array into a 
> field.
> I hope you don't mean a field which is a short but flat array of 
> several sub-fields,
> because that is not something we are doing in the short term:  As you 
> probably
> noticed, that is problematic, since you would have to make it 
> stand-alone and
> then store it down into an object, and how did the object know how 
> long the
> array was in the first place?  I'm pretty sure we can address this 
> long term,
> but not in the current round of problems.)
>
>>     * anewflatarray: new bytecode, like anewarray, only flatter
> Or, "anewnonflatarray".  Better yet, "Arrays.newReferenceArray".
>

Or not required until proven otherwise. I'd move to strike this first point.
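
If one is ever proven necessary, a library factory along the lines John 
suggests would do; a hypothetical API point, sketched over today's 
reflective factory:

    @SuppressWarnings("unchecked")
    static <T> T[] newReferenceArray(Class<T> type, int len) {
        // a real version would need a VM-backed guarantee of
        // "non-flat, nullable" storage for value classes
        return (T[]) java.lang.reflect.Array.newInstance(type, len);
    }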

>>     * aaload: JVMS would need to note that if the array is flat, the 
>> result is never null (defaultvalue)
>
> Yes.  This is a "check" which will cost us something.  The interpreter
> needs to detect a value-array, and the JIT needs help from profiling
> and inlining.
>
>>     * aastore: the draft already throws NPE; a previous comment on 
>> this list might suggest NPE only if the classfile version is old? Yet 
>> flat arrays don't have a choice
>
> If flat arrays don't have a choice, then NPE is the humane thing to do.
>

Yeap, convinced.

>>     * "System.arraycopy()": is going to need more work to be able to 
>> copy say "ComparableValue[]" into "Comparable[]", and this will be an 
>> expensive operation if mixing flat and non-flat, i.e. to convert 
>> storage: allocate and copy in each element.
>
> Yes, this is tricky in the presence of flat arrays.  So, for that 
> matter, is GC,
> if the arrays are a mix of refs and non-refs.
>

In the GC case, I'm still worried about mostly-default value arrays 
wasting time with oop iteration...but there may be interesting 
optimizations for later.
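
On the arraycopy point, the mixed flat/non-flat slow path is 
essentially element-by-element, each flat element buffered out to a 
reference (a sketch, EndPoint hypothetical):

    static void copyFlatToRefs(EndPoint[] src, Object[] dst, int len) {
        for (int i = 0; i < len; i++) {
            dst[i] = src[i]; // aaload buffers the flat element, aastore a reference
        }
    }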

Which reminds me, with JFR being open sourced, trace events (with 
duration thresholds) may be useful for value implementation 
diagnostics, and might even be useful to the user: "massive array copy 
from flat to Object[] takes time, maybe I should use the sharpened 
collection". They could answer implementation questions, like how bad 
approach <X> could be.

>>     * Classic JDK collections like ArrayList use "Object[] 
>> elementData", and can't currently benefit from flat arrays (see 
>> "ArrayList.remove()", which needs null-ability).
>
> See analysis above.  Summary: Generics that make their own arrays are 
> unaffected,
> generics which operate on user-supplied arrays have a new store check, 
> translation
> strategy for generic-over-value-type should arrange for timely NPEs 
> even if erased
> type supports nulls.
>
> Another place where the "flat bit" must be checked is in the reflective
> machinery that touches arrays, notably `java.lang.reflect.Array`, its
> factory, getter, and setter.
>
>> Comments, alternatives, corrections are welcome; there are a number 
>> of unanswered questions...
>>
>> Note there are a number of problems here closely related to the 
>> concept of "Frozen Arrays", which would like to use the same type 
>> descriptor but follow a slightly different rule book...so some 
>> conclusions may help there.
>
> Yes.  A frozen array has a store check which is particularly 
> draconian:  It always
> throws.  (We'll have a separate email chain for that one.)
>
> One more point:  In order to fulfill the promise of "works like an 
> int", we must
> drive towards a future where even the primitives are ret-conned as 
> value types.
> This is another reason to appeal to primitive array types like "int[]" 
> when
> reasoning about the required behavior of value-type arrays like 
> "DoubleComplex[]".
> Eventually "int[]" will be a subtype of "Object[]", and the properties 
> of flatness
> and non-nullability of "int[]" and "DoubleComplex[]" should be 
> virtually identical,
> at that point.  Let's aim for it.
>
> — John

This last point is it: primitives as value types...this is actually 
what convinces me that, yes, flat arrays all the way; Object[] has your 
back on null-ability for compatibility.

Thanks for the feedback, and for calling out errors. It will be 
interesting to see what others think.

Again, it might be useful for me to write up a few examples of code 
patterns with differing behavior.

/Mr. Simms



