On vector box elimination

Thu Dec 7 17:35:15 UTC 2017

Hi,

I'd like to share some thoughts about vector box elimination in C2.

On bytecode level all vectors & vector operations are expressed in terms 
of vector boxes (L-types). In order to produce efficient machine code C2 
has to scalarize them away. That's why predictable vector box 
elimination is crucial for Vector API (at least, in its current form).

In Panama there are different attempts to address the problem: in 
context of Vector API (using VectorBox on typed vectors) and Machine 
Code Snippets (using VBox on super-longs).

Both attempts share similar approach (a dedicated node to represent 
vector box), but differ in details. For example, there are 2 objects 
associated with VectorBox (typed vector box and primitive array as 
vector storage) whereas for VBox there's only 1 box (super-long box itself).

I consider those differences as minor implementation details and try to 
keep things simple (1 box per vector value, only 2 inputs (box & value) 
and 1 output (box) in VectorBox node) while focusing on inherent 
problems of box elimination.

There was an experiment [1] to enhance escape analysis (EA) in C2 and 
relax constraints for vectors (in particular and value-based classes in 
general), but it didn't work well. The prototype was unstable, required 
multiple passes, and provided modest results w.r.t box elimination for 
value-based classes.

Alternative approach is to keep the logic related to value-based classes 
separate from EA and confine transformations which don't preserve object 
identity on special nodes (like VectorBox & VBox) while piggy-backing on 
EA to eliminate unused allocations. That's the direction both VectorBox 
& VBox lean to, but not completely.

(I'll depict vector box node as VectorBox in the examples though current 
implementation in Panama differs in some aspect.)

VectorBox is created to wrap a vector value when boxing happen.

   value =BOX=> VectorBox new_box value

Right now, it implicitly happens when either vector operation is 
intrinsified or a vector is constructed by loading its content from 
primitive array.

But it'd be useful (e.g., for vector constants) to wrap on-heap vector 
boxes in VectorBox as well:

   # box = Load o.v
   Load o.v  =>  VectorBox box (LoadVector box)

The core transformation is box/unbox elimination:

   LoadVector (VectorBox box value)  =UNBOX=>  value

It allows to bypass loading vectors from temporary boxes and directly 
connect dependent operations, effectively eliminating most usages of 
vector boxes.

Right before EA, VectorBox nodes are eliminated and EA tries to get rid 
of temporarily allocated boxes:

    VectorBox box value  =ELIM=>  box

Though vectors are passed directly between vector operations, usually, 
there are some users of VectorBox nodes left due to:
   (1) safepoints
   (2) control flow

Consider the case when vectors leak into safepoints as debug info:

   SafePoint ... (VectorBox box value) ...

Replacing VectorBox with value doesn't work because SafePoint expects an 
instance and not a vector value:

   SafePoint (VectorBox box value)  =/=>  SafePoint value

   SafePoint (VectorBox box value)  =>    SafePoint box

But box usage can be still eliminated by passing value, but doing 
rematerialization when needed:

   SafePoint (VectorBox box value)
     =>
   SafePoint (SafePointScalarObject box_klass #value) ... value

EA automatically does that, but only for objects it considers as 
scalarizable. For vectors it's better not to depend on EA here and 
explicitly perform it irrespective of EA results.

Regarding control flow, merging VectorBoxes through phis helps 
significatly simplify IR:

   Phi (VectorBox o1 v1) (VectorBox o2 v2)
     =MERGE=>
   VectorBox (Phi o1 o2) (Phi v1 v2)

It nicely combines with other transformations:

   LoadVector (Phi (VectorBox o1 v1) (VectorBox o2 v2))
     =MERGE=>
   LoadVector (VectorBox (Phi o1 o2) (Phi v1 v2))
     =UNBOX=>
   Phi v1 v2

But it doesn't help with consequent box elimination if the box leaks or 
there's a cycle in the graph:

   VectorBox (Phi o1 o2) (Phi v1 v2)  =ELIM=>  Phi o1 o2

EA implementation in C2 doesn't cover control flow, so *both* o1 & o2 
will be conservatively marked as non-scalarizable.

Additional trick to work-around that is to do aggressive reboxing:

   VectorBox (Phi o1 o2) (Phi v1 v2)
     =REBOX=>
   VectorBox new_box (Phi v1 v2)
     =ELIM=>
   new_box // initialized with (Phi v1 v2))

If (Phi o1 o2) hasn't leaked anywhere (and it shouldn't since it has 
been hidden behind VectorBox all the time), then it simply goes away and 
doesn't hinder EA anymore.

The only problem with reboxing: full JVM state is required for new 
allocation (involves a safepoint). So, either VectorBox has to preserve 
JVM state (e.g., by extending SafePoint) or there's a way to compute 
nearest JVM state for VectorBox to be used for fresh box allocation.

But aggressive reboxing isn't limited to Phi elimination. It also allows 
to move allocations to the places where instances leak (e.g., 
non-inlined calls). Usually, those are on rarely taken paths, but 
sometimes they can end on hot path and aggressive reboxing can 
significantly increase object allocation rate.

CONCLUSIONS

Looking at where VectorBox is now, implementing rematerialization, 
merge-through-phis, and aggressive reboxing should be enough to solve 
boxing issues in most of the interesting cases.

PS: on interactions with value types & Valhalla.

Most of described tricks (and some more) are already covered for value 
types in Valhalla (see ValueTypeNode), so there are nice opportunities 
for synergies when merging with Valhalla/MVT.

But it's not trivial to replace VectorBox with ValueTypeNode, even if 
representing vectors as value types. Some time ago I wrote my thoughts 
about MVT-based vectors [2] and conclusion was vectors don't perfectly 
fit current implementation of value types in Valhalla. Mostly because it 
is targeted for aggregates of primitive values and JVM lacks full 
support of primitives larger than 64 bits in size.

Another complication is ValueTypeNode works only on Q-types and for 
vectors L-types are used (since they are accessed either through Vector 
interface or shape-agnostic super-class). But considering recent 
progress on U-types [3], it looks like ValueTypeNode will be enhanced 
with L-type support as well.

Best regards,
Vladimir Ivanov

[1] http://cr.openjdk.java.net/~vlivanov/panama/hbox/webrev.02/hotspot/
[2] 
http://mail.openjdk.java.net/pipermail/valhalla-dev/2017-November/003555.html
[3] 
http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2017-November/000432.html