On vector box elimination
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Thu Dec 7 17:35:15 UTC 2017
Hi,
I'd like to share some thoughts about vector box elimination in C2.
On bytecode level all vectors & vector operations are expressed in terms
of vector boxes (L-types). In order to produce efficient machine code C2
has to scalarize them away. That's why predictable vector box
elimination is crucial for Vector API (at least, in its current form).
In Panama there are different attempts to address the problem: in
context of Vector API (using VectorBox on typed vectors) and Machine
Code Snippets (using VBox on super-longs).
Both attempts share similar approach (a dedicated node to represent
vector box), but differ in details. For example, there are 2 objects
associated with VectorBox (typed vector box and primitive array as
vector storage) whereas for VBox there's only 1 box (super-long box itself).
I consider those differences as minor implementation details and try to
keep things simple (1 box per vector value, only 2 inputs (box & value)
and 1 output (box) in VectorBox node) while focusing on inherent
problems of box elimination.
There was an experiment [1] to enhance escape analysis (EA) in C2 and
relax constraints for vectors (in particular and value-based classes in
general), but it didn't work well. The prototype was unstable, required
multiple passes, and provided modest results w.r.t box elimination for
value-based classes.
Alternative approach is to keep the logic related to value-based classes
separate from EA and confine transformations which don't preserve object
identity on special nodes (like VectorBox & VBox) while piggy-backing on
EA to eliminate unused allocations. That's the direction both VectorBox
& VBox lean to, but not completely.
(I'll depict vector box node as VectorBox in the examples though current
implementation in Panama differs in some aspect.)
VectorBox is created to wrap a vector value when boxing happen.
value =BOX=> VectorBox new_box value
Right now, it implicitly happens when either vector operation is
intrinsified or a vector is constructed by loading its content from
primitive array.
But it'd be useful (e.g., for vector constants) to wrap on-heap vector
boxes in VectorBox as well:
# box = Load o.v
Load o.v => VectorBox box (LoadVector box)
The core transformation is box/unbox elimination:
LoadVector (VectorBox box value) =UNBOX=> value
It allows to bypass loading vectors from temporary boxes and directly
connect dependent operations, effectively eliminating most usages of
vector boxes.
Right before EA, VectorBox nodes are eliminated and EA tries to get rid
of temporarily allocated boxes:
VectorBox box value =ELIM=> box
Though vectors are passed directly between vector operations, usually,
there are some users of VectorBox nodes left due to:
(1) safepoints
(2) control flow
Consider the case when vectors leak into safepoints as debug info:
SafePoint ... (VectorBox box value) ...
Replacing VectorBox with value doesn't work because SafePoint expects an
instance and not a vector value:
SafePoint (VectorBox box value) =/=> SafePoint value
SafePoint (VectorBox box value) => SafePoint box
But box usage can be still eliminated by passing value, but doing
rematerialization when needed:
SafePoint (VectorBox box value)
=>
SafePoint (SafePointScalarObject box_klass #value) ... value
EA automatically does that, but only for objects it considers as
scalarizable. For vectors it's better not to depend on EA here and
explicitly perform it irrespective of EA results.
Regarding control flow, merging VectorBoxes through phis helps
significatly simplify IR:
Phi (VectorBox o1 v1) (VectorBox o2 v2)
=MERGE=>
VectorBox (Phi o1 o2) (Phi v1 v2)
It nicely combines with other transformations:
LoadVector (Phi (VectorBox o1 v1) (VectorBox o2 v2))
=MERGE=>
LoadVector (VectorBox (Phi o1 o2) (Phi v1 v2))
=UNBOX=>
Phi v1 v2
But it doesn't help with consequent box elimination if the box leaks or
there's a cycle in the graph:
VectorBox (Phi o1 o2) (Phi v1 v2) =ELIM=> Phi o1 o2
EA implementation in C2 doesn't cover control flow, so *both* o1 & o2
will be conservatively marked as non-scalarizable.
Additional trick to work-around that is to do aggressive reboxing:
VectorBox (Phi o1 o2) (Phi v1 v2)
=REBOX=>
VectorBox new_box (Phi v1 v2)
=ELIM=>
new_box // initialized with (Phi v1 v2))
If (Phi o1 o2) hasn't leaked anywhere (and it shouldn't since it has
been hidden behind VectorBox all the time), then it simply goes away and
doesn't hinder EA anymore.
The only problem with reboxing: full JVM state is required for new
allocation (involves a safepoint). So, either VectorBox has to preserve
JVM state (e.g., by extending SafePoint) or there's a way to compute
nearest JVM state for VectorBox to be used for fresh box allocation.
But aggressive reboxing isn't limited to Phi elimination. It also allows
to move allocations to the places where instances leak (e.g.,
non-inlined calls). Usually, those are on rarely taken paths, but
sometimes they can end on hot path and aggressive reboxing can
significantly increase object allocation rate.
CONCLUSIONS
Looking at where VectorBox is now, implementing rematerialization,
merge-through-phis, and aggressive reboxing should be enough to solve
boxing issues in most of the interesting cases.
PS: on interactions with value types & Valhalla.
Most of described tricks (and some more) are already covered for value
types in Valhalla (see ValueTypeNode), so there are nice opportunities
for synergies when merging with Valhalla/MVT.
But it's not trivial to replace VectorBox with ValueTypeNode, even if
representing vectors as value types. Some time ago I wrote my thoughts
about MVT-based vectors [2] and conclusion was vectors don't perfectly
fit current implementation of value types in Valhalla. Mostly because it
is targeted for aggregates of primitive values and JVM lacks full
support of primitives larger than 64 bits in size.
Another complication is ValueTypeNode works only on Q-types and for
vectors L-types are used (since they are accessed either through Vector
interface or shape-agnostic super-class). But considering recent
progress on U-types [3], it looks like ValueTypeNode will be enhanced
with L-type support as well.
Best regards,
Vladimir Ivanov
[1] http://cr.openjdk.java.net/~vlivanov/panama/hbox/webrev.02/hotspot/
[2]
http://mail.openjdk.java.net/pipermail/valhalla-dev/2017-November/003555.html
[3]
http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2017-November/000432.html
More information about the panama-dev
mailing list