RFR(XL): 8224675: Late GC barrier insertion for ZGC
Nils Eliasson
nils.eliasson at oracle.com
Thu May 23 14:25:43 UTC 2019
Hi,
In ZGC we use load barriers on references. In the original
implementation these where added as macro nodes at parse time. The load
barrier node consumes and produces control flow in order to be able to
be lowered into a check with a slow path late. The load barrier nodes
are fixed in the control flow, and extensions to different optimizations
are need the barriers out of loop and past other unrelated control flow.
With this patch the barriers are instead added after the loop
optimizations, before macro node expansion. This makes the entire
pipeline until that point oblivious about the barriers. A dump of the IR
with ZGC or EpsilonGC will be basically identical at that point, and the
diff compared to serialGC or ParallelGC that use write barriers is
really small.
Benefits
- A major complexity reduction. One can reason about and implement loop
optimization without caring about the barriers. The escape analysis
doesn't need to know about the barriers. Loads float freely like they
are supposed to.
- Less nodes early. The inlining will become more deterministic. A
barrier heavy GC will not run into node limits earlier. Also node limit
bounded optimization like unrolling and peeling will not be penalized by
barriers.
- Better test coverage, or reduce testing cost when the same
optimization doesn't need to be verified with every GC.
- Better control on where barriers end up. It is trivial to guarantee
that the load and barriers are not separated by a safepoint.
Design
The implementation uses an extra phase that piggy back on PhaseIdealLoop
which provides control and dominator information for all loads. This
extra phase is needed because we need to splice the control flow when
adding the load barriers.
Barriers are inserted on the loads nodes in post order (any successor
first). This is to guarantee the dominator information above every
insertion is correct. This is also important within blocks. Two loads in
the same block can float in relation to each other. The addition of
barriers serializes their order. Any def-use relationship is upheld by
expanding them post order.
Barrier insertion is done in stages. In this first stage a single macro
node that represents the barrier is added with all dependencies that is
required. In the macro expansion phase the barrier nodes is expanded
into the final shape, adding nodes that represent the conditional load
barrier check. (Write barriers in other GCs could possibly be expanded
here directly)
All the barriers that are needed for unsafe reference operations (cas,
swap, cmpx) are also expanded late. They already have control flow, so
the expansion is straight forward.
The barriers for the unsafe reference operations (cas, getandset, cmpx)
have also been simplified. The cas-load-cas dance have been replaced by
a pre-load. The pre-load is a load with a barrier, that is kept alive by
an extra (required) edge on the unsafe-primitive-nodes (specialized as
ZCompareAndSwap, ZGetAndSet, ZCompareAndExchange).
One challenge that was encountered early and that have caused
considerable work is that nodes (like loads) can end up between calls
and their catch projections. This is usually handled after matching, in
PhaseCFG::call_catch_cleanup, where the nodes after the call are cloned
to all catch blocks. At this stage they are in an ordered list, so that
is a straight forward process. For late barrier insertion we need to
splice in control earlier, before matching, and control flow between
calls and catches is not allowed. This requires us to add a
transformation pass where all loads and their dependent instructions are
cloned out to the catch blocks before we can start splicing in control
flow. This transformation doesn't replace the legacy call_catch_cleanup
fully, but it could be a future goal.
In the original barrier implementation there where two different load
barrier implementations: the basic and the optimized. With the new
approach to barriers on unsafe, the basic is no longer required and has
been removed. (It provided options for skipping the self healing, and
passed the ref in a register, guaranteeing that the oop wasn't reloaded.)
The wart that was fixup_partial_loads in zHeap has also been made
redundant.
Dominating barriers are no longer removed on weak loads. Weak barriers
doesn't guarantee self-healing.
Follow up work:
- Consolidate all uses of GrowableArray::insert_sorted to use the new
version
- Refactor the phases. There are a lot of simplifications and
verification that can be done with more well defined phases.
- Simplify the remaining barrier optimizations. There might still be
code paths that are no longer needed.
Testing:
Hotspot tier 1-6, CTW, jcstress, micros, runthese, kitchensink, and then
some. All with -XX:+ZVerifyViews.
Bug: https://bugs.openjdk.java.net/browse/JDK-8224675
Webrev: http://cr.openjdk.java.net/~neliasso/8224675/webrev.01/
Please review,
Regards,
Nils
More information about the hotspot-compiler-dev
mailing list