RFR(XL): 8224675: Late GC barrier insertion for ZGC

Nils Eliasson nils.eliasson at oracle.com
Thu May 23 14:25:43 UTC 2019


Hi,

In ZGC we use load barriers on references. In the original 
implementation these were added as macro nodes at parse time. The load 
barrier node consumes and produces control flow so that it can later be 
lowered into a check with a slow path. The load barrier nodes are fixed 
in the control flow, and extensions to several optimizations are needed 
to move the barriers out of loops and past other unrelated control flow.

With this patch the barriers are instead added after the loop 
optimizations, before macro node expansion. This makes the entire 
pipeline until that point oblivious to the barriers. A dump of the IR 
with ZGC or EpsilonGC will be basically identical at that point, and the 
diff compared to SerialGC or ParallelGC, which use write barriers, is 
really small.

Benefits

- A major complexity reduction. One can reason about and implement loop 
optimization without caring about the barriers. The escape analysis 
doesn't need to know about the barriers. Loads float freely like they 
are supposed to.

- Fewer nodes early. Inlining will become more deterministic. A 
barrier-heavy GC will not run into node limits earlier. Also, 
optimizations bounded by node limits, like unrolling and peeling, will 
not be penalized by barriers.

- Better test coverage, or reduced testing cost, when the same 
optimization doesn't need to be verified with every GC.

- Better control over where barriers end up. It is trivial to guarantee 
that a load and its barrier are not separated by a safepoint.

Design

The implementation uses an extra phase that piggybacks on PhaseIdealLoop, 
which provides control and dominator information for all loads. This 
extra phase is needed because we need to splice the control flow when 
adding the load barriers.

Barriers are inserted on the load nodes in post order (any successor 
first). This guarantees that the dominator information above every 
insertion is correct. It is also important within blocks. Two loads in 
the same block can float in relation to each other. The addition of 
barriers serializes their order. Any def-use relationship is upheld by 
expanding them in post order.
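
As a rough illustration of the ordering only (this is not the code in 
the webrev), the walk can be sketched on a toy def-use graph. ToyNode, 
visit and barrier_insertion_order are hypothetical names, node ids are 
assumed to equal their index in the vector, and the graph is assumed to 
be acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an IR node: only the def-use edges matter here.
struct ToyNode {
  int id;                 // assumed equal to the node's index in the graph
  bool is_load;           // true if this node would get a load barrier
  std::vector<int> uses;  // ids of nodes that consume this node's value
};

// Depth-first walk that emits a load only after all of its uses.
static void visit(const std::vector<ToyNode>& graph, int id,
                  std::vector<bool>& seen, std::vector<int>& order) {
  assert(id >= 0 && static_cast<std::size_t>(id) < graph.size());
  if (seen[id]) return;
  seen[id] = true;
  for (int use : graph[id].uses) {
    visit(graph, use, seen, order);  // successors first
  }
  if (graph[id].is_load) order.push_back(id);
}

// Returns load ids in the order barriers would be attached in this sketch.
std::vector<int> barrier_insertion_order(const std::vector<ToyNode>& graph) {
  std::vector<bool> seen(graph.size(), false);
  std::vector<int> order;
  for (std::size_t i = 0; i < graph.size(); ++i) {
    visit(graph, static_cast<int>(i), seen, order);
  }
  return order;
}
```

With load 0 feeding load 1, the sketch handles load 1 before load 0, so 
nothing above the insertion point has been touched yet.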

Barrier insertion is done in stages. In the first stage a single macro 
node that represents the barrier is added with all the dependencies that 
are required. In the macro expansion phase the barrier nodes are expanded 
into their final shape, adding nodes that represent the conditional load 
barrier check. (Write barriers in other GCs could possibly be expanded 
here directly.)
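
For readers less familiar with the final shape, the conditional check 
corresponds roughly to the following source-level sketch. It is an 
illustration under assumptions, not the generated code: toy_bad_mask, 
toy_slow_path and toy_load_with_barrier are hypothetical names, the mask 
value is made up, and the slow path here only clears the bad bits and 
heals the field (the real slow path does considerably more and heals 
atomically).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask of "bad" metadata bits in a reference (value made up).
static std::uintptr_t toy_bad_mask = 0x4;

// Hypothetical slow path: produce a good reference and self-heal the field.
// Assumption: a plain store stands in for the atomic healing of a real GC.
std::uintptr_t toy_slow_path(std::uintptr_t* field, std::uintptr_t ref) {
  std::uintptr_t good = ref & ~toy_bad_mask;
  *field = good;  // self-healing: later loads take the fast path
  return good;
}

// The shape after expansion: plain load, cheap test, rarely taken slow path.
std::uintptr_t toy_load_with_barrier(std::uintptr_t* field) {
  assert(field != nullptr);
  std::uintptr_t ref = *field;       // the original load
  if ((ref & toy_bad_mask) != 0) {   // the conditional barrier check
    ref = toy_slow_path(field, ref); // slow path
  }
  return ref;
}
```

Until macro expansion, all of this is represented by the single barrier 
node, which is what keeps the earlier phases unaware of it.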

All the barriers that are needed for unsafe reference operations (cas, 
swap, cmpx) are also expanded late. They already have control flow, so 
the expansion is straightforward.

The barriers for the unsafe reference operations (cas, getandset, cmpx) 
have also been simplified. The cas-load-cas dance has been replaced by 
a pre-load. The pre-load is a load with a barrier that is kept alive by 
an extra (required) edge on the unsafe-primitive-nodes (specialized as 
ZCompareAndSwap, ZGetAndSet, ZCompareAndExchange).
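
The idea behind the pre-load can be sketched at source level as follows. 
This is a simplified model under assumptions, not the IR shape in the 
patch: kToyBadMask, toy_preload and toy_cas_reference are hypothetical 
names, and healing is modelled as clearing a made-up mask bit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical mask of "bad" metadata bits (value made up for the sketch).
static const std::uintptr_t kToyBadMask = 0x4;

// Pre-load: a load with a barrier. Its value is not needed by the caller;
// what matters is that the field holds a good reference afterwards.
std::uintptr_t toy_preload(std::atomic<std::uintptr_t>& field) {
  std::uintptr_t ref = field.load();
  if ((ref & kToyBadMask) != 0) {
    std::uintptr_t good = ref & ~kToyBadMask;
    // Heal with a CAS so a concurrent update of the field is not overwritten.
    field.compare_exchange_strong(ref, good);
    ref = good;
  }
  return ref;
}

// A single CAS after the pre-load, instead of cas, load-with-barrier, cas.
bool toy_cas_reference(std::atomic<std::uintptr_t>& field,
                       std::uintptr_t expected, std::uintptr_t desired) {
  assert((desired & kToyBadMask) == 0);  // only good references are stored
  toy_preload(field);                    // heals the field if it was stale
  return field.compare_exchange_strong(expected, desired);
}
```

In this model a field holding a stale copy of the expected reference 
would make a bare CAS fail spuriously; after the pre-load the comparison 
sees the healed value.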

One challenge that was encountered early, and that has caused 
considerable work, is that nodes (like loads) can end up between calls 
and their catch projections. This is usually handled after matching, in 
PhaseCFG::call_catch_cleanup, where the nodes after the call are cloned 
to all catch blocks. At that stage they are in an ordered list, so it 
is a straightforward process. For late barrier insertion we need to 
splice in control earlier, before matching, and control flow between 
calls and catches is not allowed. This requires us to add a 
transformation pass where all loads and their dependent instructions are 
cloned out to the catch blocks before we can start splicing in control 
flow. This transformation doesn't fully replace the legacy 
call_catch_cleanup, but that could be a future goal.
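
A toy model of the cloning step, for illustration only: ToyBlock and 
toy_clone_to_catch_blocks are hypothetical names, nodes are modelled as 
strings, and the real pass has to work on the graph before matching 
rather than on an ordered list like this one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a catch block: an ordered list of node names.
struct ToyBlock {
  std::vector<std::string> nodes;
};

// Copies every node stranded between the call and its catch projections
// to the front of each catch block, then empties the stranded list so
// that nothing remains between the call and the catch.
void toy_clone_to_catch_blocks(std::vector<std::string>& between_call_and_catch,
                               std::vector<ToyBlock>& catch_blocks) {
  assert(!catch_blocks.empty());
  for (ToyBlock& block : catch_blocks) {
    block.nodes.insert(block.nodes.begin(),
                       between_call_and_catch.begin(),
                       between_call_and_catch.end());
  }
  between_call_and_catch.clear();
}
```

Once each catch block owns its copy of the load, control flow for the 
barrier can be spliced in inside that block.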

In the original barrier implementation there were two different load 
barrier implementations: the basic and the optimized. With the new 
approach to barriers on unsafe, the basic is no longer required and has 
been removed. (It provided options for skipping the self-healing, and 
passed the ref in a register, guaranteeing that the oop wasn't reloaded.)

The wart that was fixup_partial_loads in zHeap has also been made 
redundant.

Dominating barriers are no longer removed on weak loads. Weak barriers 
don't guarantee self-healing.

Follow up work:

- Consolidate all uses of GrowableArray::insert_sorted to use the new 
version

- Refactor the phases. There are a lot of simplifications and 
verifications that can be done with better-defined phases.

- Simplify the remaining barrier optimizations. There might still be 
code paths that are no longer needed.


Testing:

Hotspot tier 1-6, CTW, jcstress, micros, runthese, kitchensink, and then 
some. All with -XX:+ZVerifyViews.

Bug: https://bugs.openjdk.java.net/browse/JDK-8224675

Webrev: http://cr.openjdk.java.net/~neliasso/8224675/webrev.01/


Please review,

Regards,

Nils


