RFR(XL): 8224675: Late GC barrier insertion for ZGC

Mon May 27 11:04:18 UTC 2019

Thanks!

I hope it can be an evolution that can be useful to all GCs in the long run.

Serial and parallel should be fairly straightforward, G1 requires a 
little bit more care, and I guess Shenandoah does too.

// Nils

On 2019-05-24 11:42, Roman Kennke wrote:
> Hi Nils,
>
>> I removed the method. I have verified that Shenandoah builds and isn't
>> obviously broken.
>>
>> Webrev updated in place.
> Thanks! Will review more thoroughly and run some tests later.
>
> Nice approach, btw!
>
> Roman
>
>
>
>> Regards,
>>
>> Nils
>>
>> On 2019-05-23 16:31, Roman Kennke wrote:
>>> Quick glance showed a problem: you are renaming/moving
>>> BarrierSetC2::add_users_to_worklist() but ShenandoahBarrierSetC2 is not
>>> updated accordingly.
>>>
>>> Roman
>>>
>>>
>>>> Hi,
>>>>
>>>> In ZGC we use load barriers on references. In the original
>>>> implementation these were added as macro nodes at parse time. The load
>>>> barrier node consumes and produces control flow so that it can be
>>>> lowered late into a check with a slow path. The load barrier nodes
>>>> are fixed in the control flow, and extensions to several optimizations
>>>> are needed to move the barriers out of loops and past other unrelated
>>>> control flow.
>>>>
>>>> With this patch the barriers are instead added after the loop
>>>> optimizations, before macro node expansion. This makes the entire
>>>> pipeline until that point oblivious about the barriers. A dump of the IR
>>>> with ZGC or EpsilonGC will be basically identical at that point, and the
>>>> diff compared to serialGC or ParallelGC that use write barriers is
>>>> really small.
>>>>
>>>> Benefits
>>>>
>>>> - A major complexity reduction. One can reason about and implement loop
>>>> optimization without caring about the barriers. The escape analysis
>>>> doesn't need to know about the barriers. Loads float freely like they
>>>> are supposed to.
>>>>
>>>> - Fewer nodes early. The inlining will become more deterministic. A
>>>> barrier-heavy GC will not run into node limits earlier. Also,
>>>> node-limit-bounded optimizations like unrolling and peeling will not be
>>>> penalized by barriers.
>>>>
>>>> - Better test coverage, or reduced testing cost, when the same
>>>> optimization doesn't need to be verified with every GC.
>>>>
>>>> - Better control on where barriers end up. It is trivial to guarantee
>>>> that the load and barriers are not separated by a safepoint.
>>>>
>>>> Design
>>>>
>>>> The implementation uses an extra phase that piggybacks on PhaseIdealLoop,
>>>> which provides control and dominator information for all loads. This
>>>> extra phase is needed because we need to splice the control flow when
>>>> adding the load barriers.
>>>>
>>>> Barriers are inserted on the load nodes in post order (any successor
>>>> first). This is to guarantee the dominator information above every
>>>> insertion is correct. This is also important within blocks. Two loads in
>>>> the same block can float in relation to each other. The addition of
>>>> barriers serializes their order. Any def-use relationship is upheld by
>>>> expanding them in post order.
>>>>
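
As a rough illustration of the post-order rule above, here is a toy sketch.
It is not HotSpot code: ToyNode, visit and post_order are made-up names, and
the graph is a plain acyclic successor list rather than C2's IR. It only
shows that every successor is emitted before the node it follows, so an
insertion never disturbs anything above the point being processed.

```cpp
#include <vector>

// Toy stand-in for an IR node; hypothetical, not HotSpot's Node class.
struct ToyNode {
    int id;                      // assumed to equal the node's index in the graph
    std::vector<int> successors; // ids of nodes that come after this one
};

// Depth-first walk that emits a node only after all of its successors.
static void visit(const std::vector<ToyNode>& graph, int id,
                  std::vector<bool>& seen, std::vector<int>& order) {
    if (seen[id]) return;
    seen[id] = true;
    for (int succ : graph[id].successors) {
        visit(graph, succ, seen, order);
    }
    order.push_back(id);
}

// Returns the ids reachable from root in post order (acyclic graph assumed).
std::vector<int> post_order(const std::vector<ToyNode>& graph, int root) {
    std::vector<bool> seen(graph.size(), false);
    std::vector<int> order;
    visit(graph, root, seen, order);
    return order;
}
```

Processing the nodes in the returned order handles the last node in the flow
first and the root last.
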
>>>> Barrier insertion is done in stages. In this first stage a single macro
>>>> node that represents the barrier is added with all the dependencies that
>>>> are required. In the macro expansion phase the barrier node is expanded
>>>> into its final shape, adding nodes that represent the conditional load
>>>> barrier check. (Write barriers in other GCs could possibly be expanded
>>>> here directly.)
>>>>
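
For readers unfamiliar with the shape of the conditional check mentioned
above, a minimal sketch follows. It is an assumption-laden toy, not the
expanded IR and not ZGC's runtime code: kToyBadBit, toy_slow_path and
toy_load_with_barrier are invented names, and the single "bad" bit stands in
for the real metadata bits.

```cpp
#include <cstdint>

// Hypothetical bit marking a reference as stale; the real masks differ.
constexpr std::uint64_t kToyBadBit = std::uint64_t{1} << 62;

// Hypothetical slow path: produce a good reference and heal the field.
inline std::uint64_t toy_slow_path(std::uint64_t* field, std::uint64_t ref) {
    std::uint64_t good = ref & ~kToyBadBit;
    *field = good; // self-healing: later loads of this field take the fast path
    return good;
}

// A load followed by the conditional barrier check.
inline std::uint64_t toy_load_with_barrier(std::uint64_t* field) {
    std::uint64_t ref = *field;
    if ((ref & kToyBadBit) != 0) { // rarely taken branch
        ref = toy_slow_path(field, ref);
    }
    return ref;
}
```

The fast path is a load, a test and a branch that is not taken. Only a stale
reference reaches the slow path.
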
>>>> All the barriers that are needed for unsafe reference operations (cas,
>>>> swap, cmpx) are also expanded late. They already have control flow, so
>>>> the expansion is straightforward.
>>>>
>>>> The barriers for the unsafe reference operations (cas, getandset, cmpx)
>>>> have also been simplified. The cas-load-cas dance has been replaced by
>>>> a pre-load. The pre-load is a load with a barrier, that is kept alive by
>>>> an extra (required) edge on the unsafe-primitive-nodes (specialized as
>>>> ZCompareAndSwap, ZGetAndSet, ZCompareAndExchange).
>>>>
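
The pre-load idea can be pictured with the following toy sketch. It is only
an analogy under stated assumptions: kToyStaleBit, toy_preload and
toy_cas_with_preload are hypothetical names, std::atomic stands in for the
field, and the unused call stands in for the extra edge that keeps the
pre-load alive. It is not how the Z* nodes are implemented.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical bit marking a reference as stale; the real masks differ.
constexpr std::uint64_t kToyStaleBit = std::uint64_t{1} << 62;

// Hypothetical barriered load: heals a stale field and returns the good value.
inline std::uint64_t toy_preload(std::atomic<std::uint64_t>& field) {
    std::uint64_t ref = field.load();
    if ((ref & kToyStaleBit) != 0) {
        const std::uint64_t good = ref & ~kToyStaleBit;
        // Heal only if the field still holds the stale value we saw.
        field.compare_exchange_strong(ref, good);
        return good;
    }
    return ref;
}

// CAS preceded by a pre-load, so a stale field cannot cause a spurious failure.
inline bool toy_cas_with_preload(std::atomic<std::uint64_t>& field,
                                 std::uint64_t expected,
                                 std::uint64_t desired) {
    toy_preload(field); // result unused; the call matters for its healing effect
    return field.compare_exchange_strong(expected, desired);
}
```

Without the pre-load, a CAS that expects the good value would fail against a
stale field, which is why the older scheme retried after a barriered load.
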
>>>> One challenge that was encountered early and that has caused
>>>> considerable work is that nodes (like loads) can end up between calls
>>>> and their catch projections. This is usually handled after matching, in
>>>> PhaseCFG::call_catch_cleanup, where the nodes after the call are cloned
>>>> to all catch blocks. At this stage they are in an ordered list, so that
>>>> is a straightforward process. For late barrier insertion we need to
>>>> splice in control earlier, before matching, and control flow between
>>>> calls and catches is not allowed. This requires us to add a
>>>> transformation pass where all loads and their dependent instructions are
>>>> cloned out to the catch blocks before we can start splicing in control
>>>> flow. This transformation doesn't replace the legacy call_catch_cleanup
>>>> fully, but it could be a future goal.
>>>>
>>>> In the original barrier implementation there were two different load
>>>> barrier implementations: the basic and the optimized. With the new
>>>> approach to barriers on unsafe, the basic is no longer required and has
>>>> been removed. (It provided options for skipping the self healing, and
>>>> passed the ref in a register, guaranteeing that the oop wasn't
>>>> reloaded.)
>>>>
>>>> The wart that was fixup_partial_loads in zHeap has also been made
>>>> redundant.
>>>>
>>>> Dominating barriers are no longer removed on weak loads. Weak barriers
>>>> don't guarantee self-healing.
>>>>
>>>> Follow up work:
>>>>
>>>> - Consolidate all uses of GrowableArray::insert_sorted to use the new
>>>> version
>>>>
>>>> - Refactor the phases. There are a lot of simplifications and
>>>> verification that can be done with more well defined phases.
>>>>
>>>> - Simplify the remaining barrier optimizations. There might still be
>>>> code paths that are no longer needed.
>>>>
>>>>
>>>> Testing:
>>>>
>>>> Hotspot tier 1-6, CTW, jcstress, micros, runthese, kitchensink, and then
>>>> some. All with -XX:+ZVerifyViews.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8224675
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8224675/webrev.01/
>>>>
>>>>
>>>> Please review,
>>>>
>>>> Regards,
>>>>
>>>> Nils
>>>>