RFR: 8292602: ZGC: C2 late barrier analysis uses invalid dominator information
Tobias Hartmann
thartmann at openjdk.org
Thu Sep 22 08:27:26 UTC 2022
On Tue, 20 Sep 2022 07:04:34 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:
> Late ZGC barrier analysis (`ZBarrierSetC2::analyze_dominating_barriers()`) uses dominator information to elide unnecessary load barriers. This information is invalidated whenever the block ordering phase (`PhaseCFG::fixup_flow()`) inserts a new block (`PhaseCFG::insert_goto_at()`).
>
> This changeset updates the dominator information phase after each goto-block insertion in `insert_goto_at()`, to ensure that late ZGC barrier analysis works correctly. Domination information is encoded in two `Block` member variables: `_idom` and `_dom_depth`. The immediate dominator (`_idom`) of the new goto-block (`block`) and its successor (`out`) are trivially remapped after the insertion of `block`, whereas the dominator tree depth (`_dom_depth`) of the `out` subtree is updated by a depth-first search of the descendants of `out`.
>
> Additionally, the changeset adds dominator tree checks (`PhaseCFG::verify_dominator_tree()`) every time `PhaseCFG::verify()` is called and also at the end of the `blockOrdering` phase (where `insert_goto_at()` might be called); and includes dominator information in IGV graphs for ease of debugging.
>
> #### Alternative Solutions
>
> An alternative solution would be to rebuild the dominator tree before using dominator information in ZGC's late barrier analysis. The alternative [has been explored](https://github.com/robcasloz/jdk/tree/JDK-8292602-recompute) and found to be more complex than this changeset, as `PhaseCFG::build_dominator_tree()` makes multiple assumptions on the shape of the Ideal graph that do not hold at a later C2 phase.
>
> Another alternative would be to update the dominator tree only if ZGC, the only late consumer of domination information, is enabled. This changeset proposes updating the dominator tree unconditionally for simplicity, uniformity, and better test coverage. The overhead of doing so is insignificant for the main standard benchmark suites (DaCapo, SPECjbb2005, SPECjvm2008, SPECjbb2015, ...). A closer study of the DaCapo benchmarks shows that, on average, `insert_goto_at()` is called less than once per C2 compilation, and each call traverses less than 1% of the total blocks in the CFG.
>
> #### Testing
>
> - tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode).
> - tier4-7 (linux-x64; debug mode).
> - fuzzing (~1 h. on each platform).
> - verified the dominator trees and node depths of 615 graphs updated by `insert_goto_at()` by comparing them with the result of an [independent dominator construction implementation](https://github.com/robcasloz/jdk/blob/ideal-check/src/utils/IdealTools/ideal-check.py) (based on Python's NetworkX package).
>
> Thanks to Nils Eliasson for finding the issue and providing an initial fix.
Looks good to me.
-------------
Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10353
More information about the hotspot-gc-dev
mailing list