RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences()
Emanuel Peter
epeter at openjdk.org
Thu Jun 20 13:19:11 UTC 2024
On Wed, 19 Jun 2024 12:54:26 GMT, Roland Westrelin <roland at openjdk.org> wrote:
> In a debug build, `PhaseCFG::insert_anti_dependences()` is called
> twice for a single node: once for actual processing, once for
> verification.
>
> In TestAntiDependenciesHighMemUsage, the test has a `Region` that
> merges 337 incoming path. It also has one `Phi` per memory slice that
> are stored to: 1000 `Phi` nodes. Each `Phi` node has 337 inputs that
> are identical except for one. The common input is the memory state on
> method entry. The test has 60 `Load` that needs to be processed for
> anti dependences. All `Load` share the same memory input: the memory
> state on method entry. For each `Load`, all `Phi` nodes are pushed 336
> times on the work lists for anti dependence processing because all of
> them appear multiple times as uses of each `Load`s memory state: `Phi`s
> are pushed 336 000 on 2 work lists. Memory is not reclaimed on exit
> from `PhaseCFG::insert_anti_dependences()` so memory usage grows as
> `Load` nodes are processed:
>
> 336000 * 2 work lists * 60 loads * 8 bytes pointer = 322 MB.
>
> The fix I propose for this is to not push `Phi` nodes more than once
> when they have the same inputs multiple times.
>
> In TestAntiDependenciesHighMemUsage2, the test has 4000 loads. For
> each of them, when processed for anti dependences, all 4000 loads are
> pushed on the work lists because they share the same memory
> input. Then when they are popped from the work list, they are
> discarded because only stores are of interest:
>
> 4000 loads processed * 4000 loads pushed * 2 work lists * 8 bytes pointer = 256 MB.
>
> The fix I propose for this is to test before pushing on the work list
> whether a node is a store or not.
>
> Finally, I propose adding a `ResourceMark` so memory doesn't
> accumulate over calls to `PhaseCFG::insert_anti_dependences()`.
I'm making some notes about what the algorithm does:
We want to find all `Store`s that need to come after a `Load`. For this, we take the `Load`s memory input, and traverse down all uses, looking through `MergeMem` and `Phi` nodes. We do an edge traversal, we always have a `use` and a `def` node (sadly they are non-sensically calles `use = store` and `def = mem`. I guess we are speculating that the `use` could be a `store`, but it may just be some `MergeMem` that we are looking through, or any other node we are not interested in.
In this traversal, we make sure not to traverse a `MergeMem` multiple times - even if we get to it through multiple slices (the inputs of the `MergeMem`. However, for `Phi` nodes, we want to visit the `def - phi` pair multiple times, if the `def` (aka `mem`) node is different.
It would be nice if `worklist_visited` was called `visited_merge_mem_set`, and be a `VectorSet`...
Anyway, later we process these edges, but I'll look at that tomorrow.
src/hotspot/share/opto/gcm.cpp line 673:
> 671: Node* store = worklist_store.pop();
> 672: uint op = store->Opcode();
> 673: assert(!store->needs_anti_dependence_check(), "only stores");
I find the "only stores" comment a bit confusing. Because `store` can also be a `MergeMem`, right? Any maybe other things. Honestly, the name `store` seems generally confusing to me, since it is not at all clear that this is really a `StoreNode`.
src/hotspot/share/opto/gcm.cpp line 689:
> 687:
> 688: if (op == Op_MachProj || op == Op_Catch) continue;
> 689: if (store->needs_anti_dependence_check()) continue; // not really a store
Why did you remove this?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19791#issuecomment-2180662125
PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1647536988
PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1647561469
More information about the hotspot-compiler-dev
mailing list