RFR: 8333258: C2: high memory usage in PhaseCFG::insert_anti_dependences() [v5]

Emanuel Peter epeter at openjdk.org
Thu Sep 12 08:41:12 UTC 2024


On Mon, 26 Aug 2024 13:34:41 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> In a debug build, `PhaseCFG::insert_anti_dependences()` is called
>> twice for a single node: once for actual processing, once for
>> verification.
>> 
>> In TestAntiDependenciesHighMemUsage, the test has a `Region` that
>> merges 337 incoming path. It also has one `Phi` per memory slice that
>> are stored to: 1000 `Phi` nodes. Each `Phi` node has 337 inputs that
>> are identical except for one. The common input is the memory state on
>> method entry. The test has 60 `Load` that needs to be processed for
>> anti dependences. All `Load` share the same memory input: the memory
>> state on method entry. For each `Load`, all `Phi` nodes are pushed 336
>> times on the work lists for anti dependence processing because all of
>> them appear multiple times as uses of each `Load`s memory state: `Phi`s
>> are pushed 336 000 on 2 work lists. Memory is not reclaimed on exit
>> from `PhaseCFG::insert_anti_dependences()` so memory usage grows as
>> `Load` nodes are processed:
>> 
>> 336000 * 2 work lists * 60 loads * 8 bytes pointer = 322 MB. 
>> 
>> The fix I propose for this is to not push `Phi` nodes more than once
>> when they have the same inputs multiple times.
>> 
>> In TestAntiDependenciesHighMemUsage2, the test has 4000 loads. For
>> each of them, when processed for anti dependences, all 4000 loads are
>> pushed on the work lists because they share the same memory
>> input. Then when they are popped from the work list, they are
>> discarded because only stores are of interest:
>> 
>> 4000 loads processed * 4000 loads pushed * 2 work lists * 8 bytes pointer = 256 MB. 
>> 
>> The fix I propose for this is to test before pushing on the work list
>> whether a node is a store or not.
>> 
>> Finally, I propose adding a `ResourceMark` so memory doesn't
>> accumulate over calls to `PhaseCFG::insert_anti_dependences()`.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   more review

@rwestrel sorry for the long delay, I was away for over 3 weeks. Looks much better now, I still have a few comments and suggestions.

src/hotspot/share/opto/gcm.cpp line 632:

> 630:     if (use_mem_state->is_MergeMem()) {
> 631:       // Be sure we don't get into combinatorial problems.
> 632:       // (Allow phis to be repeated; they can merge two relevant states.)

Does this comment not belong to the `Phi` code below?

src/hotspot/share/opto/gcm.cpp line 637:

> 635:         if (_worklist_visited.at(j-1) == use_mem_state)  return; // already on work list; do not repeat
> 636:       }
> 637:       _worklist_visited.push(use_mem_state);

This is fine, but it might be nicer if we had such a `push_if_missing`. `GrowableArray` has that functionality. Maybe you should just use a `GrowableArray` then anyway, right? It would be less code.

src/hotspot/share/opto/gcm.cpp line 749:

> 747:   //    initial_mem -> (MergeMem ->)* Memory state modifying node
> 748:   // Memory state modifying nodes include Store and Phi nodes and any node for which needs_anti_dependence_check()
> 749:   // returns true.

Basically this is the idea:
`needs_anti_dependence_check() == true` for Loads
`needs_anti_dependence_check() == false` for MergeMem, Phi, Store

So your `returns true` is not correct, right? Should it not be `returns false`?

src/hotspot/share/opto/gcm.cpp line 761:

> 759: 
> 760:     uint op = use_mem_state->Opcode();
> 761:     assert(!use_mem_state->needs_anti_dependence_check(), "only stores");

The comment in the assert is misleading. It can also be MergeMem, Phi, etc

src/hotspot/share/opto/gcm.cpp line 775:

> 773:       for (DUIterator_Fast imax, i = def_mem_state->fast_outs(imax); i < imax; i++) {
> 774:         use_mem_state = def_mem_state->fast_out(i);
> 775:         // If this is not a store, load can't be anti dependent on this node

This comment is not helping me, and I think it is also not precise...

Instead of `store`, you should be talking about `MergeMem`, `Phi`, `Store` and any other state modifying node.

Suggestion: remove comment, and after the check, just before the `continue`, add this comment:

use_mem_state is also a kind of load (i.e. needs_anti_dependence_check), and it is not any state modifying node, store, Phi or MergeMem. Hence, load is not anti dependent on this node.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19791#pullrequestreview-2299488433
PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756325982
PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756397476
PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756355053
PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756352150
PR Review Comment: https://git.openjdk.org/jdk/pull/19791#discussion_r1756385142


More information about the hotspot-compiler-dev mailing list