RFR: 8333697: C2: Hit MemLimit in PhaseCFG::global_code_motion
Roland Westrelin
roland at openjdk.org
Mon Jan 13 14:53:56 UTC 2025
I investigated the failure from the `Test.java` that's attached to the
bug. The failure with this test is only reproducible up to 8334060
(Implementation of Late Barrier Expansion for G1) so experiments I
describe here are from the source code for the commit right before it.
Peak malloc memory usage reported by NMT is: 1.3GB
`PhaseCFG::global_code_motion()`, when `OptoRegScheduling` is true,
creates a `PhaseIFG` that's, when initialized, allocates `_adjs`: a
`maxlrg` array of `IndexSet`s that can contain up to `maxlrg`.
`maxlrg` in this case is 122839. An `IndexSet` is an array of pointers
to a 256 bit bitset: one `IndexSet` array needs:
122839 / 256 * 8 = 3832
and there are of 122839:
3832 * 122839 = ~470 MB
It turns out the `PhaseIFG` object when used from
`PhaseCFG::global_code_motion()` doesn't even use the `_adjs`
array. So a patch like:
diff --git a/src/hotspot/share/opto/chaitin.hpp b/src/hotspot/share/opto/chaitin.hpp
index cf02deb6019..4e5333bf181 100644
--- a/src/hotspot/share/opto/chaitin.hpp
+++ b/src/hotspot/share/opto/chaitin.hpp
@@ -258,7 +258,7 @@ class PhaseIFG : public Phase {
VectorSet *_yanked;
PhaseIFG( Arena *arena );
- void init( uint maxlrg );
+ void init( uint maxlrg, bool no_adjs = false );
// Add edge between a and b. Returns true if actually added.
int add_edge( uint a, uint b );
diff --git a/src/hotspot/share/opto/gcm.cpp b/src/hotspot/share/opto/gcm.cpp
index ebdefe597ff..fefd75a88c5 100644
--- a/src/hotspot/share/opto/gcm.cpp
+++ b/src/hotspot/share/opto/gcm.cpp
@@ -1704,7 +1704,9 @@ void PhaseCFG::global_code_motion() {
rm_live.reset_to_mark(); // Reclaim working storage
IndexSet::reset_memory(C, &live_arena);
uint node_size = regalloc._lrg_map.max_lrg_id();
- ifg.init(node_size); // Empty IFG
+ ifg.init(node_size, true); // Empty IFG
regalloc.set_ifg(ifg);
regalloc.set_live(live);
regalloc.gather_lrg_masks(false); // Collect LRG masks
diff --git a/src/hotspot/share/opto/ifg.cpp b/src/hotspot/share/opto/ifg.cpp
index d12698121b9..e42121c2254 100644
--- a/src/hotspot/share/opto/ifg.cpp
+++ b/src/hotspot/share/opto/ifg.cpp
@@ -42,18 +42,24 @@
PhaseIFG::PhaseIFG( Arena *arena ) : Phase(Interference_Graph), _arena(arena) {
}
-void PhaseIFG::init( uint maxlrg ) {
+void PhaseIFG::init( uint maxlrg, bool no_adjs ) {
_maxlrg = maxlrg;
_yanked = new (_arena) VectorSet(_arena);
_is_square = false;
// Make uninitialized adjacency lists
- _adjs = (IndexSet*)_arena->Amalloc(sizeof(IndexSet)*maxlrg);
+ if (no_adjs) {
+ _adjs = nullptr;
+ } else {
+ _adjs = (IndexSet*)_arena->Amalloc(sizeof(IndexSet)*maxlrg);
+ }
// Also make empty live range structures
_lrgs = (LRG *)_arena->Amalloc( maxlrg * sizeof(LRG) );
memset((void*)_lrgs,0,sizeof(LRG)*maxlrg);
// Init all to empty
for( uint i = 0; i < maxlrg; i++ ) {
- _adjs[i].initialize(maxlrg);
+ if (_adjs != nullptr) {
+ _adjs[i].initialize(maxlrg);
+ }
_lrgs[i].Set_All();
}
}
saves a lot of memory. NMT reports peak malloc memory to be 810 MB
then (that saves a bit more than 470MB, not sure why).
Instead of the fix above, I propose lazyly allocating the array of
pointers to bitsets that `IndexSet` uses (`_blocks` field). The reason
for that is that:
- for the `PhaseIFG` issue, it pretty much has the same effect as
the patch above (the 470 MB of arrays are not allocated).
- `PhaseLive` uses arrays of `IndexSet`s as well. One per block. The
compilation has 20961 blocks. That's:
122839 / 256 * 20961 = 80MB
total of bitblock arrays. I noticed at least one of the `IndexSet`
array contains a large proportion of empty `IndexSet`s (the `_defs`
field of `PhaseLive`). Lazy allocating bitset arrays is an easy way to
save some extra memory.
With the patch for this PR, peak malloc memory is reported by NMT to
be: 670MB. That's misleading: that number is working from a repo that
doesn't have the fix for 8345287 (C2: live in computation is broken)
and so has no live ins. Given the patch causes storage for `IndexSet`s
to be lazy allocated and 8345287 causes live ins to stay empty, we end
up saving space for live ins that should be populated: that's an extra
100MB. So actual peak storage is ~770MB and there's a small extra
improvement compared to the 810MB of the patch above.
Compilation speed doesn't seem to be affected by this change.
-------------
Commit messages:
- fix
Changes: https://git.openjdk.org/jdk/pull/23075/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23075&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8333697
Stats: 60 lines in 3 files changed: 36 ins; 15 del; 9 mod
Patch: https://git.openjdk.org/jdk/pull/23075.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23075/head:pull/23075
PR: https://git.openjdk.org/jdk/pull/23075
More information about the hotspot-compiler-dev
mailing list