RFR: 8333697: C2: Hit MemLimit in PhaseCFG::global_code_motion

Roland Westrelin roland at openjdk.org
Mon Jan 13 14:53:56 UTC 2025


I investigated the failure from the `Test.java` that's attached to the
bug. The failure with this test is only reproducible up to 8334060
(Implementation of Late Barrier Expansion for G1) so experiments I
describe here are from the source code for the commit right before it.

Peak malloc memory usage reported by NMT is: 1.3GB

`PhaseCFG::global_code_motion()`, when `OptoRegScheduling` is true,
creates a `PhaseIFG` that's, when initialized, allocates `_adjs`: a
`maxlrg` array of `IndexSet`s that can contain up to `maxlrg`.

`maxlrg` in this case is 122839. An `IndexSet` is an array of pointers
to a 256 bit bitset: one `IndexSet` array needs: 


122839 / 256 * 8 = 3832


and there are of 122839:


3832 * 122839 = ~470 MB


It turns out the `PhaseIFG` object when used from
`PhaseCFG::global_code_motion()` doesn't even use the `_adjs`
array. So a patch like:


diff --git a/src/hotspot/share/opto/chaitin.hpp b/src/hotspot/share/opto/chaitin.hpp
index cf02deb6019..4e5333bf181 100644
--- a/src/hotspot/share/opto/chaitin.hpp
+++ b/src/hotspot/share/opto/chaitin.hpp
@@ -258,7 +258,7 @@ class PhaseIFG : public Phase {
   VectorSet *_yanked;
 
   PhaseIFG( Arena *arena );
-  void init( uint maxlrg );
+  void init( uint maxlrg, bool no_adjs = false );
 
   // Add edge between a and b.  Returns true if actually added.
   int add_edge( uint a, uint b );
diff --git a/src/hotspot/share/opto/gcm.cpp b/src/hotspot/share/opto/gcm.cpp
index ebdefe597ff..fefd75a88c5 100644
--- a/src/hotspot/share/opto/gcm.cpp
+++ b/src/hotspot/share/opto/gcm.cpp
@@ -1704,7 +1704,9 @@ void PhaseCFG::global_code_motion() {
     rm_live.reset_to_mark();           // Reclaim working storage
     IndexSet::reset_memory(C, &live_arena);
     uint node_size = regalloc._lrg_map.max_lrg_id();
-    ifg.init(node_size); // Empty IFG
+    ifg.init(node_size, true); // Empty IFG
     regalloc.set_ifg(ifg);
     regalloc.set_live(live);
     regalloc.gather_lrg_masks(false);    // Collect LRG masks
diff --git a/src/hotspot/share/opto/ifg.cpp b/src/hotspot/share/opto/ifg.cpp
index d12698121b9..e42121c2254 100644
--- a/src/hotspot/share/opto/ifg.cpp
+++ b/src/hotspot/share/opto/ifg.cpp
@@ -42,18 +42,24 @@
 PhaseIFG::PhaseIFG( Arena *arena ) : Phase(Interference_Graph), _arena(arena) {
 }
 
-void PhaseIFG::init( uint maxlrg ) {
+void PhaseIFG::init( uint maxlrg, bool no_adjs ) {
   _maxlrg = maxlrg;
   _yanked = new (_arena) VectorSet(_arena);
   _is_square = false;
   // Make uninitialized adjacency lists
-  _adjs = (IndexSet*)_arena->Amalloc(sizeof(IndexSet)*maxlrg);
+  if (no_adjs) {
+    _adjs = nullptr;
+  } else {
+    _adjs = (IndexSet*)_arena->Amalloc(sizeof(IndexSet)*maxlrg);
+  }
   // Also make empty live range structures
   _lrgs = (LRG *)_arena->Amalloc( maxlrg * sizeof(LRG) );
   memset((void*)_lrgs,0,sizeof(LRG)*maxlrg);
   // Init all to empty
   for( uint i = 0; i < maxlrg; i++ ) {
-    _adjs[i].initialize(maxlrg);
+    if (_adjs != nullptr) {
+      _adjs[i].initialize(maxlrg);
+    }
     _lrgs[i].Set_All();
   }
 }


saves a lot of memory. NMT reports peak malloc memory to be 810 MB
then (that saves a bit more than 470MB, not sure why).

Instead of the fix above, I propose lazyly allocating the array of
pointers to bitsets that `IndexSet` uses (`_blocks` field). The reason
for that is that:

- for the `PhaseIFG` issue, it pretty much has the same effect as
  the patch above (the 470 MB of arrays are not allocated).
  
- `PhaseLive` uses arrays of `IndexSet`s as well. One per block. The
  compilation has 20961 blocks. That's:
  

 122839 / 256 * 20961 = 80MB


total of bitblock arrays. I noticed at least one of the `IndexSet`
array contains a large proportion of empty `IndexSet`s (the `_defs`
field of `PhaseLive`). Lazy allocating bitset arrays is an easy way to
save some extra memory.

With the patch for this PR, peak malloc memory is reported by NMT to
be: 670MB. That's misleading: that number is working from a repo that
doesn't have the fix for 8345287 (C2: live in computation is broken)
and so has no live ins. Given the patch causes storage for `IndexSet`s
to be lazy allocated and 8345287 causes live ins to stay empty, we end
up saving space for live ins that should be populated: that's an extra
100MB. So actual peak storage is ~770MB and there's a small extra
improvement compared to the 810MB of the patch above.

Compilation speed doesn't seem to be affected by this change.

-------------

Commit messages:
 - fix

Changes: https://git.openjdk.org/jdk/pull/23075/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23075&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333697
  Stats: 60 lines in 3 files changed: 36 ins; 15 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/23075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23075/head:pull/23075

PR: https://git.openjdk.org/jdk/pull/23075


More information about the hotspot-compiler-dev mailing list