RFR: 8346920: Serial: Support allocation in old generation when heap is almost full [v2]

Axel Boldt-Christmas aboldtch at openjdk.org
Tue Jan 28 06:34:54 UTC 2025


On Mon, 27 Jan 2025 15:26:35 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> Parallel also has the problem of fragmentation, where a single large obj in the beginning of eden can cause a large hole in the end of old-gen, because objs can't cross generation boundary.
> 
> Parallel doesn't have the problem of running >1000 GCs before exit, captured by the attached bm, because Parallel implements `UseGCOverheadLimit`. (For this particular bm with the selected heap-size, this problematic scenario doesn't occur for Parallel. One can even dodge this problematic scenario in Serial by using a diff heap-size, e.g. 200m instead of 100m.)

Interesting. I said this because I observed the same behaviour (stuck running thousands of full GCs). When I tested it I was on a JDK 25 build from a couple of weeks ago (cc198972022c94199d698461e2ac42afc0058fd7).
GC logs:

$ ./images/jdk/bin/java -Xmx100m -XX:+UseParallelGC -Xlog:gc,gc+init StressAdd.java
[0.002s][info][gc,init] CardTable entry size: 512
[0.002s][info][gc     ] Using Parallel
[0.003s][info][gc,init] Version: 25-internal-LTS-2025-01-27-1224260.aboldtch... (release)
[0.003s][info][gc,init] CPUs: 32 total, 32 available
[0.003s][info][gc,init] Memory: 63773M
[0.003s][info][gc,init] Large Page Support: Disabled
[0.003s][info][gc,init] NUMA Support: Disabled
[0.003s][info][gc,init] Compressed Oops: Enabled (32-bit)
[0.003s][info][gc,init] Alignments: Space 512K, Generation 512K, Heap 2M
[0.003s][info][gc,init] Heap Min Capacity: 8M
[0.003s][info][gc,init] Heap Initial Capacity: 100M
[0.003s][info][gc,init] Heap Max Capacity: 100M
[0.003s][info][gc,init] Pre-touch: Disabled
[0.003s][info][gc,init] Parallel Workers: 23
[0.395s][info][gc     ] GC(0) Pause Young (Allocation Failure) 26M->5M(96M) 5.227ms
[0.493s][info][gc     ] GC(1) Pause Young (Allocation Failure) 30M->17M(96M) 13.714ms
[0.518s][info][gc     ] GC(2) Pause Young (Allocation Failure) 42M->38M(96M) 19.380ms
[0.600s][info][gc     ] GC(3) Pause Full (Allocation Failure) 77M->59M(96M) 75.591ms
[0.709s][info][gc     ] GC(4) Pause Full (Allocation Failure) 73M->70M(96M) 102.293ms
[...]
[852.306s][info][gc     ] GC(10073) Pause Full (Allocation Failure) 82M->82M(96M) 82.800ms
[852.387s][info][gc     ] GC(10074) Pause Full (Allocation Failure) 82M->82M(96M) 81.312ms
[852.470s][info][gc     ] GC(10075) Pause Full (Allocation Failure) 82M->82M(96M) 83.034ms
[852.553s][info][gc     ] GC(10076) Pause Full (Allocation Failure) 82M->82M(96M) 82.130ms
[852.634s][info][gc     ] GC(10077) Pause Full (Allocation Failure) 82M->82M(96M) 81.610ms
[852.714s][info][gc     ] GC(10078) Pause Full (Allocation Failure) 82M->82M(96M) 79.986ms
[852.797s][info][gc     ] GC(10079) Pause Full (Allocation Failure) 82M->82M(96M) 82.543ms
Killed 


(I also just retried this with jdk-25+7, which has the same behaviour.)

> That's another way to address the many-gc-before-exit problem. However, end users may get surprised with this "premature" OOM when heap still has a sizable margin till 100%. (Ofc, this is pure speculation.)

Yeah, I agree. But it might still be the sensible behaviour: when the GC is running in this degenerate mode, it might be better to just OOM, since something is probably misconfigured.


> I believe the newly introduced field, `_is_heap_almost_full`, can also be used to impl what you proposed. IOW, this field enables us to detect and react to "emergency" state. (Just to make it explicit, this allocating-in-old-gen when heap is tight has been in Serial for long, and was mistakenly removed in JDK-8333786.) As for which approach is "better", we probably need to assess it after cleaning up some other related code/logic, e.g. heap/generation-resizing.

Yeah. This would require a lot more thought to get right and to iron out all the kinks and interactions. However, it would be nice if we could guarantee that we never get into what looks like a perpetual full GC loop (even if some tiny progress is made each iteration), even if it comes at the cost of a "premature" OOM.
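For discussion, a `UseGCOverheadLimit`-style escape hatch could look roughly like the sketch below. All thresholds and names here are invented for illustration, not HotSpot's actual implementation: after each full GC we check whether GC has dominated recent wall time while reclaiming almost nothing, and after a few consecutive such collections we give up with an OOM instead of looping.

```cpp
#include <cstddef>

// Hypothetical overhead-limit check (names and thresholds invented):
// report OOM once several consecutive collections spend almost all
// elapsed time in GC while freeing almost no space.
class OverheadLimit {
  static const int kMaxStrikes = 5;      // consecutive bad GCs before OOM
  static const int kGcTimePercent = 98;  // GC share of elapsed time
  static const int kMinFreePercent = 2;  // minimum space a GC must free
  int _strikes = 0;

public:
  // gc_time and elapsed_time in the same unit; freed/heap_size in bytes.
  bool should_throw_oom(long gc_time, long elapsed_time,
                        std::size_t freed, std::size_t heap_size) {
    bool mostly_gc   = gc_time * 100 >= elapsed_time * kGcTimePercent;
    bool no_progress = freed * 100 < heap_size * kMinFreePercent;
    _strikes = (mostly_gc && no_progress) ? _strikes + 1 : 0;
    return _strikes >= kMaxStrikes;
  }
};
```

With numbers like the log above (82M->82M in ~80ms back-to-back pauses), such a check would trip after a handful of collections instead of running for 850+ seconds.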

I have only looked at this in the context of Serial, but even with this patch the full GC problem could remain: if there is an object which causes perpetual promotion failures in young collections and cannot be compacted past the generation boundary in the full GC, and we have a pattern of smaller temporary allocations followed by an allocation which does not fit in old-gen but requires a young collection to fit in young-gen.
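A toy model of that pattern (all numbers and names invented, nothing from the actual heap layout): old-gen is nearly full of live data the full GC cannot move past the generation boundary, and the mutator churns a little garbage plus one large request that never fits. Every collection reclaims only the temporaries, so each GC makes tiny progress yet the loop never terminates on its own.

```cpp
#include <cstddef>

// Invented toy heap: old-gen live data is pinned by the generation
// boundary and never shrinks; only temporaries are reclaimable.
struct ToyHeap {
  std::size_t old_capacity = 64;
  std::size_t old_live     = 62;  // cannot be compacted past the boundary
  std::size_t temporaries  = 0;   // garbage reclaimed by each full GC

  bool allocate_large(std::size_t size) {  // needs free old-gen space
    return old_capacity - old_live - temporaries >= size;
  }
  std::size_t full_gc() {                  // returns bytes reclaimed
    std::size_t freed = temporaries;
    temporaries = 0;                       // old_live is untouched
    return freed;
  }
};

// Count full GCs until the large request succeeds, up to `cap`.
int count_full_gcs(std::size_t large_request, int cap) {
  ToyHeap heap;
  int gcs = 0;
  while (gcs < cap) {
    heap.temporaries = 1;                  // mutator churns a little garbage
    if (heap.allocate_large(large_request)) break;
    heap.full_gc();                        // frees something, but never enough
    ++gcs;
  }
  return gcs;
}
```

Each `full_gc()` here does free a byte, so there is "progress" per iteration, yet a request larger than the permanently free space runs until the cap, which is exactly the 10000+-GC shape in the log above.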


An alternative solution we thought about is letting the full GC move the generation boundary so that everything can be compacted into old-gen (having some elasticity, acting as a rubber band which springs back once old-gen residency goes down), down to some minimal young size. Resulting in something like:
```c++
  // If young-gen can handle this allocation, attempt a young GC first.
  bool should_run_young_gc = _young_gen->should_allocate(size, is_tlab);
  collect_at_safepoint(!should_run_young_gc);

  result = attempt_allocation(size, is_tlab, false /*first_only*/);
  if (result != nullptr) {
    return result;
  }

  if (_young_gen->is_minimal_sized()) {
    return nullptr;
  }
```
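The "rubber band" itself could be modelled as below. This is a sketch under invented assumptions (the heap is a single `[0, heap_size)` range split at `boundary` into old below and young above): the boundary stretches just far enough to fit the compacted old residency after a full GC, springs back toward the preferred split once residency drops, and never shrinks young below a minimum.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical boundary model (names invented): old-gen is
// [0, boundary), young-gen is [boundary, heap_size).
struct RubberBandBoundary {
  std::size_t heap_size;
  std::size_t preferred_boundary;  // normal old/young split
  std::size_t min_young;           // young-gen never shrinks below this

  // Boundary to use after a full GC that compacted `old_residency`
  // bytes of live data into old-gen.
  std::size_t adjust(std::size_t old_residency) const {
    std::size_t max_boundary = heap_size - min_young;  // stretch limit
    std::size_t wanted = std::max(old_residency, preferred_boundary);
    return std::min(wanted, max_boundary);  // spring back when residency drops
  }
};
```

When even `max_boundary` cannot fit the residency, young is minimal-sized and the allocation path above returns `nullptr`, giving the bounded-failure behaviour instead of the GC loop.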

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23270#issuecomment-2618040514

