RFR: 8346920: Serial: Support allocation in old generation before GC [v2]

Thomas Schatzl tschatzl at openjdk.org
Mon Jan 20 14:57:37 UTC 2025


On Mon, 6 Jan 2025 14:36:23 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> This PR introduces a new strategy to determine whether an allocation should be attempted in the old generation or if a GC cycle should be initiated, based on the `GCTimeRatio`. With this change, the benchmark attached to the ticket now completes in ~13 GC, a significant improvement compared to the >1000 GC observed previously.
>> 
>> Test: tier1-3
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - review
>  - Merge branch 'master' into s1-gc-time-ratio
>  - s1-gc-time-ratio

As far as I understand, the test program expands that `ArrayList` with new `Short` objects, allocating until OOM.
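For readers without access to the ticket, the allocation pattern described above might look roughly like the sketch below. This is not the benchmark attached to the ticket: the class name `ListGrowthSketch`, the `fill` helper and the optional count argument are made up for illustration, and `Short.valueOf` is assumed to allocate a fresh object for values outside the small-value cache.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the test program; not the code from the ticket.
class ListGrowthSketch {
    // Appends `count` boxed Short objects to a growing ArrayList.
    static List<Short> fill(int count) {
        List<Short> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // 1000 lies outside the -128..127 cache, so this is assumed
            // to allocate a new small object on every iteration.
            list.add(Short.valueOf((short) 1000));
        }
        return list;
    }

    public static void main(String[] args) {
        // Without an argument the loop is effectively unbounded and is
        // expected to end in an OutOfMemoryError on a small heap.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : Integer.MAX_VALUE;
        System.out.println(fill(count).size());
    }
}
```

Running it with a small heap and `-XX:+UseSerialGC` is one way to observe the GC counts discussed here; the exact numbers depend on heap sizing and JIT settings.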

The problem to be solved in this case is how quickly the collector goes OOM; it is also related to the C2 compiler doing reallocation of C2 stack-allocated objects (when I ran the test with `-XX:TieredStopAtLevel=3`, Serial GC also quickly and reproducibly OOMEd).

Since the typical allocation here is small (that `Short` object), the current code prefers doing a GC pause over allocating in old gen. The GC pause then finds out that it should do a full gc, only because its predictions indicate that the young gc is "unsafe" anyway (not "unsafe" as in not able to do it, but likely to end up in an evacuation failure).

The reason why the application will fail faster with the change is that the change allows many more increments of that `list` array without GC. Without the change, as that array gets larger, the frequency of the GCs will increase without noticeably more promotion going on, as the number of possible reallocations of `list` per GC gets smaller. And it will do full gcs all the time because of the predictions indicating an unsafe young gc (which is correct, as the `list` array will quickly be larger than to-space).
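To get a feel for how quickly the backing array outgrows to-space, here is a small back-of-the-envelope sketch. It assumes the growth rule used by OpenJDK's `ArrayList` (new capacity = old + old/2, starting from 10), 4-byte compressed references and a 16-byte array header; the class and method names and the to-space sizes are purely illustrative.

```java
// Rough model of ArrayList backing-array growth; all sizes are assumptions.
class ArrayGrowthSketch {
    static final long HEADER_BYTES = 16; // assumed array header size
    static final long REF_BYTES = 4;     // assumed compressed reference size

    // Growth rule as in OpenJDK's ArrayList: grow by half the old capacity.
    static long nextCapacity(long capacity) {
        return capacity + (capacity >> 1);
    }

    static long arrayBytes(long capacity) {
        return HEADER_BYTES + REF_BYTES * capacity;
    }

    // Number of reallocations until the backing array no longer fits
    // into a to-space of the given size.
    static int growthStepsUntilExceeds(long toSpaceBytes) {
        long capacity = 10; // default capacity after the first add
        int steps = 0;
        while (arrayBytes(capacity) <= toSpaceBytes) {
            capacity = nextCapacity(capacity);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (long mib : new long[] {1, 8, 64}) {
            System.out.println(mib + " MiB to-space: "
                + growthStepsUntilExceeds(mib << 20) + " reallocations");
        }
    }
}
```

Because the capacity grows geometrically, the number of reallocations needed to exceed to-space only grows logarithmically with the to-space size.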

(It would be nice if the title and description of the CR somehow reflected this - it reads as if there were no support for direct allocation in old gen for Serial GC, which is in fact not the case).

Incidentally, Parallel GC has the same issue, running hundreds of full gcs before terminating (I did not check how many it takes, and probably never waited for Serial GC to finish either).

As far as I understand the change, it prevents "back-to-back" GCs, opting for allocating everything into the old generation (if not full) instead. I wonder whether that has a negative impact (more full gcs) on applications that legitimately allocate lots of (short-lived) objects, as this will fill up the old generation (much) faster than before. Particularly applications that use a relatively small heap for that.
It seems like this behavior may also be advantageous during startup, when the application is presumably initializing/loading data structures (just fill up the whole heap and compact completely).
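To make the trade-off concrete, a decision of this kind could be sketched as below. This is not the code from the PR; the class, method and parameter names are hypothetical. The only thing taken as given is the usual reading of `GCTimeRatio=N` as a goal of spending at most 1/(1+N) of total time in GC.

```java
// Hypothetical policy sketch; NOT the implementation in the PR.
class OldGenAllocationPolicySketch {
    // GCTimeRatio=N is read as a goal of at most 1/(1+N) of time spent in GC.
    static double gcTimeGoal(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    // Returns true if the request should go to old gen instead of
    // triggering another GC pause.
    static boolean shouldAllocateInOld(double gcMillis, double totalMillis,
                                       int gcTimeRatio,
                                       long oldFreeBytes, long requestBytes) {
        if (requestBytes > oldFreeBytes) {
            return false; // old gen cannot satisfy the request; a GC is needed
        }
        if (totalMillis <= 0) {
            return false; // no timing data yet; keep the default behavior
        }
        // GC overhead already above the goal: avoid a back-to-back GC.
        return gcMillis / totalMillis > gcTimeGoal(gcTimeRatio);
    }

    public static void main(String[] args) {
        // 10 ms of GC in the last 100 ms with GCTimeRatio=19 (5% goal).
        System.out.println(shouldAllocateInOld(10, 100, 19, 1024, 16));
    }
}
```

Under such a rule, an application that allocates many short-lived objects while already above the GC time goal would push them into old gen, which is the scenario the question above is about.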

Did you run some other benchmarks with this change? (Maybe the smaller dacapo benchmarks at "reasonably" sized heaps will show something?)

I'm actually a bit conflicted about this change. It seems to fix a (rare?) policy issue, but at the same time adds a new mechanic (use of `GCTimeRatio`) for Serial GC.
So for now I would like to wait for measurements on potential impact on "more realistic" applications.

Hth,
  Thomas

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22899#issuecomment-2602631622

