Aligning the Serial collector with ZGC

Thomas Schatzl thomas.schatzl at oracle.com
Thu Sep 26 08:20:11 UTC 2024


Hi Kirk,

   somewhat random comments...

On 23.09.24 17:48, Kirk Pepperdine wrote:
 > Hi,
 >
 > I wanted to surface to the mailing list that we've taken on the task
 > of adding Automated Heap Sizing (AHS) as has been introduced into ZGC
 > (and is currently being introduced into G1
 > https://github.com/openjdk/jdk/pull/20783) into the Serial
 > collector.
 > The goals of this effort are modeled after the goals for ZGC and we
 > plan to borrow as much as possible (or as much as makes sense). For
 > example, we would like to alter the default settings for -Xmx and
 > -Xms. Instead of 1/4, the default MaxHeapSize would be set to
 > available RAM. The
 > collector will make use of memory and CPU pressure, similar to what was
 > introduced in ZGC, to control heap expansion and contraction. Current
 > sizing ergonomics is based on the number of non-daemon threads.
 > Altering this is expected to give the Serial collector a more dynamic
 > ability to uncommit memory no longer in use (thus be more memory
 > efficient when running in a container). The flags SoftMaxHeapSize and
 > SerialPressure as well as the level of global memory pressure would be
 > used to help guide ergonomic choices. This new ergonomic choice should
 > work to minimize GC overhead while avoiding becoming an OOM victim. As
 > part of this, the goal is to provide enough memory but not at all
 > costs.
 >
 > We see this work being broken down into several steps. Very roughly
 > the steps would be:
 >
 > - Introduce an adaptive size policy that takes into account memory and
 > CPU pressure along with global memory pressure.
 >     - Heap should be large enough to minimize GC overhead but not
 > large enough to trigger OOM.

(probably meant "small enough" the second time)

 >     - Introduce -XX:SerialPressure=[0-100] to support this work.

(Fwiw, with regard to the other discussion, I agree that if we have a flag 
with the same "meaning" across collectors it might be useful to use the 
same name).

 >     - introduce a smoothing algorithm to avoid excessive small
 > resizes.
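
(For illustration only: a minimal sketch of what such smoothing could 
look like, assuming an exponential moving average of the desired heap 
size combined with a dead band below which no resize happens. The class 
name, parameters and constants are hypothetical and not taken from any 
existing collector.)

```java
/** Hypothetical sketch, not HotSpot code: damps small heap resize requests. */
final class ResizeSmoother {
    private final double alpha;     // weight of the newest sample in the moving average
    private final double deadband;  // minimum relative change before we actually resize
    private double smoothed;        // smoothed desired heap size in bytes
    private long committed;         // heap size we currently consider committed

    ResizeSmoother(long initialBytes, double alpha, double deadband) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        this.alpha = alpha;
        this.deadband = deadband;
        this.smoothed = initialBytes;
        this.committed = initialBytes;
    }

    /** Feeds a newly computed desired size; returns the size to commit. */
    long update(long desiredBytes) {
        smoothed = alpha * desiredBytes + (1.0 - alpha) * smoothed;
        double relativeChange = Math.abs(smoothed - committed) / committed;
        if (relativeChange >= deadband) {
            committed = Math.round(smoothed);
        }
        return committed;
    }
}
```

With alpha = 0.5 and a 10% dead band, a request that moves the average 
by only 2% leaves the heap size unchanged, while a sustained larger 
request does get through.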

One option is to split this further into parts:

* list what actions Serial GC could do in reaction to memory pressure on 
an abstract level, and which make sense; from that see what 
functionality is needed.

* provide functionality that tries to keep some kind of GC/mutator time 
ratio; I would start with looking at what G1 does because Serial GC's 
behaviour is probably closer to G1 than ZGC, but ymmv.
(Obviously improvements are welcome :))

(This may not need to be exposed externally like some 
GCTimeRatio/GCCPUPercentage/whatever flag name)

* add functionality to calculate memory pressure from the environment; 
maybe in a containerized environment from a manageable flag as it does 
not have a global "pressure" view. This could probably be taken from ZGC, 
at least partially

* some transfer function that translates this external memory pressure, 
based on "GCPressure", (e.g. that "sigmoid" function plus lots of magic 
numbers) to reaction in the gc: e.g. change the gc/mutator pause time 
goal, start collections, uncommit memory...

* (probably) some background thread that continuously calculates and 
reacts to global pressure (uncommit memory, do a gc, resize heap, ...) 
because one probably does not want to wait for the next gc to react...

* do lots of testing to weed out corner cases
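
To make the GC/mutator time ratio and the transfer function items above 
a bit more concrete, here is a rough sketch. It assumes pressure is 
normalized to [0, 1] and that the "reaction" is a change of the target 
GC overhead. All names and constants are made up for illustration (the 
"magic numbers"); this is not what ZGC or G1 actually implement.

```java
/** Hypothetical sketch, not HotSpot code: pressure -> target GC overhead. */
final class PressureTransfer {
    // Assumed tuning constants, chosen only for the example.
    static final double MIN_GC_OVERHEAD = 0.02; // target GC time fraction without pressure
    static final double MAX_GC_OVERHEAD = 0.25; // target GC time fraction at full pressure
    static final double STEEPNESS = 10.0;       // how sharply the sigmoid switches
    static final double MIDPOINT = 0.5;         // pressure at which the target is halfway

    /** Fraction of time spent in GC over a measurement interval. */
    static double gcOverhead(double gcMillis, double mutatorMillis) {
        double total = gcMillis + mutatorMillis;
        return total <= 0.0 ? 0.0 : gcMillis / total;
    }

    /** Standard logistic function. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Maps pressure in [0, 1] to the GC overhead we are willing to accept. */
    static double targetGcOverhead(double pressure) {
        double p = Math.max(0.0, Math.min(1.0, pressure));
        double s = sigmoid(STEEPNESS * (p - MIDPOINT));
        return MIN_GC_OVERHEAD + (MAX_GC_OVERHEAD - MIN_GC_OVERHEAD) * s;
    }

    /** Returns 1 to grow the heap, -1 to shrink it, 0 to leave it alone. */
    static int resizeDirection(double observedOverhead, double pressure) {
        double target = targetGcOverhead(pressure);
        if (observedOverhead > target) {
            return 1;  // GC is working too hard for this pressure: give it more heap
        }
        if (observedOverhead < target) {
            return -1; // we can afford more GC work: give memory back
        }
        return 0;
    }
}
```

The point of the sigmoid is that the same observed overhead leads to 
growing the heap when memory is plentiful and to shrinking it when 
memory is tight.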

 > - Introduce manageable flag SoftMaxHeapSize to define a target heap
 > size and set the default max heap size to 100% of available.

I am a bit torn about SoftMaxHeapSize in Serial GC. What do you envision 
that Serial GC would do when the SoftMaxHeapSize has been reached, and 
what if old gen occupancy permanently stays above that value?

The usefulness of SoftMaxHeapSize kind of relies on having a minimally 
invasive old gen collection that tries to get old gen usage back below 
that value.

Serial GC has no "minimally invasive" way to collect old generation. It 
is either Full GC or nothing. This is the only option for Serial, but 
always doing Full collections after reaching that threshold seems very 
heavy handed, expensive and undesirable to me (ymmv).

That reaction would follow the spirit of the flag though.

Maybe at the small heaps Serial GC targets, this makes sense, and full 
gc is not that costly anyway.

It might be useful to enumerate what actions could be performed on 
global pressure.
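
As a starting point for such an enumeration, a sketch of one possible 
SoftMaxHeapSize reaction follows. It assumes the only way to reduce old 
gen usage is a Full GC, and that repeating one is pointless if the last 
Full GC already left more live data than the soft limit. The action 
names and the method are hypothetical.

```java
/** Hypothetical sketch, not HotSpot code: possible reactions in Serial GC. */
final class PressureActions {
    /** Candidate actions; which of them make sense is exactly the open question. */
    enum Action { NONE, YOUNG_GC, FULL_GC, UNCOMMIT }

    /**
     * Decides what to do once heap usage is compared against the soft limit.
     * liveAfterLastFullGc is the heap usage measured right after the most
     * recent Full GC (0 if none has happened yet).
     */
    static Action onSoftMaxCheck(long usedBytes, long softMaxBytes, long liveAfterLastFullGc) {
        if (usedBytes <= softMaxBytes) {
            return Action.NONE;    // below the soft limit, nothing to do
        }
        if (liveAfterLastFullGc > softMaxBytes) {
            return Action.NONE;    // live set itself exceeds the limit, Full GC cannot help
        }
        return Action.FULL_GC;     // only way for Serial GC to reclaim old gen
    }
}
```

This deliberately leaves open what should happen when occupancy stays 
permanently above the soft limit; the sketch just avoids back-to-back 
Full GCs in that case.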

 > - Add in the ability to uncommit memory (to reduce global memory
 > pressure).
 >

The following imo outlines a completely separate idea, and should be 
discussed separately:

 >
 > While working through the details of this work I noted that there
 > appear to be opportunities to offer new defaults for other settings. For
 > example, [...]

That seems to be some more elaborate way of finding "optimal" generation 
size for a given heap size (which may follow from what the gc/mutator 
time ratio algorithm gives you).

 >
 > For Eden the guiding metric is allocation rate. For Survivor it's life
 > cycle (age table). For Tenured it's live set size. Using these metrics
 > to determine size of the parts and use that to then calculate a max
 > heap size has almost always yielded lower GC overheads than setting a
 > heap size and then letting ratios size everything. This may be a
 > separate piece of work

+1

 > but the intent would be to have ergonomics calculate
 > optimal eden, survivor and tenured sizes. Each young collection is an
 > opportunity to resize Eden and Survivor whereas a full would be used
 > to resize Eden, Survivor and Tenured space. This may lead to the need
 > to ignore NewRatio and (the soft target) MaxGCPauseMillis.
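
(To check my understanding of the bottom-up sizing described in the 
quoted text, a sketch: eden from allocation rate, survivor from the age 
table, tenured from live set size, heap size as the sum. The target 
interval between young collections, the headroom factors and the age 
table layout are assumptions made for the example.)

```java
/** Hypothetical sketch, not HotSpot code: size the parts first, the heap last. */
final class BottomUpSizing {
    /** Eden holds what the application allocates between two young collections. */
    static long edenBytes(double allocBytesPerSec, double targetYoungGcIntervalSec) {
        return (long) Math.ceil(allocBytesPerSec * targetYoungGcIntervalSec);
    }

    /**
     * One survivor space holds the objects that have not reached the tenuring
     * threshold yet. bytesByAge[i] is assumed to be the volume of objects of
     * age i + 1 taken from the age table.
     */
    static long survivorBytes(long[] bytesByAge, int tenuringThreshold, double headroom) {
        long sum = 0;
        int limit = Math.min(tenuringThreshold, bytesByAge.length);
        for (int i = 0; i < limit; i++) {
            sum += bytesByAge[i];
        }
        return (long) Math.ceil(sum * headroom);
    }

    /** Tenured holds the live set plus room for promotions until the next Full GC. */
    static long tenuredBytes(long liveSetBytes, double headroom) {
        return (long) Math.ceil(liveSetBytes * headroom);
    }

    /** Serial GC has two survivor spaces (from and to), hence the factor of two. */
    static long heapBytes(long eden, long survivor, long tenured) {
        return eden + 2 * survivor + tenured;
    }
}
```

For example, 100 bytes/s allocation rate with a 2 s interval, 15 bytes 
of survivors and a 1000 byte live set with 1.5x headroom gives 
200 + 2 * 15 + 1500 = 1730 bytes.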

Fwiw, the only collector that observes MaxGCPauseMillis is G1; in the 
context of Serial GC discussed further above I am confused.

Not sure if MaxGCPauseMillis would make sense in Serial GC given that 
you can't control Full GC pause length.

Also, in the context of G1 some of the statements above are hard to 
understand: e.g. the text seems to imply that there is a fixed ratio 
between eden and survivor which isn't really the case, at least not in 
the sense of Serial GC.

Could you elaborate?

Even then, with Serial GC's fixed generation sizes fine-grained 
on-the-fly adaptation as somewhat suggested might be harder than usual.

Not against doing all that, but it really sounds like separate work.

 >
 > As for testing. I’m currently looking at modifying HyperAlloc to add
 > ability to alter the shape of the load on the collector over time.
 >
 > All of this is still in its infancy and we’re open for guidance and
 > input.
 >
 > As for the work on G1, an initial patch has been submitted (URL above)
 > and is open for comments.
 >

The patch does not seem to implement AHS. It implements 
CurrentMaxHeapSize which might be what AHS uses to set max heap size.

To implement AHS for G1 roughly at least the following items need to be 
added/implemented/changed:

* remove the use of Min/MaxHeapFreeRatio for heap sizing. These flags 
completely disregard cpu and heap pressure based heap sizing (should 
also be removed from Serial GC - this means deprecating/obsoleting these 
flags as soon as the last user is gone).

* implement CurrentMaxHeapSize which is a (configurable) hard limit on 
how much the Java application may allocate (JDK-8204088) in support of 
AHS. As mentioned, that patch might be an initial discussion base.
I do not think we need a JEP for that, but it gives you more publicity.

* implement SoftMaxHeapSize in the sense of ZGC where it uses it to 
guide IHOP (or ZGC's equivalent). Note that I am not sure that 
SoftMaxHeapSize is something absolutely necessary in the context of AHS, 
but may be a tool.

* the same background functionality as for serial: implement some 
mechanism to control the heap size based on the decisions of AHS; i.e. 
start collections to get to heap target, uncommit stuff/enqueue for 
uncommit etc.

Currently G1 only resizes the heap during Remark and Full GC which is 
too limiting to follow current "memory pressure". Maybe use/update 
Soft/CurrentMaxHeapSize as needed so that GC compacts the heap first; 
this could take the form of JDK-8238687, which uncommits at every 
gc, though that is probably still too limiting for an AHS system.

Probably other issues will crop up along the way.

* do lots of testing to weed out corner cases and hopefully not regress 
too much from current performance

Hth,
   Thomas



More information about the hotspot-gc-dev mailing list