Forwarding Tables overhead

Jean-Philippe Bempel jean-philippe.bempel at datadoghq.com
Wed May 7 08:35:50 UTC 2025


Hello ZGC team,

I would like to raise an issue we are encountering on our production systems
using Generational ZGC with JDK 23. We sporadically get OOM kills in a
container environment that seem correlated with spikes of high Java heap
allocation. We are running the JVM with a 19GB Java heap in a container
limited to 26GB, using these JVM flags:
-XX:+UseZGC -XX:SoftMaxHeapSize=15g -XX:ZAllocationSpikeTolerance=5
-XX:+UseLargePages -XX:+UseTransparentHugePages.
Normally I would not expect allocation on the Java heap to trigger an OOM
kill, except for related things like direct memory. Investigating with a
higher container memory limit and plotting the NMT JFR events showed a spike
of native allocation for GC structures peaking at more than 3GB, whereas it
is normally around 512MB [1].
This led me to suspect the forwarding tables, so I built the following
simulator:

import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.LockSupport;

public class GCStress {
    // Synchronized wrapper so the four allocating threads can add
    // concurrently without corrupting the list.
    static List<Object> roots = Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) {
        System.out.println("Starting GC stress test...");
        LockSupport.parkNanos(Duration.of(10, ChronoUnit.SECONDS).toNanos());
        while (true) {
            spike();
            LockSupport.parkNanos(Duration.of(10, ChronoUnit.SECONDS).toNanos());
        }
    }

    // Drops the previous roots and allocates a burst of mostly-surviving
    // objects from four threads, simulating an allocation spike right
    // before a GC cycle.
    private static void spike() {
        roots.clear();
        for (int i = 0; i < 4; i++) {
            new Thread(() -> {
                for (int j = 0; j < 500_000; j++) {
                    roots.add(new Payload());
                }
            }).start();
        }
    }

    // Each Payload also keeps 100 small objects alive, so every spike
    // creates a very large number of small live objects to relocate.
    private static class Payload {
        long l00;
        long l01;
        long l02;
        long l03;
        long l04;
        long l05;
        long l06;
        long l07;
        long l08;
        long l09;
        List<Object> internal = new ArrayList<>();

        public Payload() {
            for (int i = 0; i < 100; i++) {
                internal.add(new Object());
            }
        }
    }
}
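
As a rough back-of-the-envelope check (assuming ~16-byte object headers and
8-byte uncompressed references, since ZGC does not use compressed oops), each
Payload is roughly:

  ~104 B Payload + ~40 B ArrayList + ~0.9 KB backing Object[]
  + 100 x 16 B plain Objects  ->  about 2.5 KB and ~103 objects

so one spike of 4 threads x 500,000 Payloads allocates on the order of 5GB
and roughly 200 million small objects, almost all of which stay reachable
through roots. If my understanding is correct that each relocated object
needs an entry in the per-page forwarding tables, that would explain
multi-GB forwarding usage after such a spike.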

Running it with the following command line on JDK 23:
java -XX:StartFlightRecording=dumponexit=true,filename=gcstress.jfr
-XX:NativeMemoryTracking=summary -XX:+UseZGC "-Xlog:gc*:gc.log"
-Xmx16G GCStress

Looking for "Forwarding Usage" in the GC log [2] gives me this:
[11,175s][info][gc,reloc    ] GC(0) Y: Forwarding Usage: 824M
[14,897s][info][gc,reloc    ] GC(1) Y: Forwarding Usage: 3229M
[19,141s][info][gc,reloc    ] GC(2) y: Forwarding Usage: 2M
[20,335s][info][gc,reloc    ] GC(3) Y: Forwarding Usage: 26M
[20,613s][info][gc,reloc    ] GC(4) y: Forwarding Usage: 235M
[22,390s][info][gc,reloc    ] GC(5) y: Forwarding Usage: 1867M
[22,390s][info][gc,reloc    ] GC(3) O: Forwarding Usage: 0M
[24,517s][info][gc,reloc    ] GC(6) Y: Forwarding Usage: 51M
[24,534s][info][gc,reloc    ] GC(6) O: Forwarding Usage: 0M
[30,694s][info][gc,reloc    ] GC(7) Y: Forwarding Usage: 337M
[37,576s][info][gc,reloc    ] GC(8) y: Forwarding Usage: 2355M
[41,164s][info][gc,reloc    ] GC(9) y: Forwarding Usage: 215M
[41,164s][info][gc,reloc    ] GC(7) O: Forwarding Usage: 0M
[45,843s][info][gc,reloc    ] GC(10) Y: Forwarding Usage: 3528M
[55,427s][info][gc,reloc    ] GC(11) y: Forwarding Usage: 2M
[59,077s][info][gc,reloc    ] GC(12) y: Forwarding Usage: 3628M
[59,077s][info][gc,reloc    ] GC(10) O: Forwarding Usage: 0M
[63,599s][info][gc,reloc    ] GC(13) Y: Forwarding Usage: 3M
[73,646s][info][gc,reloc    ] GC(14) y: Forwarding Usage: 3838M
[76,746s][info][gc,reloc    ] GC(15) y: Forwarding Usage: 3859M
[82,553s][info][gc,reloc    ] GC(16) y: Forwarding Usage: 225M
[86,039s][info][gc,reloc    ] GC(17) y: Forwarding Usage: 3093M
[86,039s][info][gc,reloc    ] GC(13) O: Forwarding Usage: 0M
[89,845s][info][gc,reloc    ] GC(18) Y: Forwarding Usage: 303M
[89,892s][info][gc,reloc    ] GC(18) O: Forwarding Usage: 0M
[91,587s][info][gc,reloc    ] GC(19) Y: Forwarding Usage: 202M
[94,597s][info][gc,reloc    ] GC(20) y: Forwarding Usage: 1078M
[102,230s][info][gc,reloc    ] GC(21) y: Forwarding Usage: 1903M
[102,230s][info][gc,reloc    ] GC(19) O: Forwarding Usage: 0M
[104,895s][info][gc,reloc    ] GC(22) Y: Forwarding Usage: 2160M
[114,142s][info][gc,reloc    ] GC(23) y: Forwarding Usage: 1806M
[118,191s][info][gc,reloc    ] GC(24) y: Forwarding Usage: 3718M
[118,191s][info][gc,reloc    ] GC(22) O: Forwarding Usage: 0M
[122,279s][info][gc,reloc    ] GC(25) y: Forwarding Usage: 1M
[126,271s][info][gc,reloc    ] GC(26) Y: Forwarding Usage: 3204M
[131,925s][info][gc,reloc    ] GC(27) y: Forwarding Usage: 684M
[133,675s][info][gc,reloc    ] GC(28) y: Forwarding Usage: 1443M
[133,676s][info][gc,reloc    ] GC(26) O: Forwarding Usage: 0M
[137,303s][info][gc,reloc    ] GC(29) Y: Forwarding Usage: 2389M
[147,488s][info][gc,reloc    ] GC(30) y: Forwarding Usage: 1M
[153,150s][info][gc,reloc    ] GC(31) y: Forwarding Usage: 3871M
[153,151s][info][gc,reloc    ] GC(29) O: Forwarding Usage: 0M
[153,585s][info][gc,reloc    ] GC(32) Y: Forwarding Usage: 308M
[159,519s][info][gc,reloc    ] GC(33) y: Forwarding Usage: 1933M
[169,010s][info][gc,reloc    ] GC(34) y: Forwarding Usage: 1740M
[169,011s][info][gc,reloc    ] GC(32) O: Forwarding Usage: 0M
[176,374s][info][gc,reloc    ] GC(35) Y: Forwarding Usage: 4071M
[190,786s][info][gc,reloc    ] GC(36) y: Forwarding Usage: 4051M
[196,478s][info][gc,reloc    ] GC(37) y: Forwarding Usage: 4050M
[196,479s][info][gc,reloc    ] GC(35) O: Forwarding Usage: 0M
[197,187s][info][gc,reloc    ] GC(38) y: Forwarding Usage: 719M
[199,733s][info][gc,reloc    ] GC(39) y: Forwarding Usage: 2318M
[202,880s][info][gc,reloc    ] GC(40) y: Forwarding Usage: 4M
[206,743s][info][gc,reloc    ] GC(41) Y: Forwarding Usage: 270M
[212,209s][info][gc,reloc    ] GC(42) y: Forwarding Usage: 2098M
[215,218s][info][gc,reloc    ] GC(43) y: Forwarding Usage: 630M
[215,218s][info][gc,reloc    ] GC(41) O: Forwarding Usage: 0M
[216,553s][info][gc,reloc    ] GC(44) Y: Forwarding Usage: 69M
[219,661s][info][gc,reloc    ] GC(45) y: Forwarding Usage: 989M
[226,256s][info][gc,reloc    ] GC(46) y: Forwarding Usage: 1641M
[226,256s][info][gc,reloc    ] GC(44) O: Forwarding Usage: 0M
[229,217s][info][gc,reloc    ] GC(47) y: Forwarding Usage: 4M
[234,397s][info][gc,reloc    ] GC(48) Y: Forwarding Usage: 3966M
[247,632s][info][gc,reloc    ] GC(49) y: Forwarding Usage: 3M
[250,257s][info][gc,reloc    ] GC(50) y: Forwarding Usage: 1725M

Looking at the NMT JFR events [3]:
jfr view native-memory-committed gcstress.jfr

Memory Type                    First Observed   Average Last Observed   Maximum
------------------------------ -------------- --------- ------------- ---------
GC                                   113.7 MB    2.6 GB        2.4 GB    4.8 GB

This confirms my hypothesis that forwarding table usage spikes when there is
a large burst of Java heap allocation right before a GC cycle.
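
For reference, this is a minimal sketch of pulling the same numbers out of
the recording programmatically (assuming the jdk.NativeMemoryUsage periodic
event added in JDK 20, with its "type" and "committed" fields; it is only
emitted when NativeMemoryTracking is enabled together with JFR):

import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Prints the committed-memory samples for the "GC" NMT category from a
// JFR recording, plus the maximum observed value.
public class NmtGcCommitted {
    public static void main(String[] args) throws Exception {
        Path recording = Path.of(args.length > 0 ? args[0] : "gcstress.jfr");
        long maxCommitted = 0;
        try (RecordingFile rf = new RecordingFile(recording)) {
            while (rf.hasMoreEvents()) {
                RecordedEvent e = rf.readEvent();
                // jdk.NativeMemoryUsage: one sample per NMT category per period
                if ("jdk.NativeMemoryUsage".equals(e.getEventType().getName())
                        && "GC".equals(e.getString("type"))) {
                    long committed = e.getLong("committed");
                    maxCommitted = Math.max(maxCommitted, committed);
                    System.out.printf("%s GC committed: %d MB%n",
                            e.getStartTime(), committed / (1024 * 1024));
                }
            }
        }
        System.out.printf("max GC committed: %d MB%n", maxCommitted / (1024 * 1024));
    }
}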

I saw an article [4] about a compact representation of the forwarding table
and was wondering whether it has been implemented or is planned for the
future. This issue forces us to reconsider the sizing of our containers to
account for this spike in forwarding usage. Considering the magnitude of the
overhead (almost 20% of the Java heap), do you think this is something worth
improving?

Thanks
Jean-Philippe Bempel

[1] https://ginnieandfifounet.com/jpb/zgc_forwardingtable/Screenshot_spike_GCstructs.png
[2] https://ginnieandfifounet.com/jpb/zgc_forwardingtable/zgc_high_forwarding_usage.log
[3] https://ginnieandfifounet.com/jpb/zgc_forwardingtable/gcstress.jfr
[4] https://inside.java/2020/06/25/compact-forwarding/

