Reference leak in old gen in Generational Shenandoah
Kemper, William
kemperw at amazon.com
Sat Dec 6 00:51:28 UTC 2025
I created https://bugs.openjdk.org/browse/JDK-8373203 to track progress.
________________________________
From: shenandoah-dev <shenandoah-dev-retn at openjdk.org> on behalf of Parker Winchester <pwinchester at palantir.com>
Sent: Friday, December 5, 2025 8:32:36 AM
To: shenandoah-dev at openjdk.org
Subject: [EXTERNAL] Reference leak in old gen in Generational Shenandoah
We just upgraded to JDK 25 and are trying out Generational Shenandoah, coming from ZGC. We noticed native memory (in the "other" category) held by direct byte buffers steadily increasing and never being freed, even though the DirectByteBuffer objects had become unreachable and the GC was clearly running frequently. One of our services hit 2GB of native memory after 24 hours and was ultimately OOMKilled. Manually triggering a GC by taking a (live) heap histogram clears the native memory, so this looks like a failure of the GC to find and clean up certain objects rather than a true "leak."
We tracked this down to Undertow's DefaultByteBufferPool, which uses Finalizers and WeakHashMaps. Both rely on reference types (e.g. WeakReferences) that need at least one additional GC cycle before the GC can remove them. I plan to submit a change to Undertow to reduce its reliance on these, but the issue may affect other code, so I produced a minimal repro that doesn't use native memory.
I believe the issue is that a Reference in the old generation sometimes fails to be discovered by the GC. A reference in the old gen is not encountered by any young gen collection, and when it is encountered during an old gen collection, should_discover() returns false, so it can never be enqueued. I think this is because the references are wrongly considered strongly live:
[23.999s][trace][gc,ref ] GC(259) Encountered Reference: 0x000000030000b6e8 (Weak, OLD)
[23.999s][trace][gc,ref ] GC(259) Reference strongly live: 0x000000030000b6e8
My minimal repro uses weak references, but I also noticed the issue with the phantom references used by DirectByteBuffer.
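For readers less familiar with reference processing, the following is a minimal sketch of what "enqueued" means at the Java level: a WeakReference registered with a ReferenceQueue shows up on that queue once the GC has discovered and cleared it. It is not the repro. The class and method names are hypothetical, and because the sketch calls System.gc(), which triggers a global collection that does clear these references, it is expected to succeed even on an affected JVM.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class WeakRefEnqueueProbe {
    /**
     * Returns true if a WeakReference with an unreachable referent is
     * enqueued within maxAttempts rounds of System.gc() plus a 1 second wait.
     */
    static boolean enqueuedAfterGc(int maxAttempts) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // The referent has no strong reference once this statement completes.
        WeakReference<Object> ref = new WeakReference<>(new Object(), queue);
        try {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                System.gc(); // only a request; the JVM may ignore it
                Reference<?> polled = queue.remove(1000); // wait up to 1 second
                if (polled == ref) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("enqueued=" + enqueuedAfterGc(10));
    }
}
```

In the scenario described above, the reference would sit in the old generation and no System.gc() would be issued, so the queue would stay empty across old gen cycles.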
Summary of my repro
Each iteration it:
* Allocates a simple object (MyLeakedObject, only necessary so it has a class name in the heap histogram) as well as a WeakReference to it.
* Stores the WeakReference in a static list (this part appears to be necessary for the repro).
* Allocates a lot of garbage (80GB in an 8GB heap) to force the object and the WeakReference to be promoted to the old gen.
* Iterates over the static list and removes any WeakReferences with null referents.
* Takes a heap histogram (not live, so we don't trigger a GC) and prints the counts of MyLeakedObject and WeakReference.
* Continues the loop, allowing the object and its WeakReference to go out of scope.
* Every 20 iterations, runs several System.gc() calls to prove that the counts return to 0 (System.gc() triggers a "global" GC, which is different from an old gen GC).
The count goes up each iteration until the System.gc() calls:
Iteration 1: MyLeakedObject=1, WeakReference=5, WeakRefs with live referent=1
Iteration 2: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2
Iteration 3: MyLeakedObject=3, WeakReference=7, WeakRefs with live referent=3
Iteration 4: MyLeakedObject=4, WeakReference=8, WeakRefs with live referent=4
Iteration 5: MyLeakedObject=5, WeakReference=9, WeakRefs with live referent=5
Iteration 6: MyLeakedObject=6, WeakReference=10, WeakRefs with live referent=6
Iteration 7: MyLeakedObject=7, WeakReference=11, WeakRefs with live referent=7
Iteration 8: MyLeakedObject=8, WeakReference=12, WeakRefs with live referent=8
Iteration 9: MyLeakedObject=9, WeakReference=13, WeakRefs with live referent=9
Iteration 10: MyLeakedObject=10, WeakReference=14, WeakRefs with live referent=10
Iteration 11: MyLeakedObject=11, WeakReference=15, WeakRefs with live referent=11
Iteration 12: MyLeakedObject=12, WeakReference=16, WeakRefs with live referent=12
Iteration 13: MyLeakedObject=13, WeakReference=17, WeakRefs with live referent=13
Iteration 14: MyLeakedObject=14, WeakReference=18, WeakRefs with live referent=14
Iteration 15: MyLeakedObject=15, WeakReference=19, WeakRefs with live referent=15
Iteration 16: MyLeakedObject=16, WeakReference=20, WeakRefs with live referent=16
Iteration 17: MyLeakedObject=17, WeakReference=21, WeakRefs with live referent=17
Iteration 18: MyLeakedObject=18, WeakReference=22, WeakRefs with live referent=18
Iteration 19: MyLeakedObject=19, WeakReference=23, WeakRefs with live referent=19
Iteration 20: MyLeakedObject=20, WeakReference=24, WeakRefs with live referent=20
Forcing GCs...
Iteration 21: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2
Expected behavior: Each iteration should see only 1, or at most 2, MyLeakedObject instances, since the earlier ones are no longer in scope and sufficient GC activity (young + old gen GCs) has occurred
Actual behavior: Each iteration adds an additional MyLeakedObject and its WeakReference, leading to a leak
I have only tested with Corretto on Ubuntu & OSX
openjdk 25.0.1 2025-10-21 LTS
OpenJDK Runtime Environment Corretto-25.0.1.8.1 (build 25.0.1+8-LTS)
OpenJDK 64-Bit Server VM Corretto-25.0.1.8.1 (build 25.0.1+8-LTS, mixed mode, sharing)
I've tried non-generational Shenandoah (mode=satb) and the issue does not occur. It also does not occur with ZGC or G1.
I had a version of the repro that used DirectByteBuffers. It yielded the results below, looking strictly at reference processing in old gen GCs (running with -Xlog:gc*=info,gc+ref=trace):
Iteration 1: Native Memory = 1 KB
[20.423s][info ][gc,ref ] GC(46) Encountered references: Soft: 66, Weak: 183, Final: 0, Phantom: 3
[20.423s][info ][gc,ref ] GC(46) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[20.423s][info ][gc,ref ] GC(46) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 2: Native Memory = 2 KB
[30.687s][info ][gc,ref ] GC(52) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 4
[30.688s][info ][gc,ref ] GC(52) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[30.688s][info ][gc,ref ] GC(52) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 3: Native Memory = 3 KB
[54.496s][info ][gc,ref ] GC(70) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 5
[54.496s][info ][gc,ref ] GC(70) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 1
[54.496s][info ][gc,ref ] GC(70) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 4: Native Memory = 4 KB
[93.706s][info ][gc,ref ] GC(91) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 6
[93.706s][info ][gc,ref ] GC(91) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[93.706s][info ][gc,ref ] GC(91) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
It's a little easier to see with DirectByteBuffer's Phantom references (there are 100+ unrelated WeakReferences, which I believe are used internally). Each iteration adds another Phantom reference that is encountered but fails to be discovered (because it is considered strongly live).
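The DirectByteBuffer variant of the repro is not included in this message. As a rough sketch of how a "Native Memory" figure like the one above could be sampled, the snippet below reads the JVM's "direct" buffer pool through BufferPoolMXBean. This is an assumption about one possible way to measure it, not necessarily how the numbers above were produced, and the class and method names are hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferUsageProbe {
    /** Bytes currently reserved by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 3; i++) {
            // Allocate a 1 KB direct buffer and immediately drop the only
            // reference to it; its native memory is released only after the
            // GC processes the buffer's phantom-reference-based cleaner.
            ByteBuffer.allocateDirect(1024);
            System.out.println("Iteration " + i + ": direct pool = " + directMemoryUsed() + " bytes");
        }
    }
}
```

If the figure keeps growing across old gen cycles while the buffers are unreachable, that would match the behavior described above.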
Run the repro with:
java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 -XX:ShenandoahGuaranteedOldGCInterval=1000 -XX:+AlwaysPreTouch -Xmx8g -Xms8g GenShenWeakRefLeakRepro
These flags help prove that the references are guaranteed to be encountered during each old gen GC cycle (otherwise they might be skipped over if the region has very little garbage)
-XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0
This flag guarantees that references in old gen regions get processed every second (each iteration takes about 2 seconds on my M1 MacBook)
-XX:ShenandoahGuaranteedOldGCInterval=1000
Note: I experimented with the heap size and the allocation rate and found an 8GB heap with 80GB allocated per iteration to be the most reliable way to reproduce the issue.
Source code for GenShenWeakRefLeakRepro.java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.lang.ref.WeakReference;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Tests if WeakReferences with old-gen referents leak in Generational Shenandoah.
 */
public class GenShenWeakRefLeakRepro {
    // Keep WeakReferences alive in a static list (will be in old gen)
    private static final List<WeakReference<MyLeakedObject>> WEAK_REFS = new ArrayList<>();
    private static final long[] COUNTS = new long[2];

    static class MyLeakedObject {
        private final int value;

        MyLeakedObject(int value) {
            this.value = value;
        }
    }

    public static void main(String[] args) throws Exception {
        // Allocate garbage to promote WEAK_REFS to old gen
        for (int i = 0; i < 800; i++) {
            byte[] garbage = new byte[100 * 1024 * 1024];
            garbage[i % garbage.length] = (byte) i;
        }
        for (int iteration = 0; iteration < 100; iteration++) {
            // Create object and weak reference
            MyLeakedObject obj = new MyLeakedObject(iteration);
            WeakReference<MyLeakedObject> wr = new WeakReference<>(obj);
            // Store in static list (so WeakRef survives and gets promoted)
            WEAK_REFS.add(wr);
            // Allocate garbage to promote both WeakRef and referent to old gen
            for (int i = 0; i < 800; i++) {
                byte[] garbage = new byte[100 * 1024 * 1024];
                garbage[i % garbage.length] = (byte) i;
            }
            // Remove cleared WeakRefs (referent was collected)
            WEAK_REFS.removeIf(w -> w.get() == null);
            // Count objects
            getObjectCounts();
            // What remains are WeakRefs with live referents
            long aliveCount = WEAK_REFS.size();
            System.out.println("Iteration " + (iteration + 1) +
                    ": MyLeakedObject=" + COUNTS[0] +
                    ", WeakReference=" + COUNTS[1] +
                    ", WeakRefs with live referent=" + aliveCount);
            // Periodically force GCs
            if ((iteration + 1) % 20 == 0) {
                System.out.println("Forcing GCs...");
                for (int i = 0; i < 4; i++) {
                    System.gc();
                    Thread.sleep(3000);
                }
                getObjectCounts();
                System.out.println("After GC: MyLeakedObject=" + COUNTS[0] +
                        ", WeakRefs with live referent=" + aliveCount);
            }
        }
    }

    private static void getObjectCounts() {
        COUNTS[0] = 0;
        COUNTS[1] = 0;
        try {
            Process p = new ProcessBuilder(
                    "jcmd", String.valueOf(ProcessHandle.current().pid()),
                    "GC.class_histogram", "-all")
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] parts = line.trim().split("\\s+");
                    if (parts.length >= 4) {
                        if (line.contains("GenShenWeakRefLeakRepro$MyLeakedObject")) {
                            COUNTS[0] = Long.parseLong(parts[1]);
                        } else if (line.contains("java.lang.ref.WeakReference ")) {
                            COUNTS[1] = Long.parseLong(parts[1]);
                        }
                    }
                }
            }
        } catch (Exception e) {
            System.err.println("Histogram failed: " + e.getMessage());
        }
    }
}
Thanks,
Parker Winchester