RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive
Thomas Stuefe
stuefe at openjdk.org
Tue Jan 27 12:53:01 UTC 2026
This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659.
----
A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc).
We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read.
### Memory usage of old vs new algorithm:
The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space.
The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth.
In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp.
### Possible improvements/simplifications in the future:
DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later.
I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult.
### Observable differences
There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops and roots in reverse order. That means we may see different GC roots in JMC now compared to the recursive version of this algorithm.
For example, for a static main class member:
- One order of root processing may process the CLDG first; this causes the CLDs to be iterated, which iterates the Klass, which iterates the associated j.l.Class object. The reference stack starts at the j.l.Class object, and the root displayed in JMC says "ClassLoader".
- Another root processing order may process the global handles first; this causes the AppClassLoader to be processed first, which causes the main class Mirror to be processed, and we also hit the static member, but now the root in JMC is displayed as "Global Object Handle: VM Global"
I think that effect is benign - a root is a root.
But, as a future improvement, I think displaying the first (or first n) objects of the reference stack would be very helpful to the analyst (possibly more so than just printing the roots). But maybe that feature exists already; I am no JMC expert.
## Testing
- I ran extensive tests manually to make sure the resulting jfr files showed the same information (apart from the observable differences mentioned above).
- I manually tested that we handle max_depth reached and probestack exhaustion gracefully
- I added a new test to execute DFS-only on a very small stack size. Works fine (crashes with stock JVM).
- I added a new test that exercises the new array chunking in the tracer
- I made sure the patch fixes https://bugs.openjdk.org/browse/JDK-8371630 by running the TestWaste.java with a tiny stack size - crashes without patch, works with patch
-------------
Commit messages:
- fix after JDK-8375040
- tests
- wip
- revert unnecessary changes
- wip
- small stack test
- tweaking DFS-BFS test
- fix windows build warning
- reduce diff
- fix performance problem on bfs-dfs mixed mode
- ... and 8 more: https://git.openjdk.org/jdk/compare/cba7d88c...5d79624b
Changes: https://git.openjdk.org/jdk/pull/29382/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8373096
Stats: 543 lines in 8 files changed: 502 ins; 8 del; 33 mod
Patch: https://git.openjdk.org/jdk/pull/29382.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382
PR: https://git.openjdk.org/jdk/pull/29382
More information about the hotspot-jfr-dev
mailing list