RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive
Robert Toyonaga
duke at openjdk.org
Tue Jan 27 20:06:02 UTC 2026
On Fri, 23 Jan 2026 10:18:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659.
>
> ----
>
> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc).
>
> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>
> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read.
>
> ### Memory usage of old vs new algorithm:
>
> The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space.
>
> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth.
>
> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp.
>
> ### Possible improvements/simplifications in the future:
>
> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later.
>
> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult.
>
> ### Observable differences
>
> There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a...
src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 125:
> 123: // smaller. Not a problem at all.
> 124: //
> 125: // But we could run into weird pathological object graphs. Therfore we also
Suggestion:
// But we could run into weird pathological object graphs. Therefore we also
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/29382#discussion_r2733627437
More information about the hotspot-jfr-dev
mailing list