RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v3]

Wed Jan 28 07:53:47 UTC 2026

> This is a continuation - second attempt - of https://github.com/openjdk/jdk/pull/28659. 
> 
> ----
> 
> A customer reported a native stack overflow when producing a JFR recording with path-to-gc-roots=true. This happens regularly, see similar cases in JBS (e.g. https://bugs.openjdk.org/browse/JDK-8371630, https://bugs.openjdk.org/browse/JDK-8282427 etc).
> 
> We limit the maximum graph search depth (DFSClosure::max_dfs_depth) to prevent stack overflows. That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
> 
> This patch rewrites the DFS heap tracer to be non-recursive. This is mostly textbook stuff, but the devil is in the details. Nevertheless, the algorithm should be a straightforward read.
> 
> ### Memory usage of old vs new algorithm:
> 
> The new algorithm uses, on average, a bit less memory than the old one. The old algorithm did cost ((avg stackframe size in bytes) * depth). As we have seen, e.g., in JDK-8371630, a depth of 3200 can max out ~1MB of stack space. 
> 
> The new algorithm costs ((avg number of outgoing refs per instanceKlass oop) * depth * 16. For a depth of 3200, we get typical probe stack sizes of 100KB..200KB. But we also cap probestack size, similar to how we cap the max. graph depth. 
> 
> In any case, these numbers are nothing to worry about. For a more in-depth explanation about memory cost, please see the comment in dfsClosure.cpp.
> 
> ### Possible improvements/simplifications in the future:
> 
> DFS works perfectly well alone now. It no longer depends on stack size, and its memory usage is typically smaller than BFS. IMHO, it would be perfectly fine to get rid of BFS and rely solely on the non-recursive DFS. The benefit would be a decrease in complexity and fewer tests to run and maintain. It should also be easy to convert into a parallelized version later.
> 
> I kept the _max_dfs_depth_ parameter for now, but tbh it is no longer very useful. Before, it prevented stack overflows. Now, it is just an indirect way to limit probe stack size. But we also explicitly cap the probe stack size, so _max_dfs_depth_ is redundant. Removing it would require changing the statically allocated reference stack to be dynamically allocated, but that should not be difficult.
> 
> ### Observable differences
> 
> There is one observable side effect to the changed algorithm. The non-recursive algorithm processes oops a...

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  dont incremement _num_objects_processed for follow-up chunks

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29382/files
  - new: https://git.openjdk.org/jdk/pull/29382/files/af5c5585..66276985

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29382&range=01-02

  Stats: 6 lines in 1 file changed: 4 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/29382.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29382/head:pull/29382

PR: https://git.openjdk.org/jdk/pull/29382