RFR: 8373096: JFR leak profiler: path-to-gc-roots search should be non-recursive [v4]
Frederic Thevenet
fthevenet at openjdk.org
Wed Dec 10 14:27:35 UTC 2025
On Tue, 9 Dec 2025 12:08:16 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> A customer reported a crash when producing a JFR recording with `path-to-gc-roots=true`. It was a native stack overflow that occurred during the recursive path-to-gc-root search performed in the context of PathToGcRootsOperation.
>>
>> We try to avoid this by limiting the maximum search depth (DFSClosure::max_dfs_depth). That solution is brittle, however, since recursion depth is not a good proxy for thread stack usage: it depends on many factors, e.g., compiler inlining decisions and platform specifics. In this case, the VMThread's stack was too small.
>>
>> This RFE changes the algorithm to be non-recursive.
>>
>> Note that as a result of this change, the order in which oop maps are walked per oop is reversed : last oops are processed first. That should not matter for the end result, however. The search is still depth-first.
>>
>> Note that after this patch, we could easily remove the max_depth limitation altogether. I left it in however since this was not the scope of this RFE.
>>
>> Testing:
>>
>> - Tested manually with very small (256K) thread stack size for the VMThread - the patched version works where the old version crashes out
>> - Compared JFR recordings from both an unpatched version (with a large enough VMThread stack size) and a patched version; made sure that the content of "Old Object Sample" was identical
>> - Ran locally all jtreg tests in jdk/jfr
>> - GHAs
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> final fixes
src/hotspot/share/jfr/leakprofiler/chains/dfsClosure.cpp line 177:
> 175: assert(!ref.is_null(), "invariant");
> 176: const oop pointee = ref.dereference();
> 177: assert(pointee != nullptr, "invariant");
Small thing: is this still useful since since `pointee` is no longer used with this change?
We assert that `ref.dereference() != nullptr` right after we pop it from the stack in `drain_probe_stack` anyway.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28659#discussion_r2606872452
More information about the hotspot-jfr-dev
mailing list