RFR: 8373096: JFR: Path-to-gc-roots search should be non-recursive [v6]
Thomas Stuefe
stuefe at openjdk.org
Mon Feb 16 10:02:46 UTC 2026
On Wed, 4 Feb 2026 12:43:54 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:
>>> I wonder if you have looked at performance. For example, in which order is it best to check for nullptr, oop has been visited or whether the stack is full? You added a _num_objects_processed variable, but it’s never used, so you might want to remove it.
>>
>> Will do.
>>
>>> For the future, I think we want to keep BFS. Initially, I only had DFS, but the chains became so weird that I had to implement BFS.
>>
>> Sure; you are the maintainer, after all.
>>
>>>
>>> Regarding ordering, I think we want an order that makes the most sense to the user. ClassLoader is easier for users to understand than Global Object Handle. I'm not sure if that code is still present, but we had a specific order in which we processed roots.
>>
>> Did you mean this?
>>
>>
>> https://github.com/openjdk/jdk/blob/99bc98357dab78bef2cce7a10c98d13d1e5730e3/src/hotspot/share/jfr/leakprofiler/chains/rootSetClosure.cpp#L87-L97
>>
>> I can reverse the order in there and thus get the (roughly) reversed order. I already tested that, but refrained from adding it to the patch.
>>
>> What do you think about printing the actual (first, first and second ...) objects that are referenced by the roots? I think that would often help a lot. It helped me a lot in understanding what happened.
>>
>> In fact, I just traced the whole chain during development, hence the logging added to add_chain().
>
>> Did you mean this?
>> https://github.com/openjdk/jdk/blob/99bc98357dab78bef2cce7a10c98d13d1e5730e3/src/hotspot/share/jfr/leakprofiler/chains/rootSetClosure.cpp#L87-L97
>
> Yes, we start with classes first because that typically makes the most sense to users. The problem with starting with threads is that you can get odd chains if one thread holds references to other threads. Generally, stacks are harder to understand compared to an inner class (e.g. a listener) holding a reference to the outer class.
>
>> I can reverse the order in there and thus get the (roughly) reversed order. I already tested that, but refrained from adding it to the patch.
>
> I think we want to process roots in the same order as in BFS.
>
>> What do you think about printing the actual (first, first and second ...) objects that are referenced by the roots? I think that would often help a lot. It helped me a lot in understanding what happened.
>
> I have used 'jfr print --events OldObjectSample' when debugging issues. At one point, I had a JavaFX GUI that allowed me to explore chains, but I think JMC can do that today. I have not studied non-leaks that much.I don't have a strong opinion on what to show in trace mode, as long as it doesn't slow down product builds.
@egahlin @roberttoyonaga I put this back into draft and propose a minimally invasive patch that "works well enough"; either until I get the non-recursiveness bulletproof for all corner cases or until we get rid of DFS.
Reason: I keep running into very tricky corner cases in the BFS+DFS mixed mode at the handoff point to DFS caused by:
- The fact that the new non-recursive DFS can stop at any point (in theory) if the DFS probe stack runs full; that creates a new stop condition that did not exist before. Previously, DFS would stop at a maximum stack depth >>> 1. Now, in theory, DFS can stop at recursion level 1. That means we cannot guarantee that the oop handed down to DFS is fully iterated.
- That causes problems since the BFS->DFS handoff oop is iterated twice. Once up in BFS, once in DFS. If the handoff oop is a large object array and it does not get fully iterated by DFS, BFS will reenter DFS with the same oop. This causes the runtime to scale quadratically with the array size (similar to https://bugs.openjdk.org/browse/JDK-8373490).
All of this is solvable; no solution appeals to me, and they all make the patch even more complex. In light of the impending DFS removal, I will pull this back to draft and propose a simpler patch (the main concern is to get a patch out the door quickly for our customer).
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29382#issuecomment-3907529456
More information about the hotspot-jfr-dev
mailing list