RFR: 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays
I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions: 1) We have large object arrays on the heap 2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57...). This, in turn, is more likely to happen with large object arrays. Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two. The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by decreasing the size of the BFS edge queue in the code. In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results: - Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds) The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever. Examining the problem more closely, I see: - BFS search starts iterating over the object array. It will find that its edge queue is too small, and will drop down to DFS for every single object element. That in itself is not the problem. - It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size. The reason for this is a missing mark check at the border between BFS and DFS. Tests: - I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs). I also manually verified the resulting JFR file and confirmed that I see the gc roots listed for the array elements. - I ran JFR jtreg tests manually ------------- Commit messages: - fix - start Changes: https://git.openjdk.org/jdk/pull/28781/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28781&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373490 Stats: 208 lines in 2 files changed: 202 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28781/head:pull/28781 PR: https://git.openjdk.org/jdk/pull/28781
On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe@openjdk.org> wrote:
I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:
1) We have large object arrays on the heap
2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57...). This, in turn, is more likely to happen with large object arrays.
Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.
The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code.
In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
- Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds)
The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever.
Examining the problem more closely, I see:
- BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
- It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size.
The reason for this is a missing mark check at the border between BFS and DFS.
Tests:
- I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...
Looks reasonable. I will run some tests before approving. (It’s the end of the year, and I have vacation time I must take, so my availability for reviews is limited.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3645483611
On Fri, 12 Dec 2025 08:34:47 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
Looks reasonable. I will run some tests before approving.
(It’s the end of the year, and I have vacation time I must take, so my availability for reviews is limited.)
Same here :-) Hope you have a nice vacation, btw! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3645590241
On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe@openjdk.org> wrote:
I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:
1) We have large object arrays on the heap
2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57...). This, in turn, is more likely to happen with large object arrays.
Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.
The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code.
In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
- Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds)
The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever.
Examining the problem more closely, I see:
- BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
- It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size.
The reason for this is a missing mark check at the border between BFS and DFS.
Tests:
- I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...
Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28781#pullrequestreview-3571992566
On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe@openjdk.org> wrote:
I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:
1) We have large object arrays on the heap
2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57...). This, in turn, is more likely to happen with large object arrays.
Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.
The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code.
In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
- Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds)
The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever.
Examining the problem more closely, I see:
- BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
- It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size.
The reason for this is a missing mark check at the border between BFS and DFS.
Tests:
- I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...
A better title for the bug might be "JFR: path-to-gc-roots=true is very slow for large object arrays". The leak profiler name is not something we have used externally. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3647430575
On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe@openjdk.org> wrote:
I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:
1) We have large object arrays on the heap
2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57...). This, in turn, is more likely to happen with large object arrays.
Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.
The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code.
In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
- Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds)
The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever.
Examining the problem more closely, I see:
- BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
- It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size.
The reason for this is a missing mark check at the border between BFS and DFS.
Tests:
- I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...
This fix looks good to me ------------- Marked as reviewed by roberttoyonaga@github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/28781#pullrequestreview-3573204588
On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe@openjdk.org> wrote:
I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:
1) We have large object arrays on the heap
2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57...). This, in turn, is more likely to happen with large object arrays.
Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.
The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code.
In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
- Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds)
The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever.
Examining the problem more closely, I see:
- BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
- It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size.
The reason for this is a missing mark check at the border between BFS and DFS.
Tests:
- I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...
Is there a way to add a timeout to the test for the failure case? Time outs in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3647841771
On Fri, 12 Dec 2025 19:23:34 GMT, Robert Toyonaga <duke@openjdk.org> wrote:
Is there a way to add a timeout to the test for the failure case? Time outs in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable.
I don't think we should have a timeout. Timeouts typically only lead to false positives, when you run on some old hardware, during stress, using some other option etc. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3648179388
On Fri, 12 Dec 2025 21:20:56 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
Is there a way to add a timeout to the test for the failure case? Time outs in unit tests are not completely reliable, but maybe something generous like 10s might be reasonable.
Adding to what Eric wrote, there is the timeout of jtreg itself, of course. By default 120secs. But folks can increase or decrease that at command line level. Many thanks for the speedy reviews, @egahlin and @roberttoyonaga ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3650778670 PR Comment: https://git.openjdk.org/jdk/pull/28781#issuecomment-3650779585
On Fri, 12 Dec 2025 07:36:57 GMT, Thomas Stuefe <stuefe@openjdk.org> wrote:
I see a steep performance decrease (as in: path-to-gc-root can hang for minutes or hours) with the JFR Leak Profiler under the following conditions:
1) We have large object arrays on the heap
2) We start with BFS, but at some point, we fallback to DFS. This happens when the BFS edge queue becomes too large (see https://github.com/tstuefe/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57...). This, in turn, is more likely to happen with large object arrays.
Both BFS and DFS are very fast on their own; when one enforces DFS (via the SkipBFS parameter), all is well. If BFS runs without DFS, all is well, too. The problem only occurs when mixing these two.
The problem is more likely with large object arrays on small heaps, since the size of the BFS edge queue is tied to 1/20 of the heap size. The problem can be provoked more easily by artificially decreasing the size of the BFS edge queue in the code.
In my tests (a modified variant of `TestJcmdDumpPathToGCRoots.java`), with a BFS edge queue size artificially reduced to 4 MB, I could reproduce the problem with an object array of 200k - 300k elements, with the following results:
- Object Array size of 200,000 elements: BFS only, all is well (~5 seconds) - Object Array size of 250,000 elements: BFS/DFS mixed, (~200 seconds) - Object Array size of 300,000 elements: BFS/DFS mixed, (~650 seconds)
The run time is the square of the array size, see explanation below. With the default edge queue size of 32MB, we only hit that DFS fallback path when processing arrays of >2mio, and for arrays as big as these, the search basically hangs forever.
Examining the problem more closely, I see:
- BFS search starts iterating over the object array. That many oops will saturate the queue. It will drop down to DFS for every single object element. That in itself is not the problem.
- It will then continue to process edges, and re-process the array oop over and over again: passing its oop down to `DFSClosure::find_leaks_from_edge` for every element inside this array. The effect is that, for an array of size N, we process that array oop N times. So the runtime will be O^2 with respect to the array size.
The reason for this is a missing mark check at the border between BFS and DFS.
Tests:
- I provided a regression test that provokes the pathological behavior before the patch and is very quick to finish with the patch (TestJcmdDumpPathToGCRootsBFSDFS.java#bfsdfs...
This pull request has now been integrated. Changeset: 99f90bef Author: Thomas Stuefe <stuefe@openjdk.org> URL: https://git.openjdk.org/jdk/commit/99f90befafe9476de17e416d45a9875569171935 Stats: 208 lines in 2 files changed: 202 ins; 6 del; 0 mod 8373490: JFR Leak Profiler: path-to-gc-root very slow for large object arrays Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/28781
participants (3)
-
Erik Gahlin
-
Robert Toyonaga
-
Thomas Stuefe