From kineolyan at proton.me Sat Dec 6 14:30:04 2025
From: kineolyan at proton.me (Olivier Peyrusse)
Date: Sat, 06 Dec 2025 14:30:04 +0000
Subject: "Memory leak" caused by WorkQueue#topLevelExec
Message-ID: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>

Hello community,

Sorry if this is the wrong place to discuss internal classes such as the ForkJoinPool. If so, please excuse me and point me in the right direction.

At my company, we have experienced an unfortunate memory leak because one of our CountedCompleters was retaining a large object and the task was not released to the GC (I will give more details below but will first focus on the FJP code causing the issue).

When running tasks, the FJP ends up calling [WorkQueue#topLevelExec](https://github.com/openjdk/jdk/blob/c419dda4e99c3b72fbee95b93159db2e23b994b6/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1448-L1453), which is implemented as follows:

    final void topLevelExec(ForkJoinTask<?> task, int fifo) {
        while (task != null) {
            task.doExec();
            task = nextLocalTask(fifo);
        }
    }

We can see that it starts from a top-level task, executes it, and looks for the next task to execute before repeating this loop. This means that, as long as we find a task through nextLocalTask, we do not exit this method, and the caller of topLevelExec retains in its stack a reference to the first executed task - like [here](https://github.com/openjdk/jdk/blob/c419dda4e99c3b72fbee95b93159db2e23b994b6/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1992-L2019). This acts as a path from a GC root, preventing the garbage collection of the task. So even if a CountedCompleter has completed its exec / tryComplete / etc., the framework keeps the object alive.
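To make the scenario concrete, here is a minimal sketch of the pattern (the class name, field, and payload size are invented for illustration; this is not our actual code):

    import java.util.concurrent.CountedCompleter;
    import java.util.concurrent.ForkJoinPool;

    // Illustrative sketch only: a completer that holds a large payload.
    class HoldingTask extends CountedCompleter<Void> {
        byte[] payload = new byte[64 * 1024 * 1024]; // large object referenced by the task

        @Override
        public void compute() {
            // ... use payload ...
            tryComplete(); // the completer is now done
        }

        public static void main(String[] args) {
            ForkJoinPool.commonPool().invoke(new HoldingTask());
            // Even after completion, the worker that ran this task keeps a stack
            // reference to it (in the caller of topLevelExec) for as long as it
            // keeps finding further local tasks to run, so the payload stays live.
        }
    }

In a heap dump this shows up as the completed task, and everything it references, being reachable from the worker thread's stack.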
Could the code be changed to avoid this issue? I am willing to do the work, as well as to come up with a test case reproducing the issue if that is deemed necessary.

In our case, we were in the unfortunate situation where our CountedCompleter was holding an element that happened to be the head of a dynamically growing linked queue. By retaining it, the rest of the growing queue was also retained in memory, leading to the memory leak.

Obvious fixes are possible in our code, for example ensuring that we null out such fields when our operations complete. But this means that we have to be constantly careful about the fields we pass to the task, about what is captured if we use lambdas, and so on. If ForkJoinPool itself could also be improved to avoid such problems, it would be an additional safety net.

Thank you for reading the mail.
Cheers

Olivier

From viktor.klang at oracle.com Sat Dec 6 15:27:12 2025
From: viktor.klang at oracle.com (Viktor Klang)
Date: Sat, 6 Dec 2025 16:27:12 +0100
Subject: "Memory leak" caused by WorkQueue#topLevelExec
In-Reply-To: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>
References: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>
Message-ID:

Hi,

Have you validated that your proposed fix solves the problem? The reason I ask is because FJP is extremely performance sensitive, where instruction placement can have massive performance implications.

It might be that manually inlining topLevelExec into its caller is performance neutral, but I want to make sure that we're looking at a problem which is practically solvable by making changes in FJP itself.

The reason I ask is because CountedCompleters tend to have live dependency chains, both in a tree-like manner and in a dependency-chain-like manner. In general, given that structure, you tend to want to null out things you don't need any longer, to minimize the time that they are kept alive.

As an example, see: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java#L695-L705
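As a rough sketch of that pattern (the class and field names below are made up for illustration, not taken from GathererOp), the idea is to drop heavy references as soon as the completer no longer needs them:

    import java.util.concurrent.CountedCompleter;

    // Illustrative sketch: clear heavy references as early as possible.
    class ClearingTask extends CountedCompleter<Void> {
        Object head; // e.g. the head of a growing linked structure

        ClearingTask(CountedCompleter<?> parent, Object head) {
            super(parent);
            this.head = head;
        }

        @Override
        public void compute() {
            Object h = head;
            head = null;      // stop referencing it from the task as soon as we can
            process(h);
            tryComplete();
        }

        @Override
        public void onCompletion(CountedCompleter<?> caller) {
            head = null;      // also clear on completion, in case compute() didn't
        }

        private static void process(Object h) {
            // ... whatever work needs the element ...
        }
    }

That way, even if a worker's stack keeps the completed task reachable for a while, the large data it used to reference is already collectible.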
On 2025-12-06 15:30, Olivier Peyrusse wrote:
> Could the code be changed to avoid this issue? I am willing to do the
> work, as well as to come up with a test case reproducing the issue if
> that is deemed necessary.

--
Cheers,
√

Viktor Klang
Software Architect, Java Platform Group
Oracle

From dl at cs.oswego.edu Sat Dec 6 15:38:55 2025
From: dl at cs.oswego.edu (Doug Lea)
Date: Sat, 6 Dec 2025 10:38:55 -0500
Subject: "Memory leak" caused by WorkQueue#topLevelExec
In-Reply-To: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>
References: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>
Message-ID: <50de0a1c-7e6a-4c09-b21d-a392f972d241@cs.oswego.edu>

On 12/6/25 09:30, Olivier Peyrusse wrote:
> Hello community,
>
> Sorry if this is the wrong place to discuss internal classes such as
> the ForkJoinPool. If so, please excuse me and point me in the right
> direction.
>
> At my company, we have experienced an unfortunate memory leak because
> one of our CountedCompleters was retaining a large object and the task
> was not released to the GC (I will give more details below but will
> first focus on the FJP code causing the issue).
>
> When running tasks, the FJP ends up calling WorkQueue#topLevelExec
> <https://github.com/openjdk/jdk/blob/c419dda4e99c3b72fbee95b93159db2e23b994b6/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1448-L1453>,
> which is implemented as follows:
>
>     final void topLevelExec(ForkJoinTask<?> task, int fifo) {
>         while (task != null) {
>             task.doExec();
>             task = nextLocalTask(fifo);
>         }
>     }
>
> We can see that it starts from a top-level task, executes it,
> and looks for the next task to execute before repeating this loop.
> This means that, as long as we find a task through
> nextLocalTask, we do not exit this method, and the caller of
> topLevelExec retains in its stack a reference to the first executed
> task - like here
> <https://github.com/openjdk/jdk/blob/c419dda4e99c3b72fbee95b93159db2e23b994b6/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1992-L2019>.
> This acts as a path from a GC root, preventing the garbage
> collection of the task.

The issue is not in that code, but in the calling sequence: a ref is
retained mainly for the sake of a stack trace. The only way to (only
sometimes) avoid this would be to manually inline the method, which
leads to different compilation/execution issues, which leads to other
tradeoffs impacting other usages. But it's worth considering. Thanks
for the report.

-Doug

> So even if a CountedCompleter has completed its exec / tryComplete /
> etc., the framework keeps the object alive.
> Could the code be changed to avoid this issue? I am willing to do the
> work, as well as to come up with a test case reproducing the issue if
> that is deemed necessary.
>
> In our case, we were in the unfortunate situation where our
> CountedCompleter was holding an element that happened to be the head
> of a dynamically growing linked queue. By retaining it, the rest of
> the growing queue was also retained in memory, leading to the
> memory leak.
> Obvious fixes are possible in our code, for example ensuring that we
> null out such fields when our operations complete. But this means that
> we have to be constantly careful about the fields we pass to the task,
> about what is captured if we use lambdas, and so on. If ForkJoinPool
> itself could also be improved to avoid such problems, it would be an
> additional safety net.
>
> Thank you for reading the mail.
> Cheers
>
> Olivier