From kineolyan at proton.me Sat Dec 6 14:30:04 2025
From: kineolyan at proton.me (Olivier Peyrusse)
Date: Sat, 06 Dec 2025 14:30:04 +0000
Subject: "Memory leak" caused by WorkQueue#topLevelExec
Message-ID: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>

Hello community,

Sorry if this is the wrong place to discuss internal classes such as the ForkJoinPool. If so, please excuse me and point me in the right direction.

At my company, we have experienced an unfortunate memory leak because one of our CountedCompleters was retaining a large object and the task was not released to the GC (I will give more details below but will first focus on the FJP code causing the issue).

When running tasks, the FJP ends up calling [WorkQueue#topLevelExec](https://github.com/openjdk/jdk/blob/c419dda4e99c3b72fbee95b93159db2e23b994b6/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1448-L1453), which is implemented as follows:

    final void topLevelExec(ForkJoinTask<?> task, int fifo) {
        while (task != null) {
            task.doExec();
            task = nextLocalTask(fifo);
        }
    }

We can see that it starts from a top-level task, executes it, and looks for the next task to execute before repeating this loop. This means that, as long as we find a task through nextLocalTask, we do not exit this method, and the caller of topLevelExec retains in its stack a reference to the first executed task - like [here](https://github.com/openjdk/jdk/blob/c419dda4e99c3b72fbee95b93159db2e23b994b6/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1992-L2019). This acts as a path from a GC root, preventing the garbage collection of the task. So even if a CountedCompleter has completed its exec / tryComplete / etc., the framework keeps the object alive.
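To make the scenario concrete, here is a minimal sketch of the pattern (the class name, field, and payload size are invented for illustration; this is not our actual code):

    import java.util.concurrent.CountedCompleter;
    import java.util.concurrent.ForkJoinPool;

    // Illustrative sketch only: a completer that holds a large payload.
    class HoldingTask extends CountedCompleter<Void> {
        byte[] payload = new byte[64 * 1024 * 1024]; // large object referenced by the task

        @Override
        public void compute() {
            // ... use payload ...
            tryComplete(); // the completer is now done
        }

        public static void main(String[] args) {
            ForkJoinPool.commonPool().invoke(new HoldingTask());
            // Even after completion, the worker that ran this task keeps a stack
            // reference to it (in the caller of topLevelExec) for as long as it
            // keeps finding further local tasks to run, so the payload stays live.
        }
    }

In a heap dump this shows up as the completed task, and everything it references, being reachable from the worker thread's stack.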
Could the code be changed to avoid this issue? I am willing to do the work, as well as to come up with a test case reproducing the issue if that is deemed necessary.

In our case, we were in the unfortunate situation where our CountedCompleter was holding an element that happened to be the head of a dynamically growing linked queue. By retaining it, the rest of the growing queue was also retained in memory, leading to the memory leak.

Obvious fixes are possible in our code, for example ensuring that we null out such fields when our operations complete. But this means that we have to be constantly careful about the fields we pass to the task, about what is captured if we use lambdas, and so on. If ForkJoinPool itself could also be improved to avoid such problems, it would be an additional safety net.

Thank you for reading the mail.
Cheers

Olivier

From viktor.klang at oracle.com Sat Dec 6 15:27:12 2025
From: viktor.klang at oracle.com (Viktor Klang)
Date: Sat, 6 Dec 2025 16:27:12 +0100
Subject: "Memory leak" caused by WorkQueue#topLevelExec
In-Reply-To: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>
References: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>
Message-ID:

Hi,

Have you validated that your proposed fix solves the problem? The reason I ask is because FJP is extremely performance sensitive, where instruction placement can have massive performance implications.

It might be that manually inlining topLevelExec into its caller is performance neutral, but I want to make sure that we're looking at a problem which is practically solvable by making changes in FJP itself.

The reason I ask is because CountedCompleters tend to have live dependency chains, both in a tree-like manner and in a dependency-chain-like manner. In general, given that structure, you tend to want to null out things you don't need any longer, to minimize the time that they are kept alive.

As an example, see: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java#L695-L705
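As a rough sketch of that pattern (the class and field names below are made up for illustration, not taken from GathererOp), the idea is to drop heavy references as soon as the completer no longer needs them:

    import java.util.concurrent.CountedCompleter;

    // Illustrative sketch: clear heavy references as early as possible.
    class ClearingTask extends CountedCompleter<Void> {
        Object head; // e.g. the head of a growing linked structure

        ClearingTask(CountedCompleter<?> parent, Object head) {
            super(parent);
            this.head = head;
        }

        @Override
        public void compute() {
            Object h = head;
            head = null;      // stop referencing it from the task as soon as we can
            process(h);
            tryComplete();
        }

        @Override
        public void onCompletion(CountedCompleter<?> caller) {
            head = null;      // also clear on completion, in case compute() didn't
        }

        private static void process(Object h) {
            // ... whatever work needs the element ...
        }
    }

That way, even if a worker's stack keeps the completed task reachable for a while, the large data it used to reference is already collectible.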
On 2025-12-06 15:30, Olivier Peyrusse wrote:
> Could the code be changed to avoid this issue? I am willing to do the
> work, as well as to come up with a test case reproducing the issue if
> that is deemed necessary.

--
Cheers,
√

Viktor Klang
Software Architect, Java Platform Group
Oracle

From dl at cs.oswego.edu Sat Dec 6 15:38:55 2025
From: dl at cs.oswego.edu (Doug Lea)
Date: Sat, 6 Dec 2025 10:38:55 -0500
Subject: "Memory leak" caused by WorkQueue#topLevelExec
In-Reply-To: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>
References: <6ISsfBspHMRXGWyY5Ny7uRplatv-hLfKNRsISffFMjWwzBygAgyGh1kUPaVwfd7ckUukAcU5RHa27mU2BMTtBqq_YjE8tpu9BnegpPEQFjU=@proton.me>
Message-ID: <50de0a1c-7e6a-4c09-b21d-a392f972d241@cs.oswego.edu>

On 12/6/25 09:30, Olivier Peyrusse wrote:
> Hello community,
>
> Sorry if this is the wrong place to discuss internal classes such as
> the ForkJoinPool. If so, please excuse me and point me in the right
> direction.
>
> At my company, we have experienced an unfortunate memory leak because
> one of our CountedCompleters was retaining a large object and the task
> was not released to the GC (I will give more details below but will
> first focus on the FJP code causing the issue).
>
> When running tasks, the FJP ends up calling WorkQueue#topLevelExec
> <https://github.com/openjdk/jdk/blob/c419dda4e99c3b72fbee95b93159db2e23b994b6/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1448-L1453>,
> which is implemented as follows:
>
>     final void topLevelExec(ForkJoinTask<?> task, int fifo) {
>         while (task != null) {
>             task.doExec();
>             task = nextLocalTask(fifo);
>         }
>     }
>
> We can see that it starts from a top-level task, executes it,
> and looks for the next task to execute before repeating this loop.
> This means that, as long as we find a task through
> nextLocalTask, we do not exit this method, and the caller of
> topLevelExec retains in its stack a reference to the first executed
> task - like here
> <https://github.com/openjdk/jdk/blob/c419dda4e99c3b72fbee95b93159db2e23b994b6/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#L1992-L2019>.
> This acts as a path from a GC root, preventing the garbage
> collection of the task.

The issue is not in that code, but in the calling sequence: a ref is
retained mainly for the sake of a stack trace. The only way to (only
sometimes) avoid this would be to manually inline the method, which
leads to different compilation/execution issues, which leads to other
tradeoffs impacting other usages. But it's worth considering. Thanks
for the report.

-Doug

> So even if a CountedCompleter has completed its exec / tryComplete /
> etc., the framework keeps the object alive.
> Could the code be changed to avoid this issue? I am willing to do the
> work, as well as to come up with a test case reproducing the issue if
> that is deemed necessary.
>
> In our case, we were in the unfortunate situation where our
> CountedCompleter was holding an element that happened to be the head
> of a dynamically growing linked queue. By retaining it, the rest of
> the growing queue was also retained in memory, leading to the
> memory leak.
> Obvious fixes are possible in our code, for example ensuring that we
> null out such fields when our operations complete. But this means that
> we have to be constantly careful about the fields we pass to the task,
> about what is captured if we use lambdas, and so on. If ForkJoinPool
> itself could also be improved to avoid such problems, it would be an
> additional safety net.
>
> Thank you for reading the mail.
> Cheers
>
> Olivier