<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>


</head>


<body dir="ltr">


<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


Hi,</div>


<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


<br>


</div>


<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">Could the problem have occurred because the ForkJoinPool got an OOME when it tried to


 allocate a ForkJoinWorkerThread?<br>


<br>


To check for that, if you're using the commonPool(), you might be able to plug in a custom ForkJoinWorkerThreadFactory by passing -Djava.util.concurrent.ForkJoinPool.common.threadFactory=&lt;insert fqcn of custom factory here&gt; and implementing newThread() such that you try-catch the OOME and log it from there.</span></div>
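<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
A minimal sketch of such a factory, assuming a hypothetical class name OomeLoggingThreadFactory and plain System.err logging (neither comes from your application). It delegates to ForkJoinPool.defaultForkJoinWorkerThreadFactory and, as far as I can tell, would only observe errors thrown while the worker is being constructed; a failure later, when the pool calls Thread.start(), would not pass through newThread():</div>
<pre>
```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical name; declare it public in its own file before pointing the
// common.threadFactory system property at it.
class OomeLoggingThreadFactory implements ForkJoinPool.ForkJoinWorkerThreadFactory {

    // Counts failed attempts so the number is also visible in a heap dump.
    static final AtomicLong failures = new AtomicLong();

    public OomeLoggingThreadFactory() {}

    @Override
    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
        try {
            // Delegate the actual construction to the JDK's default factory.
            return ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
        } catch (OutOfMemoryError e) {
            failures.incrementAndGet();
            // Logging may itself fail when memory is exhausted, so keep it minimal.
            System.err.println("ForkJoinWorkerThread creation failed: " + e);
            throw e; // let the pool see the original failure
        }
    }
}
```
</pre>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
To use it with the common pool, the class would presumably need to be public, have a public no-arg constructor and be loadable by the system class loader, for example -Djava.util.concurrent.ForkJoinPool.common.threadFactory=com.example.OomeLoggingThreadFactory (the package name is made up).</div>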


<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


<br>


</div>


<div id="Signature">


<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


Cheers,<br>


√</div>


<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


<br>


</div>


<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


<b><br>


</b></div>


<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


<b>Viktor Klang</b></div>


<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">


Software Architect, Java Platform Group<br>


Oracle</div>


</div>


<div id="appendonsend"></div>


<hr style="display:inline-block;width:98%" tabindex="-1">


<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> core-libs-dev &lt;core-libs-dev-retn@openjdk.org&gt; on behalf of Xiao Yu &lt;cutefish.yx@gmail.com&gt;<br>


<b>Sent:</b> Saturday, 3 February 2024 19:54<br>


<b>To:</b> Jaikiran Pai &lt;jai.forums2013@gmail.com&gt;<br>


<b>Cc:</b> core-libs-dev@openjdk.org &lt;core-libs-dev@openjdk.org&gt;<br>


<b>Subject:</b> Re: The common ForkJoinPool does not have any ForkJoinWorkerThread while tasks are submitted to the queue</font>


<div> </div>


</div>


<div>


<div dir="ltr">


<div>Hi Jaikiran,</div>


<div><br>


</div>


<div>Thanks a lot for replying.<br>


<br>


Our application is a client that communicates with the server for<br>
request/response. The client creates a secure (TLS) connection to the server;<br>
that is, on top of the SocketChannel, we implement a wrapper class called<br>
SSLDataChannel for reading and writing. The SSLDataChannel uses<br>
javax.net.ssl.SSLEngine. Before any read or write can happen, we need to do<br>
the SSL handshake by calling methods on SSLEngine. One of those methods is<br>
SSLEngine#getDelegatedTask(). The returned task needs to be executed before the<br>
handshake can proceed. After the task is done, we need to continue processing<br>
read and write events on the connection. The connection read and write events<br>
are all handled by a class called NioEndpointHandler. One requirement for our<br>
client is that it supports an asynchronous API, and therefore the whole stack<br>
must implement non-blocking methods. The tasks from the SSLEngine could<br>
take a long time and we do not want them to block our other connection events,<br>
and this is where the ForkJoinPool comes in. We run the SSL tasks in the<br>
ForkJoinPool and, after the tasks are done, we arrange to run the<br>
NioEndpointHandler callbacks to proceed with the read and write events. The<br>
much simplified code looks somewhat like the following.<br>


<br>


```<br>


class NioEndpointHandler {<br>


<br>


    /** The ssl channel */<br>


    private final SSLDataChannel sslDataChannel;<br>


    /** The runnable to execute to handle read after ssl tasks is done. */<br>


    private final Runnable handleReadAfterSSLTask = () -> onRead();<br>


    /** The handler state. */<br>


    State state;<br>


<br>


    /** Executes the SSL tasks until no task to run, then run the callback. */<br>


    private void executeSSLTask(ExecutorService executor, Runnable callback) {<br>


        executor.submit(() -> {<br>


            Runnable task;<br>


            while ((task = sslDataChannel.getSSLEngine().getDelegatedTask()) != null) {<br>


                task.run();<br>


            }<br>


            try {<br>


                callback.run();<br>


            } catch (Throwable t) {<br>


</div>


<div>                /* logging the exception. */<br>


</div>


<div>            }<br>


        });<br>


    }<br>


<br>


    /** Handle a read event. */<br>


    private void onRead() {<br>


        if (sslDataChannel.needsHandshake()) {<br>


            /* do handshake */<br>


<br>


            /* One of the handshake step is to check if there is any SSL task to run. */<br>


            if (sslDataChannel.needExecuteTask()) {<br>


                executeSSLTask(ForkJoinPool.commonPool(), handleReadAfterSSLTask);<br>


            }<br>


        }<br>


    }<br>


<br>


    private void terminate() {<br>


        state = TERMINATED;<br>


<br>


        /* Other clean up tasks, however, tasks submitted to the ForkJoinPool are not cancelled. */<br>


    }<br>


}<br>


```<br>


<br>


> What are these handlers? Are they classes which implement Runnable or<br>


> are they something else? What does termination of handler mean in this<br>


> context? Do you use any java.util.concurrent.* APIs to "cancel" such<br>


> terminated handlers?<br>


<br>


Please see the much simplified handler code above.<br>


<br>


The tasks submitted to the ForkJoinPool queue are Runnables that are fields of<br>
the NioEndpointHandler. What we have observed is that there are a lot of tasks<br>
in the fork join pool that hold a reference to the lambda inside<br>
NioEndpointHandler#executeSSLTask, which in turn holds a reference to the<br>
NioEndpointHandler. Those NioEndpointHandlers are in the TERMINATED state. The<br>
only references to those NioEndpointHandlers are these tasks; otherwise they<br>
could be garbage collected once termination cleans up all the other<br>
references.</div>


<div><br>


</div>


<div>Termination of the handler means those connections are at the end of their life<br>
cycle. We clean things up, such as signaling the end of the life cycle to all the<br>
associated request/response pairs, closing the SSLDataChannel, etc.<br>


<br>


No, we have not used the cancel method to cancel the submitted tasks. I agree<br>
that this is an oversight and it would be cleaner to cancel them. However, my<br>
current theory is that this is not the root cause. From my understanding of the<br>
code, the cancel method only changes the state of the task. It does not remove<br>
the task from the queue of the ForkJoinPool. Therefore, those tasks, even if<br>
cancelled, would still stay in the queue, preventing the terminated<br>
NioEndpointHandler from being garbage collected. Currently, I am strongly<br>
biased toward my own theory that somehow there is no ForkJoinPool thread<br>
polling tasks out of the queue, and I am trying to use the ctl field in the<br>
ForkJoinPool as evidence to back up my theory. I am wondering if I am making<br>
some mistake with my theory.<br>


<br>


> Finally, what does the OutOfMemoryError exception stacktrace look like<br>


> and what are the JVM parameters (heap size for example) used to launch<br>


> this application?<br>


<br>


Our client creates about 155 threads and quite a lot of them have OOME on<br>


their stack. I am not quite sure how to reply to this question. Going through<br>


the stack traces, I do not find anything very suspicious. They are just<br>


exercising their most frequent code path: some I/O threads waiting for I/O<br>


events and some execution threads waiting for more work to do, etc.<br>


<br>


It is worth mentioning that there are no ForkJoinWorkerThread stacks in the<br>
thread dump from the heap dump. From my understanding, the only time when there<br>
is no such thread is when there are no tasks to run. But there are quite a lot<br>
of tasks in the queue.<br>
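<br>
On a live process, the same question could be cross-checked without reading ctl directly, using the public monitoring getters on ForkJoinPool. A minimal sketch, assuming a hypothetical helper class PoolStats that is not part of our code:<br>
<pre>
```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper; formats the pool's public monitoring counters.
class PoolStats {
    static String describe(ForkJoinPool pool) {
        return "parallelism=" + pool.getParallelism()
            + " poolSize=" + pool.getPoolSize()
            + " running=" + pool.getRunningThreadCount()
            + " active=" + pool.getActiveThreadCount()
            + " queuedSubmissions=" + pool.getQueuedSubmissionCount()
            + " queuedTasks=" + pool.getQueuedTaskCount();
    }
}
```
</pre>
Logging PoolStats.describe(ForkJoinPool.commonPool()) periodically would show whether poolSize stays at 0 while queuedSubmissions keeps growing. The getters return approximate snapshots, so they indicate a trend rather than exact counts.<br>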


<br>


Here are our JVM arguments:<br>


<br>


```<br>


-Xms1G<br>


-Xmx1G<br>


-Djava.util.logging.config.file=/var/lib/andc/config/params/sender.logging.properties<br>


-Djavax.net.ssl.trustStore=/var/lib/andc/wallet/client.trust<br>


-Doracle.kv.security=/var/lib/andc/config/security/login.properties<br>


-Doci.javasdk.extra.stream.logs.enabled=false<br>


-XX:G1HeapRegionSize=32m<br>


-XX:+DisableExplicitGC<br>


-Xlog:all=warning,gc*=info,safepoint=info:file=/var/lib/andc/log/sender/sender.gc:utctime:filecount=10,filesize=10000000<br>


-XX:+HeapDumpOnOutOfMemoryError<br>


-XX:HeapDumpPath=/var/lib/andc/log/sender/<br>


```<br>


<br>


We have creation and termination timestamps in the NioEndpointHandler object.<br>
From what I can see in the heap dump, the SSL tasks in the ForkJoinPool are<br>
associated with NioEndpointHandlers that were created at intervals on the<br>
order of seconds (retry attempts with second-magnitude backoff). Each<br>
NioEndpointHandler was terminated after a fixed 5-second timeout because it was<br>
unable to connect. The time span for those NioEndpointHandlers is about 2 hours. This<br>
creates<br>


```<br>


2 hours * 3600 seconds / hour * 1 NioEndpointHandler / second  * 1 SSLDataChannel / NioEndpointHandler * 65K bytes / SSLDataChannel ~= 468M bytes.<br>


```<br>


With a 1G heap size, this eventually caused the OOME.<br>


<br>


We are adding fixes so that the SSL tasks do not prevent the<br>
NioEndpointHandler from being garbage collected. However, the root cause is<br>
still a mystery and I am wondering if I am on the right track to figure it<br>
out.<br>


<br>


Thanks a lot for your time and patience.<br>


<br>


<br>


</div>


<div><br>


</div>


<div>


<div>


<div dir="ltr" class="x_gmail_signature" data-smartmail="gmail_signature">Xiao Yu</div>


</div>


<br>


</div>


</div>


<br>


<div class="x_gmail_quote">


<div dir="ltr" class="x_gmail_attr">On Fri, Feb 2, 2024 at 5:35 AM Jaikiran Pai &lt;<a href="mailto:jai.forums2013@gmail.com">jai.forums2013@gmail.com</a>&gt; wrote:<br>


</div>


<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">


Hello Xiao,<br>


<br>


I don't have enough knowledge of this area to provide any insight into <br>


the issue. However, just to try and get the discussion started, do you <br>


have any sample code of your application which shows how the application <br>


uses the ForkJoinPool? More specifically what APIs do you use in the <br>


application?<br>


<br>


A few other questions inline below.<br>


<br>


On 12/01/24 11:30 am, Xiao Yu wrote:<br>


> ....<br>


> Here is the full background. One of our processes experienced an OOME and a<br>
> heap dump was obtained. We know there was a concurrent issue in our system<br>
> happening on some other machines, such that network failures and retries<br>
> occurred in this process at the same time. Upon analyzing the heap dump, we<br>
> observed a lot of our network connection handlers being frequently created<br>
> and terminated, which is expected due to the network failure and retry<br>
> attempts mentioned above. However, those terminated handlers are not being<br>
> GC'ed because there were references to tasks submitted to the ForkJoinPool<br>
> during the connection attempts. The tasks stayed in the queue until the OOME<br>
> happened, as there are no threads to execute them.<br>


<br>


What are these handlers? Are they classes which implement Runnable or <br>


are they something else? What does termination of handler mean in this <br>


context? Do you use any java.util.concurrent.* APIs to "cancel" such <br>


terminated handlers?<br>


<br>


Finally, what does the OutOfMemoryError exception stacktrace look like <br>


and what are the JVM parameters (heap size for example) used to launch <br>


this application?<br>


<br>


-Jaikiran<br>


</blockquote>


</div>


</div>


</body>


</html>