The common ForkJoinPool does not have any ForkJoinWorkerThread while tasks are submitted to the queue

Fri Jan 12 06:00:33 UTC 2024

Hello Core Libs Dev Team,

This is my first time posting here. Please let me know if there are any
conventions I should follow.

I am writing in the hope of getting some help understanding a state of the
ForkJoinPool that we observed. We encountered a case where there were tasks
in the queue of the ForkJoinPool, but there was no ForkJoinWorkerThread to
execute them. I am wondering whether such a state is expected of the
ForkJoinPool, and I hope to get some suggestions on how our system could
have entered it.

Here is the full background. One of our processes experienced an OOME and a
heap dump was obtained. We know that a concurrent issue in our system was
occurring on some other machines, so network failures and retries were
happening in this process at the same time. Upon analyzing the heap dump, we
observed many of our network connection handlers being frequently created
and terminated, which is expected given the network failures and retry
attempts mentioned above. However, the terminated handlers were not being
GC'ed because tasks submitted to the ForkJoinPool during the connection
attempts still held references to them. The tasks stayed in the queue until
the OOME happened, as there were no threads to execute them.

From both the heap dump and the thread dump, it seems there was no
ForkJoinWorkerThread, which, from my understanding, is the worker thread
object that executes those tasks. Looking at the ForkJoinPool, I noticed
that the ctl field was 9223372032559808512, or 0x7fffffff00000000. Is this a
valid ctl state? From the code, this ctl value (together with mode = 1)
seems to mean a released thread count of 32768 and a total thread count of
0, which seems odd to me because, as I currently understand it, the total
thread count should be larger than the released thread count. This also
seems to prevent the signalWork method from creating any new thread, since
ctl is positive, which would explain why no new threads were created while
there were tasks in the queue. Perhaps some exception that we should have
caught, but did not pay attention to, caused this state?
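
To make the reading above concrete, here is a minimal sketch of how I
decoded the ctl value. It rests on assumptions taken from my reading of the
JDK 17 ForkJoinPool source, not on any documented API: the released count
sits in bits 48-63, the total count in bits 32-47, both are stored relative
to the parallelism (mode = 1 here), and signalWork only proceeds while ctl
is negative. The class and method names are hypothetical.

```java
// Sketch only: field layout assumed from my reading of JDK 17 ForkJoinPool.
class CtlDecode {
    static final int RC_SHIFT = 48; // assumed: released count in bits 48-63
    static final int TC_SHIFT = 32; // assumed: total count in bits 32-47

    // Raw 16-bit fields, interpreted as signed shorts.
    static short rcField(long ctl) { return (short) (ctl >>> RC_SHIFT); }
    static short tcField(long ctl) { return (short) (ctl >>> TC_SHIFT); }

    // Assumed: counts are stored as (count - parallelism).
    static int released(long ctl, int parallelism) { return rcField(ctl) + parallelism; }
    static int total(long ctl, int parallelism) { return tcField(ctl) + parallelism; }

    // Assumed: signalWork loops only while ctl < 0.
    static boolean wouldSignal(long ctl) { return ctl < 0L; }

    public static void main(String[] args) {
        long ctl = 9223372032559808512L; // value from the heap dump
        int parallelism = 1;             // mode field from the heap dump
        System.out.println(Long.toHexString(ctl));      // 7fffffff00000000
        System.out.println(released(ctl, parallelism)); // 32768
        System.out.println(total(ctl, parallelism));    // 0
        System.out.println(wouldSignal(ctl));           // false
    }
}
```

If these assumptions hold, the sign bit of ctl being clear is what keeps
signalWork from adding a worker.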

Currently, we do not know how to reproduce this issue. Sorry about that. The
heap dump is about 1 GB in size, so it is not convenient to attach to this
email, but here is some information I collected from it.

Summary info:

``````
System : Linux(5.4.17-2136.325.5.1.el7uek.x86_64)
Architecture: amd64 64bit
Java Version: 17.0.4.1 2022-08-18 LTS
Java Name: Java HotSpot(TM) 64-Bit Server VM (17.0.4.1+1-LTS-2, mixed mode, sharing)
Java Vendor: Oracle Corporation
``````

ForkJoinPool#common states:

``````
ctl: 9223372032559808512
saturate: null
ueh: null
factory: java.util.concurrent.ForkJoinPool$DefaultCommonPoolForkJoinWorkerThreadFactory
workerNamePrefix: null
termination: null
registrationLock: java.util.concurrent.locks.ReentrantLock
queues: java.util.concurrent.ForkJoinPool$WorkQueue[]
mode: 1
bounds: 16777216
threadIds: 32768
scanRover: 0
stealCount: 10912
keepAlive: 60000
``````

ForkJoinPool#common#queue[0] states:

``````
nsteals: 0
source: 0
top: 32304
owner: null
array: java.util.concurrent.ForkJoinTask[]
base: 10912
config: 2027939556
stackPred: 0
phase: -1
``````
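
For reference, the number of tasks still sitting in queue[0] can be
estimated from the fields above. This is a sketch that assumes the queue
depth is simply top minus base (clamped at zero), which is how I read the
WorkQueue code; the class and method names are hypothetical.

```java
// Sketch only: assumes queue depth is top - base, clamped at zero.
class QueueDepth {
    static int queued(int top, int base) {
        return Math.max(top - base, 0);
    }

    public static void main(String[] args) {
        // top and base as reported in the heap dump for queue[0]
        System.out.println(queued(32304, 10912)); // 21392
    }
}
```

Note also that base (10912) equals the pool's stealCount (10912) in the
dump above.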

The thread dump is attached.

Please let me know if you need anything else.

Xiao Yu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: thread.dump
Type: application/octet-stream
Size: 227080 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240111/09f04def/thread-0001.dump>