all EventHandlerTasks in EPollPort waiting on queue

Jeremiah Ness jness at proofpoint.com
Wed Jan 11 22:11:58 UTC 2017


(Sorry if this is the wrong place to discuss this matter. If you know of a
better place, please let me know.)

I suspect there is a bug in EPollPort.java (and KQueuePort.java) and am looking
for a more informed opinion.

Please reference the source code below from
src//solaris/classes/sun/nio/ch/EPollPort.java.

On line 244 below, the finally block of poll() attempts to insert the
NEED_TO_POLL event into the queue. The queue is a fixed-size
ArrayBlockingQueue with capacity MAX_EPOLL_EVENTS. If the queue is full at
that point, queue.offer() returns false, the return value is ignored, and the
event is silently dropped. The NEED_TO_POLL event is critical to the operation
of EPollPort because it is the event that signals one of the threads to poll
again.
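
To illustrate the queue behaviour described above, here is a minimal
standalone sketch. It does not use EPollPort itself: the class name, the
method name, and the NEED_TO_POLL object are hypothetical stand-ins, and only
ArrayBlockingQueue.offer() is the real JDK call.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class OfferOnFullQueue {
    // Hypothetical stand-in for the sentinel event; not the JDK's own object.
    static final Object NEED_TO_POLL = new Object();

    /** Fills a bounded queue to capacity, then tries to offer the sentinel. */
    static boolean offerSentinelToFullQueue(int capacity) {
        ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.offer(new Object());
        }
        // offer() neither blocks nor throws on a full queue; it returns false.
        return queue.offer(NEED_TO_POLL);
    }

    public static void main(String[] args) {
        boolean inserted = offerSentinelToFullQueue(512);
        System.out.println("sentinel inserted: " + inserted);
    }
}
```

Run as is, this prints "sentinel inserted: false". A caller that ignores the
return value has no indication that the sentinel was dropped.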

I believe this is what is causing all of my application’s completion threads to
get stuck.

The queue can become full if:
- the inner loop (lines 203 -> 235) is processing MAX_EPOLL_EVENTS events (512
  events)
- the last inner loop iteration gets a null channel on line 223
- in this case we loop around to line 193 instead of returning from the method
- we get 2 more events when calling epollWait again on line 194
- these events fill the queue to its maximum size
- we return from the method but fail to insert NEED_TO_POLL into the queue
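
The sequence above can be sketched as a small single-threaded model. This is
only an approximation under stated assumptions: no handler thread drains the
queue while poll() runs, each epollWait result is modelled as a boolean array
(false meaning the channel is no longer in fdToChannel), and the class and
method names are hypothetical. The wakeup-fd branch is omitted.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;

public class LostNeedToPollSketch {
    static final int MAX_EPOLL_EVENTS = 512;
    // Hypothetical stand-in for the real sentinel event.
    static final Object NEED_TO_POLL = new Object();

    /**
     * Models one call to poll(). Each batch stands for one epollWait result.
     * Returns whether the final offer of NEED_TO_POLL succeeded (the real
     * code performs that offer in a finally block and ignores the result).
     */
    static boolean pollOnce(ArrayBlockingQueue<Object> queue, boolean[][] batches) {
        for (boolean[] batch : batches) {
            int n = batch.length;
            while (n-- > 0) {
                if (!batch[n]) {
                    continue;                 // null channel: event is skipped
                }
                if (n > 0) {
                    queue.offer(new Object()); // queued for other handlers
                } else {
                    // This thread would handle the last event itself.
                    return queue.offer(NEED_TO_POLL);
                }
            }
            // No event was returned: loop around to the next epollWait.
        }
        return queue.offer(NEED_TO_POLL);
    }

    public static void main(String[] args) {
        ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<>(MAX_EPOLL_EVENTS);

        // First batch: 512 events, the last one processed (index 0) is closed.
        boolean[] first = new boolean[MAX_EPOLL_EVENTS];
        Arrays.fill(first, true);
        first[0] = false;

        // Second batch: 2 more events, both still registered.
        boolean[] second = {true, true};

        boolean queued = pollOnce(queue, new boolean[][] {first, second});
        System.out.println("NEED_TO_POLL queued: " + queued
                + ", queue size: " + queue.size());
    }
}
```

Under these assumptions the model prints "NEED_TO_POLL queued: false, queue
size: 512": 511 events from the first batch plus one from the second fill the
queue, so the sentinel is dropped.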

Am I perhaps missing something?

Thanks for your thoughts.


191         private Event poll() throws IOException {
192             try {
193                 for (;;) {
194                     int n = epollWait(epfd, address, MAX_EPOLL_EVENTS);
195                     /*
196                      * 'n' events have been read. Here we map them to their
197                      * corresponding channel in batch and queue n-1 so that
198                      * they can be handled by other handler threads. The last
199                      * event is handled by this thread (and so is not queued).
200                      */
201                     fdToChannelLock.readLock().lock();
202                     try {
203                         while (n-- > 0) {
204                             long eventAddress = getEvent(address, n);
205                             int fd = getDescriptor(eventAddress);
206
207                             // wakeup
208                             if (fd == sp[0]) {
209                                 if (wakeupCount.decrementAndGet() == 0) {
210                                     // no more wakeups so drain pipe
211                                     drain1(sp[0]);
212                                 }
213
214                                 // queue special event if there are more events
215                                 // to handle.
216                                 if (n > 0) {
217                                     queue.offer(EXECUTE_TASK_OR_SHUTDOWN);
218                                     continue;
219                                 }
220                                 return EXECUTE_TASK_OR_SHUTDOWN;
221                             }
222
223                             PollableChannel channel = fdToChannel.get(fd);
224                             if (channel != null) {
225                                 int events = getEvents(eventAddress);
226                                 Event ev = new Event(channel, events);
227
228                                 // n-1 events are queued; This thread handles
229                                 // the last one except for the wakeup
230                                 if (n > 0) {
231                                     queue.offer(ev);
232                                 } else {
233                                     return ev;
234                                 }
235                             }
236                         }
237                     } finally {
238                         fdToChannelLock.readLock().unlock();
239                     }
240                 }
241             } finally {
242                 // to ensure that some thread will poll when all events have
243                 // been consumed
244                 queue.offer(NEED_TO_POLL);
245             }


On 1/9/17, 12:16 PM, "nio-dev on behalf of Jeremiah Ness" <nio-dev-bounces at openjdk.java.net on behalf of jness at proofpoint.com> wrote:

>Using jre8u112 on CentOS 7.
>
>I have an application which uses AsynchronousChannelGroup.withFixedThreadPool
>with 5 threads. When the application is under high load creating 1000s of
>AsynchronousSocketChannels per second, simultaneously closing many
>AsynchronousSocketChannels, on occasion all of the EventHandlerTasks in
>EPollPort become stuck waiting for events on the EPollPort.queue. All the
>threads have the following stack:
>
>"CompletionThread-5" #33 daemon prio=5 os_prio=0 tid=0x00007f0a2829f800 nid=0x5c41 waiting on condition [0x00007f0a11ee1000]
>   java.lang.Thread.State: WAITING (parking)
>        at sun.misc.Unsafe.park(Native Method)
>        - parking to wait for  <0x00000006c4fe8868> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>        at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:262)
>        at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
>
>They all appear to be waiting for the queue to be filled (calling
>BlockingQueue.take); however, there are no threads calling the native method
>epollWait to get more IO events. This condition persists until the application
>is restarted.
>
>By examining the EPollPort class I have the understanding that one of the
>threads should be polling. Is this correct?
>
>By examining the EPollPort.EventHandlerTask.poll method I am wondering if there
>is a code path that would allow all threads to be waiting on the queue. In
>particular is the following possible within EPollPort.EventHandlerTask.poll:
>
>1. The native method epollWait returns 512 events.
>2. Before the fdToChannelLock.readLock is acquired the channel associated with
>   the 512th event is closed.
>3. The fixed size EPollPort.queue is filled to size 511.
>4. The 512th event is processed, however because it has been closed it is no
>   longer in the fdToChannel map
>5. The thread loops around the for(;;) loop and calls epollWait again
>6. epollWait returns 2 more events
>7. The fixed size queue is filled to its maximum capacity of 512.
>8. The finally queue.offer(NEED_TO_POLL) call fails because the queue is full.
>
>If this occurs would all the EventHandlerTasks then eventually be stuck as per
>the stack trace above?
>
>Thanks,
>Jeremiah Ness
>
>
>
>


