all EventHandlerTasks in EPollPort waiting on queue

Jeremiah Ness jness at proofpoint.com
Thu Jan 12 16:03:31 UTC 2017


On 1/12/17, 9:24 AM, "Alan Bateman" <Alan.Bateman at oracle.com> wrote:

>On 11/01/2017 22:11, Jeremiah Ness wrote:
>
>> Please reference the source code below from
>> src/solaris/classes/sun/nio/ch/EPollPort.java.
>>
>> On line 244 below the poll() method attempts to insert the NEED_TO_POLL event
>> into the queue. The queue is a fixed size ArrayBlockingQueue of size
>> MAX_EPOLL_EVENTS. On line 244, if the queue is full, then the event silently
>> fails to be inserted. The NEED_TO_POLL event is critical for the operation of
>> the EPollPort as it is the event which signals one of the threads to poll
>> again.
>>
>There does appear to be an issue here. Can you create a standalone test
>case to tickle this so that we can include it in a bug report? That
>would really help get to a regression test to include with the fix.
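
For reference, the silent failure described in the quoted text can be shown in isolation. This is only a minimal sketch, not the EPollPort code: the class name OfferDemo, the String events and the capacity of 4 are my own stand-ins. It relies only on the documented behaviour of ArrayBlockingQueue.offer(), which returns false rather than blocking or throwing when the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;

class OfferDemo {
    // Stand-in marker; the real NEED_TO_POLL is an event object inside EPollPort.
    static final String NEED_TO_POLL = "NEED_TO_POLL";

    /** Fills a bounded queue to capacity, then offers the marker and returns offer()'s result. */
    static boolean offerMarkerToFullQueue(int capacity) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.offer("io-event-" + i);
        }
        // offer() neither blocks nor throws on a full queue; it only returns false.
        // If the caller ignores the return value, the marker is lost without a trace.
        return queue.offer(NEED_TO_POLL);
    }

    public static void main(String[] args) {
        System.out.println("inserted=" + offerMarkerToFullQueue(4)); // prints inserted=false
    }
}
```

Once the marker is dropped like this, nothing is left in the queue to tell a handler thread to poll again, so every thread ends up blocked taking from the queue.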

I do have a test program that can trigger the condition for me. I have attached the source code for PortTest.java. PortTest has successfully demonstrated the issue in the following configurations:

- OS X 10.11.6 with java version "1.8.0_112"
- CentOS 7 with kernel 3.10.0-514.2.2.el7.x86_64 and openjdk version "1.8.0_111"

Note the following caveats:

- PortTest depends on the order in which I/O events are returned, because the last iteration through the loop has to see a null channel
- PortTest uses reflection to artificially grab the write lock of Port.fdToChannelLock so that the race condition can be triggered
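
To illustrate the second caveat, here is a rough sketch of the reflection technique. It is not the code from PortTest.java. FakePort is a hypothetical stand-in for sun.nio.ch.Port, and only the field name fdToChannelLock is taken from the real class. The helper names findLock and readerAcquiredWhileWriteLocked are also made up for this example.

```java
import java.lang.reflect.Field;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockGrabDemo {
    // Hypothetical stand-in for sun.nio.ch.Port; only the field name matches the real class.
    static class FakePort {
        protected final ReadWriteLock fdToChannelLock = new ReentrantReadWriteLock();
    }

    /** Finds the lock field reflectively, walking up the class hierarchy. */
    static ReadWriteLock findLock(Object port) {
        for (Class<?> c = port.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField("fdToChannelLock");
                f.setAccessible(true);
                return (ReadWriteLock) f.get(port);
            } catch (NoSuchFieldException e) {
                // not declared on this class; try the superclass
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalStateException("fdToChannelLock not found");
    }

    /** Holds the write lock and reports whether another thread could take the read lock. */
    static boolean readerAcquiredWhileWriteLocked() {
        ReadWriteLock lock = findLock(new FakePort());
        boolean[] acquired = new boolean[1];
        lock.writeLock().lock();
        try {
            Thread reader = new Thread(() -> {
                acquired[0] = lock.readLock().tryLock();
                if (acquired[0]) lock.readLock().unlock();
            });
            reader.start();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.writeLock().unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("reader acquired: " + readerAcquiredWhileWriteLocked()); // prints false
    }
}
```

Holding the write lock from the test thread keeps any thread that needs the read lock out, which is what lets the test control the timing of the race.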

There may be better ways to trigger this race condition. If you have other ideas, I could try them.

By default, PortTest will attempt to close the channel which it believes will be processed last by the inner loop. That is the channel with id=0. In this case the program will not exit because the ChannelGroup cannot be shut down. Using jstack I could see that the ChannelGroup thread is stuck waiting on the queue instead of polling for more events. This is the output of the program I see when the race is triggered:

$ java PortTest
…
Jan 12, 2017 10:56:52 AM PortTest$MyCompletionHandler finish
INFO: Closed client channel id=0 completed=false
Jan 12, 2017 10:56:55 AM PortTest test
INFO: TIMED OUT WAITING FOR EVENTS -- completion-thread must be stuck
Jan 12, 2017 10:56:55 AM PortTest test
INFO: finished waiting for batch of completion handlers
Jan 12, 2017 10:56:55 AM PortTest simpleTcpServer
INFO: Closing socket (on server) id=-100
Jan 12, 2017 10:56:58 AM PortTest test
INFO: timed out waiting for termination of ChannelGroup
Jan 12, 2017 10:56:58 AM PortTest test
INFO: completedCount=512 failedCount=1
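
The "timed out waiting for termination of ChannelGroup" line comes from a bounded wait on the group. As a sketch of that kind of check (again not the PortTest.java code; the class name, method names, thread count and 3-second timeout are assumptions), a healthy group with no open channels terminates promptly after shutdown():

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.AsynchronousChannelGroup;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class GroupShutdownCheck {
    /** Shuts the group down and reports whether it terminated within the timeout. */
    static boolean shutdownAndAwait(AsynchronousChannelGroup group, long timeoutSeconds) {
        group.shutdown();
        try {
            return group.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Creates an empty group (no channels) and checks that it terminates. */
    static boolean emptyGroupTerminates() {
        try {
            AsynchronousChannelGroup group =
                AsynchronousChannelGroup.withFixedThreadPool(2, Executors.defaultThreadFactory());
            return shutdownAndAwait(group, 3);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated=" + emptyGroupTerminates());
    }
}
```

In the failing run above, the equivalent wait returns false because the handler threads never leave the queue.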

You can run PortTest and tell it to close a different channel instead of the one with id=0. When I do this, the program shuts down and exits normally.

Thoughts?

Thanks,
Jeremiah Ness

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PortTest.java
Type: application/octet-stream
Size: 12761 bytes
Desc: PortTest.java
URL: <http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20170112/7dfa9c08/PortTest.java>

