A hard-to-reproduce EPollSelector bug...
David Lloyd
david.lloyd at redhat.com
Thu Mar 15 16:19:07 UTC 2018
On Thu, Mar 15, 2018 at 10:46 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
> On 15/03/2018 15:30, David Lloyd wrote:
>>
>> :
>> Well my naive hope that I could create a quick & dirty fix has been
>> dashed so far. But, looking at the original bug report that sent me
>> down this chase, I see that it was perhaps not limited to just EPoll;
>> KQueue on Mac also suffers (or suffered) from a similar problem, and I
>> understand it happened on Windows as well. So my hypothesis that it
>> is due to epoll weirdness is probably an "overthink" of the problem;
>> maybe it is in fact just a question of ordering the bind correctly as
>> you say. The bug report is publicly viewable and can be found at [1]
>> (the stack traces are the interesting part).
>
> I think all it needs is one thread calling bind at around the same time that
> another thread attempts to close the channel. It doesn't need any Selectors
> in the picture.
That is definitely not how the problem shows up for us. In our case
we're calling close and then bind from the _same_ thread (or at least
the bind effectively happens-after the close), and the bind fails
because the close didn't "kill" the channel for whatever reason.
Instead, the selector is what finally does it, per this kind of stack
trace:
"XNIO-1 Accept at 1562" daemon prio=5 tid=0xf nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at sun.nio.ch.ServerSocketChannelImpl.kill(ServerSocketChannelImpl.java:307)
      - locked <0xc0d> (a java.lang.Object)
      at sun.nio.ch.KQueueSelectorImpl.implDereg(KQueueSelectorImpl.java:229)
      at sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:149)
      - locked <0xc38> (a java.util.HashSet)
      at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:107)
      at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
      - locked <0xc2b> (a sun.nio.ch.KQueueSelectorImpl)
      - locked <0xc39> (a java.util.Collections$UnmodifiableSet)
      - locked <0xc3a> (a sun.nio.ch.Util$2)
      at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
      at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
      at org.xnio.nio.WorkerThread.run(WorkerThread.java:519)
But I haven't yet figured out why it's happening in that order.
Attempts to reproduce it have failed so far, so I'm going to try a
different tactic: statically work through the possible sequences and
see if I can find a scenario where this happens.
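
For anyone who wants to poke at the close-then-bind sequence described
above, here is a minimal single-threaded sketch. It is an illustration
under assumptions, not the code from the bug report: the class name
CloseThenBind and the helper tryBind are made up for this example, and
selectNow() stands in for the XNIO worker thread's select() call that
appears in the stack trace. Whether the first bind attempt succeeds
may well depend on the platform and JDK version, so the sketch only
records both results instead of asserting a particular outcome.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class CloseThenBind {

    /** Hypothetical helper: tries to bind a fresh channel to addr, reports success. */
    static boolean tryBind(InetSocketAddress addr) {
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            ch.bind(addr);
            return true;
        } catch (IOException e) {
            // Includes java.net.BindException ("address already in use").
            return false;
        }
    }

    /** Returns { bind OK right after close, bind OK after a selection pass }. */
    static boolean[] closeThenBind() throws IOException {
        try (Selector selector = Selector.open()) {
            ServerSocketChannel first = ServerSocketChannel.open();
            first.configureBlocking(false);
            // Port 0 lets the OS pick a free port; remember it for the rebind.
            first.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) first.getLocalAddress();
            first.register(selector, SelectionKey.OP_ACCEPT);

            // Close and rebind from the same thread, as in the report.
            first.close();
            boolean immediately = tryBind(addr);

            // A selection pass processes the cancelled key; in the stack
            // trace this is the path that ends up calling kill().
            selector.selectNow();
            boolean afterSelect = tryBind(addr);

            return new boolean[] { immediately, afterSelect };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = closeThenBind();
        System.out.println("bind right after close: " + r[0]);
        System.out.println("bind after selectNow:   " + r[1]);
    }
}
```

Note that in the real scenario the selector lives on another thread
that may be blocked in select() at the time of the close, which this
single-threaded sketch does not model.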
--
- DML