A hard-to-reproduce EPollSelector bug...

David Lloyd david.lloyd at redhat.com
Thu Mar 15 15:30:30 UTC 2018


On Thu, Mar 15, 2018 at 9:43 AM, David Lloyd <david.lloyd at redhat.com> wrote:
> On Thu, Mar 15, 2018 at 9:17 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>> On 15/03/2018 14:02, David Lloyd wrote:
>>>
>>> This talk of Selectors has indirectly reminded me of a problem that we
>>> encounter, particularly in testing, which I think is a bug (or maybe
>>> just a surprise) in the EPollSelector implementation on Linux.
>>>
>>> The symptom of the problem is that a ServerSocketChannel is closed,
>>> yet a subsequent bind operation which definitely happens-after the
>>> close on the same socket address can fail with EADDRINUSE (even if
>>> SO_REUSEADDR is used).
>>>
>> Do you have a reproducer? If so, can you run it on the latest JDK 11 build
>> where the bind method is now correctly synchronized on the channel state.
>
> I might be able to write one.  I'll give it a try anyway...

Well, my naive hope that I could create a quick & dirty fix has been
dashed so far.  But looking at the original bug report that sent me
down this path, I see that the problem was perhaps not limited to
EPoll: KQueue on Mac also suffers (or suffered) from a similar problem,
and I understand it happened on Windows as well.  So my hypothesis that
it is due to epoll weirdness is probably overthinking the problem;
maybe it is in fact just a question of ordering the bind correctly, as
you say.  The bug report is publicly viewable and can be found at [1]
(the stack traces are the interesting part).

[1] https://issues.jboss.org/browse/WFCORE-3302
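
For anyone who wants to poke at this, here is a rough sketch of the
scenario from the quoted symptom: bind a ServerSocketChannel with
SO_REUSEADDR, register it with a Selector, close it, and then bind a
fresh channel to the same address.  It is only an illustration, not a
confirmed reproducer and not the code from the bug report.  The class
name RebindAfterClose and the method countRebindFailures are
hypothetical, and leaving the Selector unselected between attempts is
an assumption about what might matter.  Whether any bind fails will
likely depend on the JDK version and platform.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical sketch: repeatedly bind, register, close, and rebind
// on one address, counting how often the bind is rejected.
public class RebindAfterClose {

    static int countRebindFailures(int attempts) throws IOException {
        int failures = 0;
        try (Selector selector = Selector.open()) {
            // Let the OS pick a free loopback port, then release it.
            InetSocketAddress address;
            try (ServerSocketChannel probe = ServerSocketChannel.open()) {
                probe.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
                address = (InetSocketAddress) probe.getLocalAddress();
            }

            for (int i = 0; i < attempts; i++) {
                // try-with-resources closes the channel before the next
                // iteration binds, so each bind happens-after a close.
                try (ServerSocketChannel channel = ServerSocketChannel.open()) {
                    channel.setOption(StandardSocketOptions.SO_REUSEADDR, true);
                    channel.bind(address);
                    channel.configureBlocking(false);
                    // Assumption: the selector is never selected here, so
                    // cancelled keys are not flushed between attempts.
                    channel.register(selector, SelectionKey.OP_ACCEPT);
                } catch (BindException e) {
                    // Typically "Address already in use" (EADDRINUSE).
                    failures++;
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        int attempts = 100;
        int failures = countRebindFailures(attempts);
        System.out.println(failures + " of " + attempts + " binds failed");
    }
}
```

Calling selector.selectNow() after each close would be one way to
check whether deferred key deregistration plays any part, but that too
is only a guess.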

-- 
- DML


More information about the nio-dev mailing list