A hard-to-reproduce EPollSelector bug...
David Lloyd
david.lloyd at redhat.com
Thu Mar 15 15:30:30 UTC 2018
On Thu, Mar 15, 2018 at 9:43 AM, David Lloyd <david.lloyd at redhat.com> wrote:
> On Thu, Mar 15, 2018 at 9:17 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>> On 15/03/2018 14:02, David Lloyd wrote:
>>>
>>> This talk of Selectors has indirectly reminded me of a problem that we
>>> encounter, particularly in testing, which I think is a bug (or maybe
>>> just a surprise) in the EPollSelector implementation on Linux.
>>>
>>> The symptom of the problem is that a ServerSocketChannel is closed,
>>> yet a subsequent bind operation which definitely happens-after the
>>> close on the same socket address can fail with EADDRINUSE (even if
>>> SO_REUSEADDR is used).
>>>
>> Do you have a reproducer? If so, can you run it on the latest JDK 11 build
>> where the bind method is now correctly synchronized on the channel state.
>
> I might be able to write one. I'll give it a try anyway...
Well, my naive hope that I could create a quick & dirty fix has been
dashed so far. But, looking at the original bug report that sent me
down this chase, I see that it was perhaps not limited to just EPoll;
KQueue on Mac also suffers (or suffered) from a similar problem, and I
understand it happened on Windows as well. So my hypothesis that it
is due to epoll weirdness is probably an "overthink" of the problem;
maybe it is in fact just a question of ordering the bind correctly, as
you say. The bug report is publicly viewable and can be found at [1]
(the stack traces are the interesting part).
[1] https://issues.jboss.org/browse/WFCORE-3302
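
For anyone who wants to poke at this, here is a rough sketch of the
scenario described above: bind a ServerSocketChannel, register it with a
Selector, close it, then bind a second channel to the same address. It is
not a confirmed reproducer. The class name, the method name, and the
flushSelector switch are made up for illustration, and the single-threaded
ordering is an assumption; the reported failures may well need concurrent
selector activity to show up.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical harness, not taken from the bug report.
class RebindAfterClose {

    /**
     * Binds a channel to an ephemeral loopback port, registers it with a
     * selector, closes it, and then tries to bind a second channel to the
     * same address. Returns true if the second bind succeeds, false if it
     * fails with a BindException (e.g. "Address already in use").
     *
     * If flushSelector is true, selectNow() is called after the close so
     * the selector processes the cancelled key before the rebind.
     */
    static boolean closeThenRebind(boolean flushSelector) throws IOException {
        try (Selector selector = Selector.open()) {
            ServerSocketChannel first = ServerSocketChannel.open();
            first.setOption(StandardSocketOptions.SO_REUSEADDR, true);
            first.configureBlocking(false);
            // Port 0 asks the OS for any free port.
            first.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) first.getLocalAddress();

            first.register(selector, SelectionKey.OP_ACCEPT);
            first.close();

            if (flushSelector) {
                // Lets the selector deregister the cancelled key.
                selector.selectNow();
            }

            try (ServerSocketChannel second = ServerSocketChannel.open()) {
                second.setOption(StandardSocketOptions.SO_REUSEADDR, true);
                second.bind(addr);
                return true;
            } catch (BindException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("rebind without flush: " + closeThenRebind(false));
        System.out.println("rebind after selectNow(): " + closeThenRebind(true));
    }
}
```

If the first line ever printed false while the second printed true, that
would point at the selector still holding on to the closed channel;
if both are always true, the problem is more likely an ordering issue
around bind, as suggested earlier in the thread.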
--
- DML