A hard-to-reproduce EPollSelector bug...

Fri Mar 16 15:03:21 UTC 2018

On Fri, Mar 16, 2018 at 9:31 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
> On 15/03/2018 21:01, David Lloyd wrote:
>>
>> :
>> OK this is my understanding of what's happening.  I've chosen
>> EPollSelectorImpl because I understand it most, though as I said we've
>> seen this on other OSes: Mac, Windows (!), and Linux for certain.  I
>> can't be 100% sure it's exactly the same problem but even if it isn't,
>> the fix (manually awakening/selecting all associated selectors) solves
>> the problem in every case.  Whether that's due to a coincidence of
>> scheduling or some other secondary subtlety is hard to know.
>
> ServerSocketChannel.bind didn't historically synchronize correctly on
> channel state and so could fail in interesting ways when called at around
> the same time that it is closed. Specifically it could attempt to bind after
> the dup2, or worse, after the file descriptor has been closed and recycled.
> If there were a T3 in the picture that created a listener at around the
> same time then it might be possible to explain the EADDRINUSE.

OK, that's interesting.  This most often appears in tests, where there
is a lot of rapid setup/teardown with many threads around, so there's
a chance that we're binding the wrong FD in these cases.  I might see
if I can make a reproducer with that in mind.
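
A rough sketch of what such a reproducer might look like, not yet run
against an affected JDK.  The class name, the iteration count, and the
use of ephemeral loopback ports are assumptions of mine for
illustration, not taken from the tests where this shows up.  T1 binds
while T2 closes the same channel, and T3 creates an unrelated listener
at around the same time:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ServerSocketChannel;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reproducer sketch: races bind() against close() on one
// channel while a third thread opens and binds another listener.
final class BindCloseRace {

    /** Returns {bound, closedFirst, unexpected} counts over all iterations. */
    static int[] run(int iterations) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        AtomicInteger bound = new AtomicInteger();
        AtomicInteger closedFirst = new AtomicInteger();
        AtomicInteger unexpected = new AtomicInteger();

        for (int i = 0; i < iterations; i++) {
            ServerSocketChannel victim = ServerSocketChannel.open();
            CountDownLatch start = new CountDownLatch(1);

            // T1: bind to an ephemeral port (port 0), so a legitimate
            // address conflict should not be possible.
            Thread t1 = new Thread(() -> {
                await(start);
                try {
                    victim.bind(new InetSocketAddress(loopback, 0));
                    bound.incrementAndGet();
                } catch (ClosedChannelException e) {
                    closedFirst.incrementAndGet();   // close() won the race
                } catch (Exception e) {
                    unexpected.incrementAndGet();    // e.g. a BindException
                    System.err.println("unexpected: " + e);
                }
            });

            // T2: close the same channel concurrently.
            Thread t2 = new Thread(() -> {
                await(start);
                try {
                    victim.close();
                } catch (Exception ignored) {
                }
            });

            // T3: create an unrelated listener, which may reuse a
            // just-closed file descriptor number.
            Thread t3 = new Thread(() -> {
                await(start);
                try (ServerSocketChannel other = ServerSocketChannel.open()) {
                    other.bind(new InetSocketAddress(loopback, 0));
                } catch (Exception ignored) {
                }
            });

            t1.start();
            t2.start();
            t3.start();
            start.countDown();   // release all three threads together
            t1.join();
            t2.join();
            t3.join();
            victim.close();      // no-op if T2 already closed it
        }
        return new int[] { bound.get(), closedFirst.get(), unexpected.get() };
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = run(10_000);
        System.out.printf("bound=%d closedFirst=%d unexpected=%d%n",
                r[0], r[1], r[2]);
    }
}
```

With correct synchronization I'd expect every iteration to end either
in a successful bind or in ClosedChannelException.  Anything counted
as "unexpected" would suggest the bind landed on the wrong FD, though
the window may be too narrow to hit reliably this way.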

-- 
- DML