Bug in sun.nio.ch.SolarisEventPort#port_dissociate
David M. Lloyd
david.lloyd at redhat.com
Wed Jun 28 13:37:42 UTC 2017
On 06/16/2017 10:59 AM, David M. Lloyd wrote:
> On 06/16/2017 10:36 AM, Alan Bateman wrote:
>> On 14/06/2017 15:32, David M. Lloyd wrote:
>>>
>>> It's coming from a user so my information is limited but I can
>>> establish that it is happening under load, and I think it corresponds
>>> to an open socket being abruptly closed in another thread.
>>>
>>> I am not sure whether I can get it down to a test case though. I'll
>>> see if I can get access to a Solaris system for testing.
>>>
>> If you get some idea of the conditions under which this occurs, that
>> would be useful. To make sure there is nothing obvious, I ran the JDK
>> tests on a Solaris 11.3 system with the port Selector as the default.
>
> I'll see if I can find out more. I have gained access to a test
> environment but I haven't been able to reproduce it in isolation either.

We're reproducing the bug locally now, though the test is not yet
standalone. I'm going to try to create a new standalone test with a
large number of threads, as this seems to be a possible factor.

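A minimal sketch of the shape such a standalone test might take, assuming the trigger is a registered socket being closed by one thread while another thread sits in select(). The class name SelectorCloseStress and the helper run() are hypothetical, and this is not the test we are actually running. On Solaris the port-based selector would presumably have to be chosen explicitly, for example with -Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.EventPortSelectorProvider (an assumption on my part about how it was made the default in the run quoted above).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SelectorCloseStress {

    // Hypothetical helper: returns how many times select() threw an IOException.
    static int run(int threads, int iterations) throws Exception {
        AtomicInteger selectFailures = new AtomicInteger();
        AtomicBoolean done = new AtomicBoolean();

        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();

            // Accept each connection and drop the server side immediately.
            Thread acceptor = new Thread(() -> {
                while (!done.get()) {
                    try (SocketChannel accepted = server.accept()) {
                        // nothing to do; the channel is closed on exit
                    } catch (IOException e) {
                        // expected once the server channel has been closed
                    }
                }
            });

            // The thread that sits in select() while other threads close channels.
            Thread poller = new Thread(() -> {
                while (!done.get()) {
                    try {
                        selector.select(10);
                        selector.selectedKeys().clear();
                    } catch (IOException e) {
                        selectFailures.incrementAndGet();
                    } catch (ClosedSelectorException e) {
                        return;
                    }
                }
            });
            acceptor.start();
            poller.start();

            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                workers[t] = new Thread(() -> {
                    for (int i = 0; i < iterations; i++) {
                        // Leaving the try block closes the channel right after
                        // registration, while the poller may be in select().
                        try (SocketChannel ch = SocketChannel.open(addr)) {
                            ch.configureBlocking(false);
                            selector.wakeup(); // older JDKs block register() during select()
                            ch.register(selector, SelectionKey.OP_READ);
                        } catch (IOException e) {
                            // connection-level failures are not what this sketch measures
                        }
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) {
                w.join();
            }

            done.set(true);
            server.close();    // unblocks the acceptor
            selector.wakeup(); // unblocks the poller
            acceptor.join();
            poller.join();
        }
        return selectFailures.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("select() failures: " + run(16, 200));
    }
}
```

The thread and iteration counts are arbitrary. Only exceptions thrown by select() are counted, on the assumption that this is where a failed port_dissociate would surface.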
>> All the tests pass. I can't think of a scenario where port_dissociate
>> could fail with EBADF.
>
> It's not EBADF but EBADFD, if that makes a difference. I've been
> working off of various GC-related hypotheses but without knowing the
> exact conditions that precipitate EBADFD, I'm really shooting in the
> dark. One would have to examine the kernel sources to get that answer.
>
>> It is correct to ignore ENOENT as that occurs when the file descriptor
>> registered with the port is closed by dup'ing.
>
> Is it possible that this operation is non-atomic in the kernel, such
> that the descriptor is briefly in an intermediate state before being
> replaced by the placeholder?
>
--
- DML
More information about the nio-dev mailing list