Bug in sun.nio.ch.SolarisEventPort#port_dissociate

David M. Lloyd david.lloyd at redhat.com
Wed Jun 28 13:37:42 UTC 2017


On 06/16/2017 10:59 AM, David M. Lloyd wrote:
> On 06/16/2017 10:36 AM, Alan Bateman wrote:
>> On 14/06/2017 15:32, David M. Lloyd wrote:
>>>
>>> It's coming from a user so my information is limited but I can 
>>> establish that it is happening under load, and I think it corresponds 
>>> to an open socket being abruptly closed in another thread.
>>>
>>> I am not sure whether I can get it down to a test case though. I'll 
>>> see if I can get access to a Solaris system for testing.
>>>
>> If you get some idea of the conditions under which this occurs, that 
>> would be useful.  To make sure there is nothing obvious, I ran the JDK 
>> tests on a Solaris 11.3 system with the port Selector as the default.
> 
> I'll see if I can find out more.  I have gained access to a test 
> environment but I haven't been able to reproduce it in isolation either.

We're reproducing the bug locally now, though the test is not yet 
standalone.  I'm going to try to create a new standalone test with a 
large number of threads, as this seems to be a possible factor.
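
A minimal sketch of the shape such a test might take is below.  It is 
only an illustration, not the test we are actually running, and I have 
not confirmed that it triggers the failure.  The class name, method 
name, thread counts and timeout are all hypothetical, and it assumes it 
is run on Solaris with the port-based Selector as the default provider.  
Several threads connect sockets to a loopback server, register them 
with a shared Selector and close them at once, while a separate thread 
is selecting.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress test: names and parameters are illustrative only.
final class SelectorCloseStress {

    // Returns the number of IOExceptions seen by the selecting thread.
    static int run(int threads, int iterations) throws Exception {
        AtomicInteger failures = new AtomicInteger();
        AtomicBoolean running = new AtomicBoolean(true);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            SocketAddress address = server.getLocalAddress();

            // Selecting thread: accepts and drops incoming connections,
            // and counts any IOException thrown while doing so.
            Thread selectThread = new Thread(() -> {
                while (running.get()) {
                    try {
                        selector.select(10);
                        for (SelectionKey key : selector.selectedKeys()) {
                            if (key.channel() == server) {
                                SocketChannel accepted;
                                while ((accepted = server.accept()) != null) {
                                    accepted.close();
                                }
                            }
                        }
                        selector.selectedKeys().clear();
                    } catch (IOException e) {
                        failures.incrementAndGet();
                    }
                }
            });
            selectThread.start();

            // Workers: register a connected socket, then close it right
            // away from a thread other than the selecting thread.
            List<Future<?>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        try (SocketChannel ch = SocketChannel.open(address)) {
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        }
                    }
                    return null;
                }));
            }
            try {
                for (Future<?> result : results) {
                    result.get();
                }
            } finally {
                running.set(false);
                selectThread.join();
            }
        } finally {
            pool.shutdown();
        }
        return failures.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("select failures: " + run(64, 1000));
    }
}
```

A nonzero count would only show that the selecting thread saw an 
IOException; the exception itself would still need to be inspected to 
tell whether it is the EBADFD case.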

>> All the tests pass. I can't think of a scenario where port_dissociate 
>> could fail with EBADF. 
> 
> It's not EBADF but EBADFD, if that makes a difference.  I've been 
> working off of various GC-related hypotheses but without knowing the 
> exact conditions that precipitate EBADFD, I'm really shooting in the 
> dark.  One would have to examine the kernel sources to get that answer.
> 
>> It is correct to ignore ENOENT as that occurs when the file descriptor 
>> registered with the port is closed by dup'ing.
> 
> Is it possible that this operation is non-atomic in the kernel, such 
> that the descriptor is briefly in an intermediate state before being 
> replaced by the placeholder?
> 


-- 
- DML

