Bug: SelectionKey.interestOps ignores interestOps on Windows

Frank Ding dingxmin at linux.vnet.ibm.com
Sun Feb 3 23:57:22 PST 2013


Hi Alan,
   Thanks for your comments on the patch and the test case.
   I did more investigation on the  synchronization policy in Windows 
selector.
   Currently, closeLock is used in 3 methods of WindowsSelectorImpl, 
which are implClose, putEventOps, implRegister.  There are several vars 
guarded by closeLock in these 3 methods such as pollWrapper, 
channelArray, totalChannels, fdMap and some others.  Methods implClose, 
implRegister and implDereg are called by SelectorImpl which are all 
guarded by publicKeys.  The contention arises in this case between 
putEventOps and implDereg.  putEventOps is called by 
SelectionKey.interestOps where only closeLock is acquired and var 
pollWrapper is affected.  As a conclusion, I incline to keep the sync 
block as it is.

   As of the test case, I did change some by absorbing your comments:
1. Class name and test case folder
2. No System.exit
3. No try/catch at the end
4. Better termination in main thread
5. Selector and SocketChannels are closed.

  Please review the revised test case again
http://cr.openjdk.java.net/~dingxmin/6429204/webrev.01

Thanks && Best regards,
Frank

On 1/31/2013 3:54 AM, Alan Bateman wrote:
>  On 30/01/2013 07:21, Frank Ding wrote:
>> Hi guys,
>>   There is a bug in nio selector on Windows which is reported by bug 
>> 4863822 (closed now) and 6429204 (open).
>>
>>   The issue happens on Windows, where two threads, one doing a 
>> channel registration/deregistration with a selector and the other 
>> setting interestOps on one of the channels registered to the selector.
>>   The thread doing select() goes eventually to 
>> WindowsSelectorImpl.implDereg() that rearranges SelectionKeyImpls in 
>> array channelArray and then adjusts entries in pollWrapper.  Since 
>> implDereg method does not synchronize rearrangement of channelArray 
>> and accessing pollWrapper, the other thread calling interestOps has a 
>> possibility of updating ops on the already deleted fd in the poll array.
>>
>>   The problem can be manifested by the test case in the webrev review 
>> [1].  The test loops a large number of times in thread 1 trying to 
>> change interest set for a SelectionKey.  The other thread(main 
>> thread) should be able to sense the change and notify thread 1 within 
>> a short period of time.  I admit that the test is not deterministic, 
>> which means it has a chance to pass with current JDK.  I think it 
>> would be better having a bootstrapped class to reproduce the problem 
>> in a stable way but don't have any idea whether jtreg can achieve this.
>>   Solution to the problem is quite straightforward.  See patch in 
>> [1].  Simply extending closeLock sync scope from implRegister, 
>> putEventOps, implClose to implDereg.  I have concern on the scope of 
>> lock closeLock.  In the patch I provided, I put 
>> synchronized(closeLock) within if statement. This resolves the issue 
>> and also minimizes the potential performance degradation. However, in 
>> class WindowsSelectorImpl, closeLock should protect other variables 
>> such as totalChannels, fdMap, keys and selectedKeys.  So would it 
>> make more sense to increase scope to before deregister(ski) call?
>>
>>   Thanks in advance for any comments and review offered.
>>
>> [1] http://cr.openjdk.java.net/~dingxmin/6429204/webrev.00/
> Yes, there is a problem here. Most of the original issue was fixed by 
> the changes in 5025260 but implDereg was left to 6429204.
>
> It needs a bit of study (to remember how the synchronization is 
> specified and implemented in this Selector) but I think the 
> synchronization block does need to go further to include down to (and 
> including) the fdMap.remove.
>
> The test will be a bit of clean-up. Tests for Selector go into 
> test/java/nio/channels/Selector. A possible name for the test is 
> RacyDeregister (we've avoiding putting bug IDs into the test names 
> here). jtreg tests aren't supposed to use System.exit so I think it 
> needs a better way to terminate. Also it the try/catch to throw 
> RuntimeException at the end isn't needed. Otherwise I think it just 
> needs clean-up and to make sure that the Selector and SocketChannels 
> are closed.
>
> -Alan.



More information about the nio-dev mailing list