RFR 8066708: JMXStartStopTest fails to connect to port 38112
olivier.lagneau at oracle.com
Thu Dec 11 15:01:51 UTC 2014
Hi Jaroslav,
On 11/12/2014 15:06, Jaroslav Bachorik wrote:
> Further investigation shows that the problem was rather the client
> connecting to a socket being shut down.
I remember running into this situation during an RMI fix a while ago, and IIRC
no flag setting helped (SO_REUSEADDR included): the port kept being
unavailable.
>
> It sounds like setting SO_REUSEADDR to false should prevent this failure.
>
> From the ServerSocket javadoc:
> "When a TCP connection is closed the connection may remain in a
> timeout state for a period of time after the connection is closed
> (typically known as the TIME_WAIT state or 2MSL wait state). For
> applications using a well known socket address or port it may not be
> possible to bind a socket to the required SocketAddress if there is a
> connection in the timeout state involving the socket address or port."
>
> It also turns out that the test does not close the server sockets
> properly, so several open or timed-out sockets might be left
> dangling around.
I think this is the main reason why we see these intermittent failures.
>
> I've updated the test so it sets SO_REUSEADDR on all the new
> ServerSocket instances, and introduced a mechanism to run the test
> code while properly cleaning up any allocated ports.
Olivier.
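The webrev itself is not reproduced here. As a rough, hypothetical sketch of what "run the test code while properly cleaning up any allocated ports" could look like: the `runWithFreePorts` helper and `PortsTask` callback are illustrative names, not taken from the test. All probing sockets stay open while the ports are collected, so the OS cannot hand out the same ephemeral port twice, and every socket is closed in a `finally` block before the test body receives the port numbers. This avoids leaked sockets; it does not close the window in which another process can grab a released port.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

public class FreePortsRunner {

    /** Hypothetical callback that receives the allocated port numbers. */
    @FunctionalInterface
    interface PortsTask {
        void run(int[] ports) throws Exception;
    }

    /**
     * Hypothetical helper: reserves {@code count} distinct ephemeral ports,
     * closes every probing ServerSocket (even if allocation fails part way),
     * and only then hands the port numbers to the test body.
     */
    static void runWithFreePorts(int count, PortsTask task) throws Exception {
        int[] ports = new int[count];
        List<ServerSocket> probes = new ArrayList<>();
        try {
            // Keep all probes open while allocating so no port is handed out twice.
            for (int i = 0; i < count; i++) {
                ServerSocket ss = new ServerSocket(0);
                probes.add(ss);
                ports[i] = ss.getLocalPort();
            }
        } finally {
            for (ServerSocket ss : probes) {
                try {
                    ss.close();
                } catch (IOException ignored) {
                    // best effort: nothing more can be done for this socket
                }
            }
        }
        // Another process may still take a port between close() and the
        // test's own bind; this only guarantees no sockets are left open.
        task.run(ports);
    }

    public static void main(String[] args) throws Exception {
        runWithFreePorts(3, ports -> {
            for (int p : ports) {
                System.out.println("allocated port " + p);
            }
        });
    }
}
```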
On 11/12/2014 15:06, Jaroslav Bachorik wrote:
> On 12/09/2014 01:25 PM, Jaroslav Bachorik wrote:
>> On 12/09/2014 01:39 AM, Stuart Marks wrote:
>>> On 12/8/14 12:35 PM, Jaroslav Bachorik wrote:
>>>> Please review the following test change:
>>>>
>>>> Issue : https://bugs.openjdk.java.net/browse/JDK-8066708
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.00
>>>>
>>>> The test fails very intermittently when the RMI registry is trying to
>>>> bind to a port previously used in the test (via ServerSocket).
>>>>
>>>> This seems to be caused by the sockets created via
>>>> `new ServerSocket(0)` being in reusable mode. The fix attempts to
>>>> prevent this by explicitly forbidding the reusable mode.
>>>
>>> Hi Jaroslav,
>>>
>>> I happened to see this fly by, and there are (I think) some similar
>>> issues going on in the RMI tests.
>>>
>>> But first I'll note that I don't think setReuseAddress() will have the
>>> effect that you want. Typically it's set to true before binding a
>>> socket, so that a subsequent bind operation will succeed even if the
>>> address/port is already in use. ServerSockets created with new
>>> ServerSocket(0) are already bound, and I'm not sure what calling
>>> setReuseAddress(false) will do on such sockets. The spec says behavior
>>> is undefined, but my bet is that it does nothing.
>>>
>>> I guess it doesn't hurt to try this out to see if it makes a difference,
>>> but I don't have much confidence it will help.
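To illustrate the ordering point: `new ServerSocket(0)` binds inside its constructor, whereas the no-argument constructor returns an unbound socket on which `setReuseAddress` can be called before `bind`. The `ServerSocket` javadoc states that the behavior of enabling or disabling SO_REUSEADDR after the socket is bound is not defined, and that the initial setting is not defined either (hence the explicit call). A minimal sketch; `bindWithReuse` is a hypothetical helper name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddressSketch {

    /**
     * Hypothetical helper: creates an unbound ServerSocket, enables
     * SO_REUSEADDR while it is still unbound, and only then binds it.
     * Passing port 0 asks the OS for an ephemeral port.
     */
    static ServerSocket bindWithReuse(int port) throws IOException {
        ServerSocket ss = new ServerSocket(); // no-arg constructor: not bound yet
        try {
            ss.setReuseAddress(true);             // set before bind(), where it is defined
            ss.bind(new InetSocketAddress(port));
            return ss;
        } catch (IOException e) {
            try {
                ss.close();
            } catch (IOException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // new ServerSocket(0) binds inside the constructor, so a later
        // setReuseAddress() call would come after bind (undefined per javadoc).
        try (ServerSocket alreadyBound = new ServerSocket(0)) {
            System.out.println("bound by constructor: " + alreadyBound.isBound());
        }
        try (ServerSocket ss = bindWithReuse(0)) {
            System.out.println("port " + ss.getLocalPort()
                    + ", SO_REUSEADDR " + ss.getReuseAddress());
        }
    }
}
```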
>>>
>>> The potential similarity to the RMI tests is exemplified by JDK-8049202
>>> (sorry, this bug report isn't open) but briefly this tests the RMI
>>> registry as follows:
>>>
>>> 1. Opens port 1099 using new ServerSocket(1099) [1099 is the default
>>> RMI registry port] in order to ensure that 1099 isn't in use by
>>> something else already;
>>>
>>> 2. If this succeeds, it immediately closes the ServerSocket.
>>>
>>> 3. Then it creates a new RMI registry on port 1099.
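The three steps can be sketched roughly as follows. This is not the code of the JDK-8049202 test: `probeThenCreate` is a hypothetical name, and the sketch uses an ephemeral port instead of 1099 so it does not collide with a registry that may already be running. The gap between closing the probe and calling `LocateRegistry.createRegistry` is the window in which the "port already in use" failure can occur.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class ProbeThenRegistry {

    /**
     * Hypothetical helper mirroring the three steps: probe the port with a
     * ServerSocket, close it, then create an RMI registry on the same port.
     * A "port already in use" failure would surface from createRegistry.
     */
    static Registry probeThenCreate(int port) throws IOException {
        // Steps 1 and 2: bind a probe to check that the port is free;
        // try-with-resources closes it as soon as the block ends.
        try (ServerSocket probe = new ServerSocket(port)) {
            System.out.println("probe bound to port " + probe.getLocalPort());
        }
        // Step 3: the registry binds the port the probe has just released.
        return LocateRegistry.createRegistry(port);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an ephemeral port instead of 1099, to avoid clashing
        // with a registry that may already be running on this machine.
        int port;
        try (ServerSocket s = new ServerSocket(0)) {
            port = s.getLocalPort();
        }
        Registry registry = probeThenCreate(port);
        try {
            System.out.println("registry created on port " + port);
        } finally {
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }
}
```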
>>>
>>> In principle, this should succeed, yet it fails around 10% of the time
>>> on some systems. The error is "port already in use". My best theory is
>>> that even though the socket has just been closed by a user program, the
>>> kernel has to run the socket through some of the socket states such as
>>> FIN_WAIT_1, FIN_WAIT_2, or CLOSING before the socket is actually closed
>>> and is available for reuse. If a program, even the same one,
>>> attempts to open a socket on the same port before the socket has reached
>>> its final state, it will get an "already in use" error.
>>>
>>> If this is true, I don't believe that setting SO_REUSEADDR will work if
>>> the socket is in one of these final states. (I remember reading this
>>> somewhere but I'm not sure where at the moment. I can try to dig it up
>>> if there is interest.)
>>>
>>> I admit this is just a theory and I'm open to alternatives, and I'm also
>>> open to hearing about ways to deal with this problem.
>>>
>>> Could something similar be going on with this JMX test?
>>
>> Hm, this is exactly what happened with this test :(
>>
>> The problem is that the port is reported as available while it is still
>> occupied, and the RMI registry attempts to start using that port.
>>
>> If setting SO_REUSEADDR does not work, then the only solution would be to
>> retry the test case when this exception occurs.
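A retry along these lines might look like the following sketch, which retries only the registry creation rather than the whole test case. It assumes that a bind failure surfaces from `LocateRegistry.createRegistry` as an `ExportException` whose cause is a `java.net.BindException` (the usual "Port already in use" message nests one). `createRegistryWithRetry` and `freePort` are hypothetical helpers; each attempt picks a fresh ephemeral port instead of waiting for the same one to be released.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.ExportException;
import java.rmi.server.UnicastRemoteObject;

public class RetryRegistry {

    /** Hypothetical helper: asks the OS for an ephemeral port, then releases it. */
    static int freePort() throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        }
    }

    /**
     * Hypothetical helper: creates a registry on a fresh port, retrying only when
     * the failure looks like "port already in use" (an ExportException whose
     * cause is a BindException). Other failures are rethrown immediately.
     */
    static Registry createRegistryWithRetry(int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ExportException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int port = freePort();
            try {
                return LocateRegistry.createRegistry(port);
            } catch (ExportException e) {
                if (!(e.getCause() instanceof BindException)) {
                    throw e;
                }
                last = e; // port was taken between probe and bind: try another one
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        Registry registry = createRegistryWithRetry(5);
        try {
            System.out.println("registry created");
        } finally {
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }
}
```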
>
> Further investigation shows that the problem was rather the client
> connecting to a socket being shut down.
>
> It sounds like setting SO_REUSEADDR to false should prevent this failure.
>
> From the ServerSocket javadoc:
> "When a TCP connection is closed the connection may remain in a
> timeout state for a period of time after the connection is closed
> (typically known as the TIME_WAIT state or 2MSL wait state). For
> applications using a well known socket address or port it may not be
> possible to bind a socket to the required SocketAddress if there is a
> connection in the timeout state involving the socket address or port."
>
> It also turns out that the test does not close the server sockets
> properly, so several open or timed-out sockets might be left
> dangling around.
>
> I've updated the test so it sets SO_REUSEADDR on all the new
> ServerSocket instances, and introduced a mechanism to run the test
> code while properly cleaning up any allocated ports.
>
> http://cr.openjdk.java.net/~jbachorik/8066708/webrev.01/
>
> -JB-
>
>>
>> -JB-
>>
>>>
>>> s'marks
>>
>
More information about the serviceability-dev mailing list