RFR 8066708: JMXStartStopTest fails to connect to port 38112
Dmitry Samersoff
dmitry.samersoff at oracle.com
Thu Dec 11 18:18:25 UTC 2014
Stuart,
As soon as you close the socket, you open the door to a race.
So you need another communication channel to pass a port number (or bind
result) between a client and a server without closing a socket on the
server side.
The typical scenario used by network-related code is:
1. Server opens the socket
2. Server binds to port(0)
3. Server gets port number assigned by OS
4. Server informs client (e.g. writes the port to a known file,
broadcasts it, etc.)
5. Client establishes connection.
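
A minimal Java sketch of the five steps above, assuming client and server
live in one JVM so that a local variable can stand in for the "known file"
of step 4. The class and method names (EphemeralPortDemo, runOnce) are
made up for illustration and are not part of any test library.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class EphemeralPortDemo {
    /** Binds to port 0, keeps the socket open, and connects a client to the assigned port. */
    static int runOnce() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Steps 1-2: open the socket and bind to port 0 so the OS picks a free port.
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            // Step 3: ask the still-open socket which port it was given.
            int port = server.getLocalPort();
            // Step 4: hand the port to the client. Here it is just a local variable;
            // across processes it could be written to a file or printed on stdout.
            // Step 5: the client connects while the server socket is still bound,
            // so nobody else can grab the port in between.
            try (Socket client = new Socket(loopback, port);
                 Socket accepted = server.accept()) {
                return accepted.getLocalPort();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected on port " + runOnce());
    }
}
```

The point is that the server socket is never closed between learning the
port number and using it, so there is no window for the race.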
If the server is a black box and has to get its port number from outside,
the scenario looks like:
WHILE(!success and !timeout)
1. Driver chooses random port number
2. Driver runs a server with this number
3. Driver checks that server is actually listening on this port
(e.g. tries to connect to it itself)
WEND
4. Driver runs a client with this port number or bails out with a
descriptive error message.
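
The driver loop above could be sketched in Java roughly as follows. This is
an illustration under assumptions, not the actual test code: ServerLauncher,
Started, isListening and startOnRandomPort are hypothetical names, the
"black box" is simulated by a plain ServerSocket, the timeout is replaced by
a maximum number of attempts, and the port range 49152-65535 is just one
reasonable choice.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ThreadLocalRandom;

class PortRetryDriver {
    /** Hypothetical stand-in for launching the black-box server on a given port. */
    interface ServerLauncher {
        AutoCloseable start(int port) throws Exception;
    }

    /** The verified port plus a handle for shutting the server down later. */
    static final class Started {
        final int port;
        final AutoCloseable server;
        Started(int port, AutoCloseable server) { this.port = port; this.server = server; }
    }

    /** Step 3: returns true if something accepts connections on the loopback port. */
    static boolean isListening(int port, int timeoutMillis) {
        try (Socket probe = new Socket()) {
            probe.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Tries random ports until the server is verified listening or attempts run out. */
    static Started startOnRandomPort(ServerLauncher launcher, int maxAttempts) throws Exception {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Step 1: choose a random port number.
            int port = ThreadLocalRandom.current().nextInt(49152, 65536);
            AutoCloseable server;
            try {
                // Step 2: run the server with this number.
                server = launcher.start(port);
            } catch (Exception e) {
                continue; // e.g. BindException: port already taken, try another one
            }
            if (isListening(port, 1000)) {
                return new Started(port, server);
            }
            server.close(); // started but not reachable: clean up and retry
        }
        // Step 4, failure branch: bail out with a descriptive error message.
        throw new IllegalStateException("no usable port after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        Started s = startOnRandomPort(
                p -> new ServerSocket(p, 50, InetAddress.getLoopbackAddress()), 10);
        System.out.println("server verified on port " + s.port);
        s.server.close();
    }
}
```

Note that a successful probe only shows that something is listening on the
port. With a real out-of-process server the driver would also want to check
that the server process itself is still alive before trusting the result.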
-Dmitry
On 2014-12-11 20:53, Stuart Marks wrote:
>
>
> On 12/11/14 7:09 AM, olivier.lagneau at oracle.com wrote:
>> On 11/12/2014 15:43, Dmitry Samersoff wrote:
>>> You can set SO_LINGER to zero, in this case socket will be closed
>>> immediately without waiting in TIME_WAIT
>> SO_LINGER did not help either in my case (see my previous mail to
>> Jaroslav).
>> That ended up in using another hard-coded (supposedly free) port.
>> Note that was before RMI tests used randomly allocated ports.
>>
>>> But there is no reliable way to predict whether you can take this port
>>> or not after you close it.
>> This is what I observed in my case.
>>>
>>> So the only valid solution is to try to connect to a random port and if
>>> this attempt fails try another random port. Everything else will cause
>>> more or less frequent intermittent failures.
>> IIRC, this is what is currently done in RMI tests.
>
> The RMI tests are still suffering from this problem, unfortunately.
>
> The RMI test library gets a "random" port with "new ServerSocket(0)",
> gets the port number, closes the socket, then returns the port to the
> caller. The caller then assumes that it can use that port as it wishes.
> That's when the BindException can occur. There are about 10 RMI test
> bugs in the database that all seem to have this as their root cause.
>
> There is some retry logic in RMI's test library, but that's to avoid the
> so-called "reserved ports" that specific RMI tests use, or if "new
> ServerSocket(0)" fails. It doesn't have anything to do with the
> BindException that occurs when the caller attempts to reuse the port
> with another socket.
>
> My observation is also that setting SO_REUSEADDR has no effect. I
> haven't tried SO_LINGER. My hunch is that it won't have any effect,
> since the sockets in question aren't actually going into TIME_WAIT
> state. But I suppose it's worth a try.
>
> I don't have any solution for this; we're still discussing the issue. I
> think the best approach would be to refactor the code so that the
> eventual user of the socket opens it up on an ephemeral port in the
> first place. That avoids the open/close/reopen business. Unfortunately
> that doesn't help the case where you want to tell another JVM to run a
> service on a specific port. We don't have a solution for that case yet.
>
> The second-best approach (not really a solution) is to open/close a
> serversocket to get the port, sleep for a little bit, then return the
> port number to the caller. This might give the kernel a chance to clean
> up the socket after the close. Of course, this still has a race
> condition, but it might reduce the incidence of problems to an
> acceptable level.
>
> I'll let you know if we come up with anything better.
>
> s'marks
--
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.