RFR 8066708: JMXStartStopTest fails to connect to port 38112

Thu Dec 11 17:53:30 UTC 2014

On 12/11/14 7:09 AM, olivier.lagneau at oracle.com wrote:
> On 11/12/2014 15:43, Dmitry Samersoff wrote:
>> You can set SO_LINGER to zero; in that case the socket will be closed
>> immediately without waiting in TIME_WAIT
> SO_LINGER did not help either in my case (see my previous mail to Jaroslav).
> We ended up using another hard-coded (supposedly free) port.
> Note that this was before RMI tests used randomly allocated ports.
>
>> But there is no reliable way to predict whether you can take this port
>> or not after you close it.
> This is what I observed in my case.
>>
>> So the only valid solution is to try to connect to a random port and if
>> this attempt fails try another random port. Everything else will cause
>> more or less frequent intermittent failures.
> IIRC, this is what is currently done in the RMI tests.

The RMI tests are still suffering from this problem, unfortunately.

The RMI test library gets a "random" port with "new ServerSocket(0)", gets the 
port number, closes the socket, then returns the port to the caller. The caller 
then assumes that it can use that port as it wishes. That's when the 
BindException can occur. There are about 10 RMI test bugs in the database that 
all seem to have this as their root cause.
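
To make the sequence concrete, here is a minimal sketch of the pattern just
described. The class and method names are hypothetical; this is not the actual
RMI test library code, only an illustration of where the race window opens.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical illustration of the open/close/reopen pattern.
final class RacyPortPicker {

    // Asks the kernel for an ephemeral port, then gives it up again.
    static int getUnusedPort() throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        } // socket closed here; nothing reserves the port any more
    }

    public static void main(String[] args) throws IOException {
        int port = getUnusedPort();
        // Race window: any other process or test may bind the port
        // between the close above and the bind below.
        try (ServerSocket reuse = new ServerSocket(port)) {
            System.out.println("rebound port " + port);
        } catch (BindException e) {
            System.out.println("port " + port + " unavailable: " + e.getMessage());
        }
    }
}
```

Most runs will print the "rebound" line; the BindException branch is the
intermittent failure.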

There is some retry logic in RMI's test library, but that's to avoid the 
so-called "reserved ports" that specific RMI tests use, or if "new 
ServerSocket(0)" fails. It doesn't have anything to do with the BindException 
that occurs when the caller attempts to reuse the port with another socket.

My observation is also that setting SO_REUSEADDR has no effect. I haven't tried 
SO_LINGER. My hunch is that it won't have any effect, since the sockets in 
question aren't actually going into TIME_WAIT state. But I suppose it's worth a try.
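
For anyone who wants to experiment, here is a rough sketch of how the two
options are set from Java. The class and helper names are made up for
illustration. Note that SO_REUSEADDR has to be set before bind(), and that
setSoLinger() lives on java.net.Socket (a connected socket), not on
ServerSocket. I make no claim that either one fixes the problem.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helpers showing where each socket option is applied.
final class SocketOptionsSketch {

    // SO_REUSEADDR must be set before bind(), so start from an unbound socket.
    static ServerSocket bindWithReuseAddr(int port) throws IOException {
        ServerSocket ss = new ServerSocket();
        try {
            ss.setReuseAddress(true);
            ss.bind(new InetSocketAddress(port));
            return ss;
        } catch (IOException e) {
            ss.close();
            throw e;
        }
    }

    // SO_LINGER with a zero timeout; on most platforms close() then
    // resets the connection instead of going through the normal shutdown.
    static void closeWithoutLinger(Socket s) throws IOException {
        s.setSoLinger(true, 0);
        s.close();
    }
}
```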

I don't have any solution for this; we're still discussing the issue. I think 
the best approach would be to refactor the code so that the eventual user of the 
socket opens it up on an ephemeral port in the first place. That avoids the 
open/close/reopen business. Unfortunately that doesn't help the case where you 
want to tell another JVM to run a service on a specific port. We don't have a 
solution for that case yet.
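
As a sketch of what I mean by the first approach (the EphemeralService name is
hypothetical, just for illustration): the component that needs the socket binds
port 0 itself, keeps the socket open, and reports the port it got.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical service that owns its socket from the start.
final class EphemeralService implements AutoCloseable {
    private final ServerSocket socket;

    EphemeralService() throws IOException {
        // Port 0 lets the kernel choose; the socket stays open, so the
        // port remains ours until close().
        this.socket = new ServerSocket(0);
    }

    // The port the kernel actually assigned.
    int port() {
        return socket.getLocalPort();
    }

    @Override
    public void close() throws IOException {
        socket.close();
    }

    public static void main(String[] args) throws IOException {
        try (EphemeralService service = new EphemeralService()) {
            System.out.println("service listening on port " + service.port());
        }
    }
}
```

There is no close/reopen step here, so there is no window in which something
else can grab the port.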

The second-best approach (not really a solution) is to open/close a ServerSocket 
to get the port, sleep for a little bit, then return the port number to the 
caller. This might give the kernel a chance to clean up the socket after the 
close. Of course, this still has a race condition, but it might reduce the 
incidence of problems to an acceptable level.
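
Roughly, that would look like the following. The names and the delay value are
placeholders I made up; I don't know what delay, if any, would actually be
enough.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical variant of the port picker with a pause after close().
final class DelayedPortPicker {

    static int getUnusedPort(long delayMillis)
            throws IOException, InterruptedException {
        int port;
        try (ServerSocket ss = new ServerSocket(0)) {
            port = ss.getLocalPort();
        }
        // Give the kernel some time to finish tearing down the socket.
        // This narrows the problem but does not close the race: another
        // process can still bind the port before the caller does.
        Thread.sleep(delayMillis);
        return port;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("candidate port: " + getUnusedPort(100));
    }
}
```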

I'll let you know if we come up with anything better.

s'marks