RFR 8066708: JMXStartStopTest fails to connect to port 38112
Stuart Marks
stuart.marks at oracle.com
Thu Dec 11 17:53:30 UTC 2014
On 12/11/14 7:09 AM, olivier.lagneau at oracle.com wrote:
> On 11/12/2014 15:43, Dmitry Samersoff wrote:
>> You can set SO_LINGER to zero, in this case socket will be closed
>> immediately without waiting in TIME_WAIT
> SO_LINGER did not help either in my case (see my previous mail to Jaroslav).
> That ended up in using another hard-coded (supposedly free) port.
> Note that was before RMI tests used randomly allocated ports.
>
>> But there is no reliable way to predict whether you can take this port
>> or not after you close it.
> This is what I observed in my case.
>>
>> So the only valid solution is to try to connect to a random port and if
>> this attempt fails try another random port. Everything else will cause
>> more or less frequent intermittent failures.
> IIRC this is what is currently done in the RMI tests.
The RMI tests are still suffering from this problem, unfortunately.
The RMI test library gets a "random" port with "new ServerSocket(0)", gets the
port number, closes the socket, then returns the port to the caller. The caller
then assumes that it can use that port as it wishes. That's when the
BindException can occur. There are about 10 RMI test bugs in the database that
all seem to have this as their root cause.
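
For illustration, here is a minimal sketch of the open/close/reopen pattern described above. The class and method names (FreePortSketch, getUnusedPort) are hypothetical and are not the RMI test library's actual API; the point is only to show where the race window sits.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class FreePortSketch {
    // Hypothetical helper: asks the kernel for an ephemeral port, then
    // closes the socket and hands back only the port number.
    static int getUnusedPort() throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        } // the socket is closed here, so the port is no longer reserved
    }

    public static void main(String[] args) throws IOException {
        int port = getUnusedPort();
        // Race window: between the close above and the bind below, nothing
        // guarantees the port is still available to this caller.
        try (ServerSocket reopened = new ServerSocket(port)) {
            System.out.println("rebound port " + reopened.getLocalPort());
        } catch (BindException e) {
            System.out.println("BindException on port " + port + ": " + e.getMessage());
        }
    }
}
```

On most runs the rebind succeeds, which is why the failures show up only intermittently.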
There is some retry logic in RMI's test library, but that's to avoid the
so-called "reserved ports" that specific RMI tests use, or if "new
ServerSocket(0)" fails. It doesn't have anything to do with the BindException
that occurs when the caller attempts to reuse the port with another socket.
My observation is also that setting SO_REUSEADDR has no effect. I haven't tried
SO_LINGER. My hunch is that it won't have any effect, since the sockets in
question aren't actually going into TIME_WAIT state. But I suppose it's worth a try.
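
For reference, this is roughly what setting SO_REUSEADDR on a server socket looks like in Java. It is a sketch with hypothetical names (ReuseAddrSketch, bindWithReuse), not the code from the tests, and as noted above it did not make the BindException go away.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddrSketch {
    // Hypothetical helper: binds a server socket to the given port with
    // SO_REUSEADDR enabled. The option has to be set before bind().
    static ServerSocket bindWithReuse(int port) throws IOException {
        ServerSocket ss = new ServerSocket(); // created unbound
        try {
            ss.setReuseAddress(true);
            ss.bind(new InetSocketAddress(port));
            return ss;
        } catch (IOException e) {
            ss.close(); // do not leak the socket if the bind fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket ss = bindWithReuse(0)) {
            System.out.println("bound to port " + ss.getLocalPort());
        }
    }
}
```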
I don't have any solution for this; we're still discussing the issue. I think
the best approach would be to refactor the code so that the eventual user of the
socket opens it up on an ephemeral port in the first place. That avoids the
open/close/reopen business. Unfortunately that doesn't help the case where you
want to tell another JVM to run a service on a specific port. We don't have a
solution for that case yet.
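
As a rough sketch of the refactoring idea (the same-JVM case only), the component that will actually use the socket binds to port 0 itself and keeps the socket open, and everything else learns the port from it. The names here (EphemeralServiceSketch, startService) are made up for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EphemeralServiceSketch {
    // Hypothetical service start-up: the eventual user of the socket opens
    // it on an ephemeral port and never closes it in between.
    static ServerSocket startService() throws IOException {
        return new ServerSocket(0);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startService()) {
            // The port is read from the live socket, so there is no
            // close/reopen step in which the port could be lost.
            int port = server.getLocalPort();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), port);
                 Socket accepted = server.accept()) {
                System.out.println("client connected on port " + port
                        + ": " + accepted.isConnected());
            }
        }
    }
}
```

This does not cover the cross-JVM case, where the port number has to be chosen before the other JVM starts.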
The second-best approach (not really a solution) is to open/close a ServerSocket
to get the port, sleep for a little bit, then return the port number to the
caller. This might give the kernel a chance to clean up the socket after the
close. Of course, this still has a race condition, but it might reduce the
incidence of problems to an acceptable level.
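
A sketch of that workaround might look like the following. The names (DelayedPortSketch, getUnusedPort) and the delay value are assumptions for illustration; nothing here has been measured, and the race is still present.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class DelayedPortSketch {
    // Hypothetical helper: open/close a ServerSocket to pick a port, then
    // pause before returning so the kernel has time to clean up the socket.
    static int getUnusedPort(long settleMillis) throws IOException, InterruptedException {
        int port;
        try (ServerSocket ss = new ServerSocket(0)) {
            port = ss.getLocalPort();
        }
        Thread.sleep(settleMillis); // narrows the race window, does not remove it
        return port;
    }

    public static void main(String[] args) throws Exception {
        // 100 ms is an arbitrary placeholder, not a recommended value.
        System.out.println("port: " + getUnusedPort(100));
    }
}
```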
I'll let you know if we come up with anything better.
s'marks