[teststabilization] RFR 8229348: java/net/DatagramSocket/UnreferencedDatagramSockets.java fails intermittently
mark sheppard
macanaoire at hotmail.com
Sat Aug 10 20:14:18 UTC 2019
Hi Daniel,
FWIW, a couple of points in passing on your change,
while noting the rationale for it.
It is suggested that use of the wildcard address (INADDR_ANY) is a contributing factor, but I'm
not sure that is a correct assumption. The sockets created in the initial version of the test
bind to INADDR_ANY and an ephemeral port. The ephemeral port is chosen by the OS, so for another
test to end up with the same port as one in this test would suggest an issue in the OS port
allocation strategy.
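As a rough illustration of the point about OS-chosen ports, the sketch below binds several sockets to the wildcard address with port 0 and collects the ports the OS hands out. The class and method names (`EphemeralPortCheck`, `bindAndCollectPorts`) are made up for this example and are not part of the test under review; it only shows that sockets open at the same time get different ports, not how ports are recycled after close.

```java
import java.net.DatagramSocket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of UnreferencedDatagramSockets.java.
final class EphemeralPortCheck {

    // Binds `count` sockets to the wildcard address with port 0, keeps them
    // all open at the same time, and returns the local ports the OS chose.
    static List<Integer> bindAndCollectPorts(int count) throws SocketException {
        List<DatagramSocket> sockets = new ArrayList<>();
        List<Integer> ports = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                DatagramSocket s = new DatagramSocket(0); // wildcard + ephemeral port
                sockets.add(s);
                ports.add(s.getLocalPort());
            }
        } finally {
            for (DatagramSocket s : sockets) {
                s.close();
            }
        }
        return ports;
    }

    public static void main(String[] args) throws SocketException {
        List<Integer> ports = bindAndCollectPorts(16);
        Set<Integer> distinct = new HashSet<>(ports);
        System.out.println("ports: " + ports);
        System.out.println("all distinct: " + (distinct.size() == ports.size()));
    }
}
```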
This also applies to the recent tagging of test/jdk/java/net/DatagramSocket/ReuseAddressTest.java
as an intermittent failure due to SO_REUSEADDR. All the DatagramSockets in that test use INADDR_ANY and
an ephemeral port. Other than the MulticastSocket scenario, which also uses an ephemeral port and
sets the SO_REUSEADDR option, all the DatagramSockets use the wildcard address and an ephemeral port,
making an endpoint clash with another test unlikely.
Just to emphasize: the reuse-address option applies not to the IP address alone but to the IP address and port
combination, or at least that is what it was meant to be. In the TCP context there are other restrictions as well.
Your change is to the "server socket", but the client is its symmetric equivalent, a DatagramSocket on
the wildcard address and an ephemeral port, so the echo sent from the server could equally go astray!
That is, if the wildcard addressing is an issue.
Looking at the overall structure of the test, is it not possible that the server socket has been
GCed, finalized and so closed before the server's packet send has completed within the OS, and so the
client hangs?
So, a little jackanory in the context of a jtreg execution with many tests running concurrently.
It would seem possible, although it may be wild conjecture, that the client sends a packet to the server, executes receive and
blocks on I/O. The server (thread) receives the packet and echoes it: the data is copied from user space into
kernel space and placed on a queue for sending (the send is pending), and the OS send call returns.
The server then releases the socket, which becomes eligible for GC. With the heavy load on the
test system, with hundreds, maybe thousands of threads, the server's echo packet is still pending
and the client hasn't received it. In the meantime GC is scheduled
and, eager beaver, it reclaims the released server socket. This closes the server socket, which in turn drops the pending
echo datagram, and the client continues to wait in receive. ??
So it could be down to the load on the system, the number of concurrently executing threads, and whether the
GC thread executed before the server's echo packet send was completed within the OS kernel.
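To make the exchange described above concrete, here is a minimal echo sketch, assuming wildcard-bound sockets as in the original test. It does not reproduce or prove the GC conjecture. It only shows the client/server sequence and sets a receive timeout, so that a lost echo surfaces as a `SocketTimeoutException` rather than an indefinite hang. The class name, method name and buffer size are invented for this example.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the code of UnreferencedDatagramSockets.java.
final class EchoWithTimeout {

    // Sends `msg` to an echo server on an ephemeral port and returns the
    // reply, or null if no reply arrives within `timeoutMillis`.
    static String echoOnce(String msg, int timeoutMillis) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket server = new DatagramSocket(0);   // wildcard + ephemeral
             DatagramSocket client = new DatagramSocket(0)) { // wildcard + ephemeral
            server.setSoTimeout(timeoutMillis);
            client.setSoTimeout(timeoutMillis);

            Thread echo = new Thread(() -> {
                try {
                    byte[] buf = new byte[512];
                    DatagramPacket p = new DatagramPacket(buf, buf.length);
                    server.receive(p); // fills in the sender's address and port
                    server.send(p);    // so sending p back echoes to the client
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            echo.start();

            byte[] out = msg.getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(out, out.length, loopback, server.getLocalPort()));

            byte[] in = new byte[512];
            DatagramPacket reply = new DatagramPacket(in, in.length);
            try {
                client.receive(reply); // bounded wait instead of blocking forever
                return new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8);
            } catch (SocketTimeoutException e) {
                return null;
            } finally {
                echo.join(); // server stays open until the echo thread is done
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String reply = echoOnce("ping", 2000);
        System.out.println(reply == null ? "no reply before timeout" : "reply: " + reply);
    }
}
```

Here the server socket stays strongly reachable (and open) until the client has finished receiving, which is exactly the property the scenario above suggests the failing test may lack.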
Sometimes with an intermittently failing test there is some quirkiness in the net config on the test system,
so it can be useful for diagnostics to dump the config at the start of the test.
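One possible way to dump the config, sketched with the standard `java.net.NetworkInterface` API. The class and method names (`NetConfigDump`, `describeInterfaces`) are invented for this example, and the choice of fields to print is only a suggestion.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical diagnostic helper, not part of the test under review.
final class NetConfigDump {

    // Returns one line per network interface: name, state flags and addresses.
    static String describeInterfaces() throws SocketException {
        StringBuilder sb = new StringBuilder();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs == null) {
            return sb.toString(); // no interfaces reported
        }
        for (NetworkInterface nif : Collections.list(nifs)) {
            sb.append(nif.getName())
              .append(" up=").append(nif.isUp())
              .append(" loopback=").append(nif.isLoopback())
              .append(" multicast=").append(nif.supportsMulticast())
              .append(" addrs=").append(Collections.list(nif.getInetAddresses()))
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SocketException {
        // Call this at the start of the test so the log records the config.
        System.out.print(describeInterfaces());
    }
}
```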
In summary, the use of the wildcard address (INADDR_ANY) and an ephemeral port should not be an issue in this context.
There should be no conflicts for OS-allocated ephemeral ports.
Since you are making your change to the "server socket", should it equally be applied to the client datagram socket?
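For what it's worth, applying the same binding to both ends could look something like the sketch below, which assumes the change amounts to binding to the loopback address with an ephemeral port. The helper name `newLoopbackSocket` is invented here and is not taken from the webrev.

```java
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketException;

// Hypothetical sketch; the actual webrev may bind differently.
final class LoopbackPair {

    // Creates a DatagramSocket bound to the loopback address and an
    // OS-chosen ephemeral port, instead of the wildcard address.
    static DatagramSocket newLoopbackSocket() throws SocketException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        return new DatagramSocket(new InetSocketAddress(loopback, 0));
    }

    public static void main(String[] args) throws SocketException {
        // Use the same helper for both ends so they are bound symmetrically.
        try (DatagramSocket server = newLoopbackSocket();
             DatagramSocket client = newLoopbackSocket()) {
            System.out.println("server: " + server.getLocalSocketAddress());
            System.out.println("client: " + client.getLocalSocketAddress());
        }
    }
}
```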
regards
Mark
________________________________
From: net-dev <net-dev-bounces at openjdk.java.net> on behalf of Daniel Fuchs <daniel.fuchs at oracle.com>
Sent: Friday 9 August 2019 15:36
To: OpenJDK Network Dev list <net-dev at openjdk.java.net>
Subject: [teststabilization] RFR 8229348: java/net/DatagramSocket/UnreferencedDatagramSockets.java fails intermittently
Hi,
Please find below a trivial fix for:
8229348: java/net/DatagramSocket/UnreferencedDatagramSockets.java
fails intermittently
https://bugs.openjdk.java.net/browse/JDK-8229348
webrev: http://cr.openjdk.java.net/~dfuchs/webrev_8229348/webrev.00/
This test has been observed failing intermittently in our CI.
The test failed with a timeout, and there was no message saying
that the expected reply had been received or that any file
descriptor had been freed.
I suspect the test was blocked in receive() in its main() method
due to port reuse issues.
The usual fix for that is to avoid binding to the wildcard.
best regards,
-- daniel