RFR [8193596]: java/net/DatagramPacket/ReuseBuf.java failed due to timeout

Daniel Fuchs daniel.fuchs at oracle.com
Mon Aug 26 11:42:18 UTC 2019


Hi Mark,

On 24/08/2019 12:33, mark sheppard wrote:
> a couple of observations on the assertion for test failure that you may 
> wish to consider.
>
> If there is a BindException for the DatagramSocket instantiations
> then this would suggest that there is an operating system issue.
> The sockets are being bound to an ephemeral port, allocated by the OS, 
> which would mean
> that the OS is choosing a port that it has already allocated!

No BindException was observed, as far as I know. AFAICT the issue here
is simply that the test was observed hanging and then failing with a timeout.

> It may be worth checking that the ephemeral port range in the test 
> environments is appropriately configured,
> as per IANA recommendations.
>
> One potential extreme condition is that IFF there are many thousands of 
> concurrent tests executing,
> there could be ephemeral port exhaustion?

Right. Doesn't seem to be the case here though.

> Another observation is that this is a UDP test, as such, it is 
> unreliable in its outcome.
> That is to say, UDP sends are not guaranteed to be successful.
> This could be especially true in a very heavily loaded system, which may 
> have some resource contention,
> such as available UDP buffer space. As such, there is no guarantee that 
> any send will succeed.
> It may be an extreme exaggeration, but the OS may accept a message to 
> send, copy from user space to kernel space,
> but because of some extreme exceptional conditions in the kernel, drops 
> the message without notification.

Good observation. I doubt that's the case, but it is definitely a possibility.
As always, an observed symptom can have several probable causes.
Changing the test to not use the wildcard address removes one of them.
If the test is still observed failing after that, we will dig deeper.
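
To illustrate what "not using the wildcard" could look like, here is a minimal
sketch. It is not the actual test change: the class name LoopbackUdpSketch and
the roundTrip helper are made up for illustration, and it assumes that binding
both sockets to the loopback address is an acceptable replacement for the
wildcard. The port is still an ephemeral one chosen by the OS (port 0).

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the ReuseBuf test itself.
final class LoopbackUdpSketch {

    // Sends one datagram from a loopback-bound socket to another and
    // returns the text that was received.
    static String roundTrip(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Bind to loopback with port 0 (OS-chosen ephemeral port), so that
        // only traffic addressed to the loopback interface can reach the sockets.
        try (DatagramSocket receiver = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket sender = new DatagramSocket(new InetSocketAddress(loopback, 0))) {
            // Safety net so the sketch cannot block forever if the datagram is lost.
            receiver.setSoTimeout(5000);

            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length,
                                           loopback, receiver.getLocalPort()));

            DatagramPacket received = new DatagramPacket(new byte[1024], 1024);
            receiver.receive(received);
            return new String(received.getData(), received.getOffset(),
                              received.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello"));
    }
}
```

With the wildcard address, a socket accepts datagrams arriving on any local
interface, so stray traffic from another process or host could in principle
reach it. Binding to loopback narrows that down.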
  ​
> Another point to keep in mind is that the test is multi threaded, and 
> again in heavily loaded system the scheduling of
> a thread may not be as prompt as expected?

Again, a possibility. But let's not consider that until we have
eliminated the other possible causes: this test is very small
and should execute quite promptly. For it to remain blocked for
the whole duration of the jtreg timeout because of CPU starvation
seems a bit extreme.
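
Should we need to dig deeper, one option would be to bound the blocking
receive, so that a lost datagram shows up quickly instead of consuming the
whole jtreg timeout. A rough sketch follows, under the assumption that a
socket timeout is acceptable in the test. The class name BoundedReceiveSketch
and the receiveOrGiveUp helper are hypothetical and not part of the test.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;

// Hypothetical sketch, not the ReuseBuf test itself.
final class BoundedReceiveSketch {

    // Waits at most timeoutMillis for one datagram.
    // Returns the number of bytes received, or -1 if nothing arrived in time.
    static int receiveOrGiveUp(DatagramSocket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        DatagramPacket packet = new DatagramPacket(new byte[1024], 1024);
        try {
            socket.receive(packet);
            return packet.getLength();
        } catch (SocketTimeoutException e) {
            // Nothing arrived: the caller can resend or fail with a clear message.
            return -1;
        }
    }
}
```

A caller could resend and retry a few times when -1 is returned, which would
also help tell a dropped datagram apart from a thread that was never scheduled.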

best regards,

-- daniel



