Possible network issue

robert engels rengels at ix.netcom.com
Wed Nov 8 00:50:55 UTC 2023


I ran the test multiple times - each run had failures. As soon as a failure (SocketException) was reported, I ran netstat.

The increases in the ‘requests for memory delayed’ counter seem to align with the number of reported failures - which would mean that the OS is returning ENOBUFS as a temporary condition.

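To make that comparison easier to reproduce, here is a minimal sketch that pulls the ‘requests for memory delayed’ counter out of `netstat -nm` output and reports the change between two snapshots. The two canned snapshots are counter lines copied from the runs below; in practice you would pipe live `netstat -nm` output into the function instead:

```shell
# Extract the "requests for memory delayed" counter from `netstat -nm`
# output. Live usage would be: netstat -nm | delayed
delayed() {
  awk '/requests for memory delayed/ {print $1}'
}

# Canned snapshots, reusing counter lines from the runs below:
before=$(delayed <<'EOF'
10163 requests for memory delayed
EOF
)
after=$(delayed <<'EOF'
11091 requests for memory delayed
EOF
)

# 11091 - 10163 = 928 delayed allocations over the test sequence
echo "memory-delayed delta: $((after - before))"
```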
Here are the results:

iMac:helidon_test robertengels$ netstat -nm
9138/15954 mbufs in use:
	8231 mbufs allocated to data
	1 mbufs allocated to ancillary data
	665 mbufs allocated to packet headers
	1 mbufs allocated to socket names and addresses
	240 mbufs allocated to packet tags
	6816 mbufs allocated to caches
941/2730 mbuf 2KB clusters in use
3/50678 mbuf 4KB clusters in use
4100/10922 mbuf 16KB clusters in use
398566 KB allocated to network (18.2% in use)
0 KB returned to the system
0 requests for memory denied
10163 requests for memory delayed
131 calls to drain routines
iMac:helidon_test robertengels$ netstat -nm
29711/29910 mbufs in use:
	27064 mbufs allocated to data
	2407 mbufs allocated to packet headers
	240 mbufs allocated to packet tags
	199 mbufs allocated to caches
3136/3238 mbuf 2KB clusters in use
3121/51132 mbuf 4KB clusters in use
10728/10922 mbuf 16KB clusters in use
398566 KB allocated to network (50.3% in use)
0 KB returned to the system
0 requests for memory denied
10342 requests for memory delayed
136 calls to drain routines
iMac:helidon_test robertengels$ netstat -nm
61785/76680 mbufs in use:
	54018 mbufs allocated to data
	7527 mbufs allocated to packet headers
	240 mbufs allocated to packet tags
	14895 mbufs allocated to caches
7991/10098 mbuf 2KB clusters in use
20468/44823 mbuf 4KB clusters in use
10909/10922 mbuf 16KB clusters in use
398566 KB allocated to network (72.9% in use)
0 KB returned to the system
0 requests for memory denied
10407 requests for memory delayed
136 calls to drain routines
iMac:helidon_test robertengels$ netstat -nm
18438/23213 mbufs in use:
	14924 mbufs allocated to data
	3274 mbufs allocated to packet headers
	240 mbufs allocated to packet tags
	4775 mbufs allocated to caches
4168/4346 mbuf 2KB clusters in use
1245/49996 mbuf 4KB clusters in use
6159/10922 mbuf 16KB clusters in use
398566 KB allocated to network (29.9% in use)
0 KB returned to the system
0 requests for memory denied
10668 requests for memory delayed
141 calls to drain routines
iMac:helidon_test robertengels$ netstat -nm
19765/22908 mbufs in use:
	15860 mbufs allocated to data
	3665 mbufs allocated to packet headers
	240 mbufs allocated to packet tags
	3143 mbufs allocated to caches
3975/4130 mbuf 2KB clusters in use
10/50751 mbuf 4KB clusters in use
7783/10922 mbuf 16KB clusters in use
398566 KB allocated to network (35.2% in use)
0 KB returned to the system
0 requests for memory denied
11091 requests for memory delayed
146 calls to drain routines
iMac:helidon_test robertengels$ 

I then killed the server process and ran netstat again, and it reported:

iMac:~ robertengels$ netstat -nm
596/11888 mbufs in use:
	348 mbufs allocated to data
	8 mbufs allocated to packet headers
	240 mbufs allocated to packet tags
	11292 mbufs allocated to caches
280/2730 mbuf 2KB clusters in use
0/51669 mbuf 4KB clusters in use
0/10922 mbuf 16KB clusters in use
398566 KB allocated to network (0.9% in use)
0 KB returned to the system
0 requests for memory denied
11091 requests for memory delayed
146 calls to drain routines
iMac:~ robertengels$ 

It could be that the server process is leaking sockets - but I don’t think so. I am going to perform some heap analysis to make sure.
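A quick way to do that check without a full heap dump is a class histogram; the sketch below assumes a JDK on PATH, and $PID is a placeholder for the server's process id. A socket leak would show the instance counts of classes such as sun.nio.ch.SocketChannelImpl growing across successive histograms. The post-processing step is illustrated on a canned histogram fragment (the counts there are made up):

```shell
# Planned check (commented out - needs a live JVM):
#   jcmd "$PID" GC.class_histogram | grep -i socket
#
# Post-processing: pick out instance count and class name for
# socket-related classes from jcmd's histogram columns
# (num: #instances #bytes class-name).
sockets=$(awk '$4 ~ /Socket/ {print $2, $4}' <<'EOF'
   1:         12345        987654  [B (java.base@21)
  42:          9876        316032  sun.nio.ch.SocketChannelImpl
EOF
)
echo "$sockets"
```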

> On Nov 7, 2023, at 10:59 AM, Robert Engels <rengels at ix.netcom.com> wrote:
> 
> Will do. It does seem temporary. For instance with h2load testing, with 100k requests it will fail 10k of them - but this may be overstated by 10x since I have it set to pipeline 10 requests per connection. 
> 
> If I let the system sit for a bit - probably cleaning up the TIMEDWAIT connections - it will often run without failure. But rapid invocations of the test lead to the cited exception. 
> 
>> On Nov 7, 2023, at 10:33 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>> 
>> On 07/11/2023 13:28, Robert Engels wrote:
>>> I suspect that what is happening is that the JVM gets the epoll notification that the socket is writable but by the time the writer runs the network buffers have been exhausted.
>>> 
>>> This should be an expected condition and seems like a JDK bug introduced with VT support. I checked earlier JDK releases (jdk15) and the sync -> non blocking code was not present in SocketOutputStream.java
>> I don't think this is specific to virtual threads as we're seeing it elsewhere too. If there is no space in the socket write buffer then EAGAIN/EWOULDBLOCK is handled.  The intermittent ENOBUFS, which we're only seen on macos-aarch64 when under load, is hard to explain. We've had also some intermittent ENOMEM when joining multicast groups, also not specific to virtual threads.
>> 
>> Your mail said you can duplicate this readily. It would be useful if could capture the output of `netstat -nm` at around the time that this happens so see if this is a real resource exhaustion issue or not.
>> 
>> -Alan
>> 
>> 
