8249786: java/net/httpclient/websocket/PendingPingTextClose.java fails very infrequently

Daniel Fuchs daniel.fuchs at oracle.com
Thu Jul 23 15:02:34 UTC 2020


Hi,

More testing revealed that some other tests of the same family
kept failing intermittently, even though my changes to
PendingOperation.java should have fixed them.

So here is a broader fix, which seems to have resolved the issue.
As a consequence, I am no longer planning to push it to
JDK 15, as it also changes some source files:

http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.02/

best regards,

-- daniel

On 21/07/2020 18:53, Daniel Fuchs wrote:
> Hi,
> 
> Please find below a fix for:
> 
> 8249786: java/net/httpclient/websocket/PendingPingTextClose.java
>           fails very infrequently
> https://bugs.openjdk.java.net/browse/JDK-8249786
> 
> webrev:
> http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.00/
> 
> This test has been observed failing on windows in the JDK 15 CI.
> 
> The most troublesome issue is that the test was producing so
> much output that the actual reason for the failure was lost
> in the output overflow.
> 
> After instrumenting the test to limit the output and
> adding a few higher level traces, I was able to reproduce
> once (out of 250 runs) and see the actual stack trace.
> The test fails just after calling websocket.abort(), when it tries
> to verify that the cfPing CompletableFuture completed
> exceptionally and finds that it actually completed successfully.
> 
> The logic of the test is to try to fill up the local send buffer
> by sending ping messages, so that an attempt to write to the
> socket will block.
> For that it creates a server that will accept a websocket
> connection, but will not read anything from the socket input.
> 
> The client side sends ping packets until the socket buffer
> is full - which is detected by setting up a 10s timeout and
> observing that the ping data could not be written during
> this time. The assumption of the test is that a write call
> that takes more than 10s is indicative that the buffers are
> full, and will never succeed.
> 
> The problem occurs when the write succeeds after ~10s either
> because the kernel was busy doing some other things, or because
> the kernel suddenly decided to resize (increase) the buffers,
> which causes the write call to unblock and succeed after 10s.
> 
> The test already had some provision and a workaround for that
> issue - via a repeatable() operation - but the workaround
> was only enabled for macOS where such behavior had first been
> observed.
> 
> The fix extends that workaround to Windows - since the latest
> failure shows that something similar is happening there.
> The fix also moves the websocket.abort() and following check
> inside the repeatable loop for better reliability.
> 
> Since pushing test fixes during rampdown 2 is permitted,
> and since the failure was observed in the JDK 15 CI, I'm
> planning to push this test fix to the jdk15 repo,
> unless I hear any objection.
> 
> best regards,
> 
> -- daniel
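
For readers who want to see the shape of the logic described in the
quoted message, here is a minimal sketch, not the actual test code.
It assumes a plain CompletableFuture as a stand-in for cfPing and
simulates the socket entirely in memory. The names BlockedWriteSketch,
appearsBlocked, retryUntilTrue and simulateRetry are hypothetical and
do not come from the test sources. The sketch shows the timeout
heuristic ("not complete within the timeout" is taken to mean
"blocked"), and why the abort and the follow-up check belong inside
the retry loop: a write that succeeds late makes the check fail, and
the whole attempt is then repeated.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class BlockedWriteSketch {

    // Heuristic only: a future still pending after the timeout is assumed to
    // be blocked on a full send buffer. It may still complete later.
    static boolean appearsBlocked(CompletableFuture<?> cf, long timeout, TimeUnit unit) {
        try {
            cf.get(timeout, unit);
            return false;                 // completed normally: the write went through
        } catch (TimeoutException e) {
            return true;                  // still pending: assume the buffers are full
        } catch (ExecutionException e) {
            return false;                 // completed exceptionally: not pending either
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Repeats the whole attempt until it reports success or attempts run out.
    static boolean retryUntilTrue(int maxAttempts, BooleanSupplier attempt) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    // Simulation: on the first attempt the "write" succeeds just after the
    // timeout, so the post-abort check fails and the attempt is repeated.
    // Returns the number of attempts used, or -1 if every attempt failed.
    static int simulateRetry() {
        int[] attempts = {0};
        boolean ok = retryUntilTrue(3, () -> {
            CompletableFuture<Void> cfPing = new CompletableFuture<>();
            boolean blocked = appearsBlocked(cfPing, 50, TimeUnit.MILLISECONDS);
            if (attempts[0]++ == 0) {
                cfPing.complete(null);    // simulated late success of the write
            }
            // Stand-in for websocket.abort(): fail the ping if it is still pending.
            cfPing.completeExceptionally(new IOException("aborted"));
            // The check lives inside the loop, so a late success triggers a retry.
            return blocked && cfPing.isCompletedExceptionally();
        });
        return ok ? attempts[0] : -1;
    }

    public static void main(String[] args) {
        System.out.println("attempts needed: " + simulateRetry());
    }
}
```

In this simulation the first attempt is rejected because the ping
completed normally before the abort, and the second attempt passes.
The real test works against an actual socket and uses a 10s timeout;
the 50ms value here only keeps the sketch quick to run.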
