8249786: java/net/httpclient/websocket/PendingPingTextClose.java fails very infrequently
Daniel Fuchs
daniel.fuchs at oracle.com
Thu Jul 23 15:02:34 UTC 2020
Hi,
More testing revealed that some other tests of the same family
kept on failing intermittently, though my changes to
PendingOperation.java should have fixed them.
So here is a broader fix - which seems to have fixed the issue.
But as a consequence - I am no longer planning to push it to
15 as it also changes some source files:
http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.02/
best regards,
-- daniel
On 21/07/2020 18:53, Daniel Fuchs wrote:
> Hi,
>
> Please find below a fix for:
>
> 8249786: java/net/httpclient/websocket/PendingPingTextClose.java
> fails very infrequently
> https://bugs.openjdk.java.net/browse/JDK-8249786
>
> webrev:
> http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.00/
>
> This test has been observed failing on windows in the JDK 15 CI.
>
> The most troublesome issue is that the test was producing so
> much output that the actual reason for the failure was lost
> in the output overflow.
>
> After instrumenting the test to limit the output and
> adding a few higher level traces, I was able to reproduce
> once (out of 250 runs) and see the actual stack trace.
> The test fails just after calling websocket.abort() when it tries
> to verify that the cfPing CompletableFuture completed
> exceptionally, and found that it actually successfully completed.
>
> The logic of the test is to try to fill up the local send buffer
> by sending ping messages, so that an attempt to write to the
> socket will block.
> For that it creates a server that will accept a websocket
> connection, but will not read anything from the socket input.
>
> The client side sends ping packets until the socket buffer
> is full - which is detected by setting up a 10s timeout and
> observing that the ping data could not be written during
> this time. The assumption of the test is that a write call
> that takes more than 10s is indicative that the buffers are
> full, and will never succeed.
>
> The problem occurs when the write succeeds after ~10s either
> because the kernel was busy doing some other things, or because
> the kernel suddenly decided to resize (increase) the buffers,
> which causes the write call to unblock and succeed after 10s.
>
> The test already had some provision and a workaround for that
> issue - via a repeatable( ) operation - but the workaround
> was only enabled for macOS where such behavior had first been
> observed.
>
> The fix extends that workaround for Windows - since the later
> failure shows that something similar is happening there.
> The fix also moves the websocket.abort() and following check
> inside the repeatable loop for better reliability.
>
> Since pushing test fixes during rampdown 2 is permitted,
> and since the failure was observed in the JDK 15 CI, I'm
> planning to push this test fix to the jdk15 repo,
> unless I hear any objection.
>
> best regards,
>
> -- daniel
More information about the net-dev
mailing list