8249786: java/net/httpclient/websocket/PendingPingTextClose.java fails very infrequently
Pavel Rappo
pavel.rappo at oracle.com
Fri Aug 7 12:49:48 UTC 2020
Daniel,
The changes in TransportImpl look okay to me. I cannot see how they break anything. On the other hand, I guess it will take some time to see if the (complete) changeset fixes the issue.
One minor thing that makes me a tad bit uncomfortable is the new assert statements. Firstly, they may have visibility side effects. Secondly, they could be more informative.
May I suggest we use this?
ChannelState s;
assert (s = writeState.get()) == CLOSED : s;
Or better still,
ChannelState s = writeState.get();
assert s == CLOSED : s;
-Pavel
> On 23 Jul 2020, at 16:02, Daniel Fuchs <daniel.fuchs at oracle.com> wrote:
>
> Hi,
>
> More testing revealed that some other tests of the same family
> kept on failing intermittently, though my changes to
> PendingOperation.java should have fixed them.
>
> So here is a broader fix - which seems to have fixed the issue.
> But as a consequence - I am no longer planning to push it to
> 15 as it also changes some source files:
>
> http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.02/
>
> best regards,
>
> -- daniel
>
> On 21/07/2020 18:53, Daniel Fuchs wrote:
>> Hi,
>> Please find below a fix for:
>> 8249786: java/net/httpclient/websocket/PendingPingTextClose.java
>> fails very infrequently
>> https://bugs.openjdk.java.net/browse/JDK-8249786
>> webrev:
>> http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.00/
>> This test has been observed failing on windows in the JDK 15 CI.
>> The most troublesome issue is that the test was producing so
>> much output that the actual reason for the failure was lost
>> in the output overflow.
>> After instrumenting the test to limit the output and
>> adding a few higher level traces, I was able to reproduce
>> once (out of 250 runs) and see the actual stack trace.
>> The test fails just after calling websocket.abort() when it tries
>> to verify that the cfPing CompletableFuture completed
>> exceptionally, and found that it actually successfully completed.
>> The logic of the test is to try to fill up the local send buffer
>> by sending ping messages, so that an attempt to write to the
>> socket will block.
>> For that it creates a server that will accept a websocket
>> connection, but will not read anything from the socket input.
>> The client side sends ping packets until the socket buffer
>> is full - which is detected by setting up a 10s timeout and
>> observing that the ping data could not be written during
>> this time. The assumption of the test is that a write call
>> that takes more than 10s is indicative that the buffers are
>> full, and will never succeed.
>> The problem occurs when the write succeeds after ~10s either
>> because the kernel was busy doing some other things, or because
>> the kernel suddenly decided to resize (increase) the buffers,
>> which causes the write call to unblock and succeed after 10s.
>> The test already had some provision and a workaround for that
>> issue - via a repeatable( ) operation - but the workaround
>> was only enabled for macOS where such behavior had first been
>> observed.
>> The fix extends that workaround for Windows - since the later
>> failure shows that something similar is happening there.
>> The fix also moves the websocket.abort() and following check
>> inside the repeatable loop for better reliability.
>> Since pushing test fixes during rampdown 2 is permitted,
>> and since the failure was observed in the JDK 15 CI, I'm
>> planning to push this test fix to the jdk15 repo,
>> unless I hear any objection.
>> best regards,
>> -- daniel
>
More information about the net-dev
mailing list