JDK 9 RFR of JDK-8068693/8153209: (ch) test java/nio/channels/AsyncCloseAndInterrupt.java failing

Martin Buchholz martinrb at google.com
Wed Apr 6 03:35:22 UTC 2016


Thanks for the explanation.  We occasionally have essentially the same
tough problem in j.u.concurrent.
Sometimes you can monitor the target thread (e.g. Thread.STATE) and spin-wait.
For blocking in I/O, the OS might know, but then you have system
dependent code to query it.
If you sleep for 100ms, it will sometimes not be enough and you will
have (too many) sporadic flakes.
if you sleep for 10seconds, it will almost surely be enough, but
that's a very long time for a test.
Sometimes the only reasonable solution is to keep retrying with ever
increasing sleeps up to e.g. 10 seconds.
Then the test will almost always run fast and still (almost) never flake.
I'm still not your real reviewer.

On Tue, Apr 5, 2016 at 7:20 PM, Hamlin Li <huaming.li at oracle.com> wrote:
>
>
> On 2016/4/6 10:01, Martin Buchholz wrote:
>>
>> On Tue, Apr 5, 2016 at 6:26 PM, Hamlin Li <huaming.li at oracle.com> wrote:
>>>
>>> Hi Martin,
>>> Thanks for reviewing, please check my comments inline.
>>>
>>> -Hamlin
>>>
>>> On 2016/4/6 1:23, Martin Buchholz wrote:
>>>>
>>>> The use of sleeps in these tests is disturbing.  Occasionally they're
>>>> necessary, but usually they can be removed. For example
>>>>
>>>>    574         do {
>>>>    575             sleep(50);
>>>>    576         } while (!t.ready);
>>>
>>> Yes, agree with you. But as this piece of code is legacy code, and does
>>> not
>>> cause any testing issue, so I choose to keep it.
>>
>> Well, yes, but you are introducing a new sleep(100).
>>
>> My own experience is that 100ms is never long enough to wait for any
>> other thread to do something.  But this is just a drive by comment - I
>> don't understand this code enough to understand the purpose of the
>> sleep.
>
> Hi Martin,
>
> ( There is a little bit long history, it involves 3 bugs
> https://bugs.openjdk.java.net/browse/JDK-8151582,
> https://bugs.openjdk.java.net/browse/JDK-8068693,
> https://bugs.openjdk.java.net/browse/JDK-8153209. )
>
> No, this(sleep(100)) is not introduced by me, it's deleted in another fix
> where I tried to fix another issue
> (https://bugs.openjdk.java.net/browse/JDK-8151582,
> http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/88577677aec9), but seems the
> fix introduced a new issue(https://bugs.openjdk.java.net/browse/JDK-8153209)
> which never occurred before. so sleep(100) is in fact kind of an rollback
> rather than introducing new code.
> And, even if we use CountDownLatch to wait, I'm afraid it will not work
> exactly as expected, below is a briefing:
> in thread_main,
>   1. launch thread_2
>   2. wait (sleep or CountDownLatch) until thread_2 get into a blocking io
> operation such as reading a socket
>   3. call close/interrupt ...
> in thread 2,
>   1. before get to the reading operation, notify thread_main
>   2. get into the blocking io operation
> But there is still a time window between step 3 in main_thread, and step 2
> in thread_2, so I don't think CountDownLatch will solve the issue perfectly.
> As this new issue(https://bugs.openjdk.java.net/browse/JDK-8153209) never
> occur before, so I think the best way to fix it is to keep the original code
> for the new failing test cases reported in
> https://bugs.openjdk.java.net/browse/JDK-8153209, and introducing the new
> code(not calling sleep(100)) for the failing test cases reported in
> https://bugs.openjdk.java.net/browse/JDK-8151582.
>
> Thank you
> -Hamlin
>


More information about the nio-dev mailing list