JDK 9 RFR of JDK-8068693/8153209: (ch) test java/nio/channels/AsyncCloseAndInterrupt.java failing

Fri Apr 8 02:34:34 UTC 2016

Would any reviewer be available to help review the code change?

bug: https://bugs.openjdk.java.net/browse/JDK-8068693, 
https://bugs.openjdk.java.net/browse/JDK-8153209
webrev: http://cr.openjdk.java.net/~mli/8068693/webrev.00/

Thank you
-Hamlin

On 2016/4/6 11:43, Hamlin Li wrote:
>
>
> On 2016/4/6 11:35, Martin Buchholz wrote:
>> Thanks for the explanation.  We occasionally have essentially the same
>> tough problem in j.u.concurrent.
>> Sometimes you can monitor the target thread (e.g. Thread.STATE) and spin-wait.
>> For blocking in I/O, the OS might know, but then you have system
>> dependent code to query it.
>> If you sleep for 100ms, it will sometimes not be enough and you will
>> have (too many) sporadic flakes.
>> if you sleep for 10seconds, it will almost surely be enough, but
>> that's a very long time for a test.
>> Sometimes the only reasonable solution is to keep retrying with ever
>> increasing sleeps up to e.g. 10 seconds.
>> Then the test will almost always run fast and still (almost) never flake.
>> I'm still not your real reviewer.
> Hi Martin,
>
> Thanks for the quick response. :-)
> Yes, you are right, we need find a safe time to wait/sleep. In this 
> case, 100 ms should be the safe one, as it has been run successfully 
> since beginning until the fix for 
> https://bugs.openjdk.java.net/browse/JDK-8151582 broke it, so I think 
> it's safe for us to rollback to 100 ms for the failing test cases.
>
> Hi *Alan*, *Roger*, *Brian*,
>
> Would you mind to review the code change for these 2 bugs when you're 
> available?
>
> Thank you
> -Hamlin
>> On Tue, Apr 5, 2016 at 7:20 PM, Hamlin Li<huaming.li at oracle.com>  wrote:
>>> On 2016/4/6 10:01, Martin Buchholz wrote:
>>>> On Tue, Apr 5, 2016 at 6:26 PM, Hamlin Li<huaming.li at oracle.com>  wrote:
>>>>> Hi Martin,
>>>>> Thanks for reviewing, please check my comments inline.
>>>>>
>>>>> -Hamlin
>>>>>
>>>>> On 2016/4/6 1:23, Martin Buchholz wrote:
>>>>>> The use of sleeps in these tests is disturbing.  Occasionally they're
>>>>>> necessary, but usually they can be removed. For example
>>>>>>
>>>>>>     574         do {
>>>>>>     575             sleep(50);
>>>>>>     576         } while (!t.ready);
>>>>> Yes, agree with you. But as this piece of code is legacy code, and does
>>>>> not
>>>>> cause any testing issue, so I choose to keep it.
>>>> Well, yes, but you are introducing a new sleep(100).
>>>>
>>>> My own experience is that 100ms is never long enough to wait for any
>>>> other thread to do something.  But this is just a drive by comment - I
>>>> don't understand this code enough to understand the purpose of the
>>>> sleep.
>>> Hi Martin,
>>>
>>> ( There is a little bit long history, it involves 3 bugs
>>> https://bugs.openjdk.java.net/browse/JDK-8151582,
>>> https://bugs.openjdk.java.net/browse/JDK-8068693,
>>> https://bugs.openjdk.java.net/browse/JDK-8153209. )
>>>
>>> No, this(sleep(100)) is not introduced by me, it's deleted in another fix
>>> where I tried to fix another issue
>>> (https://bugs.openjdk.java.net/browse/JDK-8151582,
>>> http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/88577677aec9), but seems the
>>> fix introduced a new issue(https://bugs.openjdk.java.net/browse/JDK-8153209)
>>> which never occurred before. so sleep(100) is in fact kind of an rollback
>>> rather than introducing new code.
>>> And, even if we use CountDownLatch to wait, I'm afraid it will not work
>>> exactly as expected, below is a briefing:
>>> in thread_main,
>>>    1. launch thread_2
>>>    2. wait (sleep or CountDownLatch) until thread_2 get into a blocking io
>>> operation such as reading a socket
>>>    3. call close/interrupt ...
>>> in thread 2,
>>>    1. before get to the reading operation, notify thread_main
>>>    2. get into the blocking io operation
>>> But there is still a time window between step 3 in main_thread, and step 2
>>> in thread_2, so I don't think CountDownLatch will solve the issue perfectly.
>>> As this new issue(https://bugs.openjdk.java.net/browse/JDK-8153209) never
>>> occur before, so I think the best way to fix it is to keep the original code
>>> for the new failing test cases reported in
>>> https://bugs.openjdk.java.net/browse/JDK-8153209, and introducing the new
>>> code(not calling sleep(100)) for the failing test cases reported in
>>> https://bugs.openjdk.java.net/browse/JDK-8151582.
>>>
>>> Thank you
>>> -Hamlin
>>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20160408/eefa7171/attachment.html>