JDK 9 RFR of JDK-8068693/8153209: (ch) test java/nio/channels/AsyncCloseAndInterrupt.java failing

Hamlin Li huaming.li at oracle.com
Mon Apr 11 10:47:19 UTC 2016


Would any reviewer be available to help review the code change?

Thank you
-Hamlin

On 2016/4/8 10:34, Hamlin Li wrote:
> Would any reviewer be available to help review the code change?
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8068693, 
> https://bugs.openjdk.java.net/browse/JDK-8153209
> webrev: http://cr.openjdk.java.net/~mli/8068693/webrev.00/
>
> Thank you
> -Hamlin
>
> On 2016/4/6 11:43, Hamlin Li wrote:
>>
>>
>> On 2016/4/6 11:35, Martin Buchholz wrote:
>>> Thanks for the explanation.  We occasionally have essentially the same
>>> tough problem in j.u.concurrent.
>>> Sometimes you can monitor the target thread (e.g. Thread.STATE) and spin-wait.
>>> For blocking in I/O, the OS might know, but then you have system
>>> dependent code to query it.
>>> If you sleep for 100ms, it will sometimes not be enough and you will
>>> have (too many) sporadic flakes.
>>> if you sleep for 10seconds, it will almost surely be enough, but
>>> that's a very long time for a test.
>>> Sometimes the only reasonable solution is to keep retrying with ever
>>> increasing sleeps up to e.g. 10 seconds.
>>> Then the test will almost always run fast and still (almost) never flake.
>>> I'm still not your real reviewer.
>> Hi Martin,
>>
>> Thanks for the quick response. :-)
>> Yes, you are right, we need find a safe time to wait/sleep. In this 
>> case, 100 ms should be the safe one, as it has been run successfully 
>> since beginning until the fix for 
>> https://bugs.openjdk.java.net/browse/JDK-8151582 broke it, so I think 
>> it's safe for us to rollback to 100 ms for the failing test cases.
>>
>> Hi *Alan*, *Roger*, *Brian*,
>>
>> Would you mind to review the code change for these 2 bugs when you're 
>> available?
>>
>> Thank you
>> -Hamlin
>>> On Tue, Apr 5, 2016 at 7:20 PM, Hamlin Li<huaming.li at oracle.com>  wrote:
>>>> On 2016/4/6 10:01, Martin Buchholz wrote:
>>>>> On Tue, Apr 5, 2016 at 6:26 PM, Hamlin Li<huaming.li at oracle.com>  wrote:
>>>>>> Hi Martin,
>>>>>> Thanks for reviewing, please check my comments inline.
>>>>>>
>>>>>> -Hamlin
>>>>>>
>>>>>> On 2016/4/6 1:23, Martin Buchholz wrote:
>>>>>>> The use of sleeps in these tests is disturbing.  Occasionally they're
>>>>>>> necessary, but usually they can be removed. For example
>>>>>>>
>>>>>>>     574         do {
>>>>>>>     575             sleep(50);
>>>>>>>     576         } while (!t.ready);
>>>>>> Yes, agree with you. But as this piece of code is legacy code, and does
>>>>>> not
>>>>>> cause any testing issue, so I choose to keep it.
>>>>> Well, yes, but you are introducing a new sleep(100).
>>>>>
>>>>> My own experience is that 100ms is never long enough to wait for any
>>>>> other thread to do something.  But this is just a drive by comment - I
>>>>> don't understand this code enough to understand the purpose of the
>>>>> sleep.
>>>> Hi Martin,
>>>>
>>>> ( There is a little bit long history, it involves 3 bugs
>>>> https://bugs.openjdk.java.net/browse/JDK-8151582,
>>>> https://bugs.openjdk.java.net/browse/JDK-8068693,
>>>> https://bugs.openjdk.java.net/browse/JDK-8153209. )
>>>>
>>>> No, this(sleep(100)) is not introduced by me, it's deleted in another fix
>>>> where I tried to fix another issue
>>>> (https://bugs.openjdk.java.net/browse/JDK-8151582,
>>>> http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/88577677aec9), but seems the
>>>> fix introduced a new issue(https://bugs.openjdk.java.net/browse/JDK-8153209)
>>>> which never occurred before. so sleep(100) is in fact kind of an rollback
>>>> rather than introducing new code.
>>>> And, even if we use CountDownLatch to wait, I'm afraid it will not work
>>>> exactly as expected, below is a briefing:
>>>> in thread_main,
>>>>    1. launch thread_2
>>>>    2. wait (sleep or CountDownLatch) until thread_2 get into a blocking io
>>>> operation such as reading a socket
>>>>    3. call close/interrupt ...
>>>> in thread 2,
>>>>    1. before get to the reading operation, notify thread_main
>>>>    2. get into the blocking io operation
>>>> But there is still a time window between step 3 in main_thread, and step 2
>>>> in thread_2, so I don't think CountDownLatch will solve the issue perfectly.
>>>> As this new issue(https://bugs.openjdk.java.net/browse/JDK-8153209) never
>>>> occur before, so I think the best way to fix it is to keep the original code
>>>> for the new failing test cases reported in
>>>> https://bugs.openjdk.java.net/browse/JDK-8153209, and introducing the new
>>>> code(not calling sleep(100)) for the failing test cases reported in
>>>> https://bugs.openjdk.java.net/browse/JDK-8151582.
>>>>
>>>> Thank you
>>>> -Hamlin
>>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20160411/9ad141b5/attachment-0001.html>


More information about the nio-dev mailing list