Re: 答复: 答复: RFR for JDK-8031374: java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails Intermittently

Martin Buchholz martinrb at google.com
Mon Feb 24 02:59:01 UTC 2014


Hi Tristan,

I don't think we should change the test without a sufficiently solid
reason, which usually means reproducing the failure, even if e.g. only once
in 1000 times.  Here we don't know which queue implementation is causing
the failure, and we don't know what chunk size was being used, which might
cause the failure to be reproduced more reliably.  A single test failure
might be due to cosmic rays!  Let's leave unfixed this while it cannot be
reproduced.  You can check in my general purpose improvements to the test
from jsr166 CVS.


On Sun, Feb 23, 2014 at 6:41 PM, Tristan Yan <tristan.yan at oracle.com> wrote:

> Hi Martin
>
> This test failed once in our nightly test. So this is a real case failure.
> But unfortunately we couldn’t know more except the log we had.
>
> 10 extra seconds may need serve 1024 loop totally, also in windows
> Thread.yield() doesn’t give up CPU most of times, then we can may have the
> situation that remover is trying to remove object from a empty queue but it
> doesn’t find anything a couple of times(it doesn’t give up cpu time) then
> offer thread get cpu time. And offer thread insert a couple of elements and
> queue is full.  Offer thread tries to give up CPU but Thread.yield()
> doesn’t really give up. Let’s assume it takes 1024 loop again. And offer
> thread got timeout.  Then remover thread try to remove elements, and it
> takes another 1024 loop to get to timeout as well. So 10 seconds may need
> distribute to 2048 loop at most. Every loop has 5 ms foreach. That’s why I
> simulate this case with a Thread.sleep(20). I admit I can’t reproduce it
> without changing code, this is all theoretical analysis. But this is
> closest possible reason that I can deduce.
>
> Thank you.
>
> Tristan
>
>
>
> *发件人**:* Martin Buchholz [mailto:martinrb at google.com]
> *发送时间:* Monday, February 24, 2014 10:16 AM
>
> *收件人:* Tristan Yan
> *抄送:* core-libs-dev
> *主题:* Re: 答复: RFR for JDK-8031374:
> java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails
> Intermittently
>
>
>
> I may very well be missing something, but the actual extra timeout for
> threads to finish is 10 whole seconds, which should be long enough to
> process a single chunk, even on Windows.
>
>
>
> If you can repro this consistently, try to find out which queue
> implementation is affected.
>
>
>
> We can also shrink max queue size and chunk size to reduce time to
> traverse the queue elements.
>
>
>
> On Sun, Feb 23, 2014 at 4:23 PM, Tristan Yan <tristan.yan at oracle.com>
> wrote:
>
> Thank you for reviewing this. Martin
>
>
>
> The original bug report is here:
> https://bugs.openjdk.java.net/browse/JDK-8031374.
>
>
>
> I haven’t reproduced this bug but I can simulate to reproduce this failure
> by changing  yield() in remover thread to Thread.sleep(20).
>
> You have commented “You've completely broken the intent of this code,
> which was to poll ***occasionally*** for quitting time”. I tried to not
> changed the logic for the test. This failure comes when only 1strounds(1024 loop) wasn’t finished in given quitting time(before timeout).
> Which was 300ms. One solution is increasing default timeout as you
> suggested. But question is how big should we increase. When the test is
> really slow and could not finish 1st round(1024 loop) before timeout, I
> couldn’t figure out what’s the reason timeout value. Also this definitely
> will slow down the tests when it run in fast machine. Which is the case
> most of time. So I took other step that skip wait if test doesn't finish 1
> st round(1024 loop) before timeout. I won’t say I completely broke the
> intent of the code here because it’s rare case.(Only if the machine is slow
> and slow enough that the test doesn't finish 1st round before timeout).
>
>
>
> The reason that replace Thread.yield to Thread.sleep(0L) because David
> Holmes point out in the bug “sleep will certainly give up the CPU,
> whereas yield is not required to.” Also in other mail, David pointed I
> should use Thread.sleep(10L) instead of Thread.sleep(0L) preferably a bit
> longer as we don't know how the OS will behave if the sleep time requested
> is less than the natural resolution of the sleep mechanism. For the
> experiment I’ve done in Windows. Thread.yeild() or Thread.sleep(10L) won’t
> guarantee current thread give up the CPU. This is a hint to OS. This makes
> in windows remover and offer thread could wait to each other more more than
> other other operating system. This is also the one of the reason that can
> explain why we've seen this in windows only.
>
>
>
> Regards
>
> Tristan
>
>
>
>
>
>
>
> *发件人**:* Martin Buchholz [mailto:martinrb at google.com]
> *发送时间:* Monday, February 24, 2014 3:47 AM
> *收件人:* Tristan Yan
> *抄送:* core-libs-dev
> *主题:* Re: RFR for JDK-8031374:
> java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails
> Intermittently
>
>
>
> Hi Tristan,
>
>
>
> (thanks for working on my old crappy tests, and apologies for always
> giving you a hard time)
>
>
>
> I couldn't find the bug report.  Can you provide a URL?
>
>
>
> Thread.stop is only called in case the test has already failed (presumed
> "hung"), as a last resort.
>
>
>
> Perhaps the timeout used in the test (300 ms) is too small on some systems?
>
>
>
> +            protected void giveupCPU(){
>
> +                try {
>
> +                    Thread.sleep(0L);
>
> +                } catch (InterruptedException ignore) {}
>
>              }
>
>
>
> I know of no reason why Thread.sleep(0) should be any more effective than
> Thread.yield.  If it was more effective, then why wouldn't Thread.yield on
> that platform be fixed to be implemented exactly the same way?  IOW if this
> is a problem, fix the JDK!
>
>
>
>              /** Polls occasionally for quitting time. */
>
>              protected boolean quittingTime(long i) {
>
> -                return (i % 1024) == 0 && quittingTime();
>
> +                return stopRequest || quittingTime() && (i % 1024 == 0 || i < 1024);
>
> +            }
>
>
>
> You've completely broken the intent of this code, which was to poll
> ***occasionally*** for quitting time.
>
>
>
>
>
> On Sun, Feb 23, 2014 at 1:40 AM, Tristan Yan <tristan.yan at oracle.com>
> wrote:
>
> Hi All
> Could you please help to review fix for JDK-803174.
>
> http://cr.openjdk.java.net/~tyan/JDK-8031374/webrev.00/
>
> Description:
> There are a couple of code change for this code.
> 1. Original code has used known problematic Thread.stop. Which could cause
> ThreadDeath as all we know. I change it to a customized stopThread method.
> 2. Test failed in windows, I analyze the failure by changing
> Thread.yield() in remover thread to Thread.sleep(50). This is a simulation
> for slow machine. Which shows exact same failures as we can see in bug
> description. By adding more debug info, we can see although we tried to
> give up CPU by using Thread.yield().(I have tried to use Thread.sleep(1L)
> as well, the result is same), there is no guarantee that os will pick up a
> new thread to execute. In Windows, this is more obvious. Because the
> execution is slow. even when the timeout happens, offer thread and remove
> thread won’t have chance to get the point that i % 1024 == 0. This will
> cause the failure like we see in the log. My fix here is when the timeout
> happens, but i is still less than 1024. Stop offer thread and remover
> thread right away instead letting them continuously wait the point to i ==
> 1024.
> 3. I replace Thread.yield to Thread.sleep(0L). I saw a couple of
> discussion that Thread.yield is not required to give up CPU.
>
> Thank you
> Tristan
>
>
>
>
>



More information about the core-libs-dev mailing list