答复: RFR for JDK-8031374: java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails Intermittently

Tristan Yan tristan.yan at oracle.com
Mon Feb 24 00:23:04 UTC 2014


Thank you for reviewing this. Martin

 

The original bug report is here: https://bugs.openjdk.java.net/browse/JDK-8031374.

 

I haven’t reproduced this bug but I can simulate to reproduce this failure by changing  yield() in remover thread to Thread.sleep(20).

You have commented “You've completely broken the intent of this code, which was to poll ***occasionally*** for quitting time”. I tried to not changed the logic for the test. This failure comes when only 1st rounds(1024 loop) wasn’t finished in given quitting time(before timeout). Which was 300ms. One solution is increasing default timeout as you suggested. But question is how big should we increase. When the test is really slow and could not finish 1st round(1024 loop) before timeout, I couldn’t figure out what’s the reason timeout value. Also this definitely will slow down the tests when it run in fast machine. Which is the case most of time. So I took other step that skip wait if test doesn't finish 1st round(1024 loop) before timeout. I won’t say I completely broke the intent of the code here because it’s rare case.(Only if the machine is slow and slow enough that the test doesn't finish 1st round before timeout).

 

The reason that replace Thread.yield to Thread.sleep(0L) because David Holmes point out in the bug “sleep will certainly give up the CPU, whereas yield is not required to.” Also in other mail, David pointed I should use Thread.sleep(10L) instead of Thread.sleep(0L) preferably a bit longer as we don't know how the OS will behave if the sleep time requested is less than the natural resolution of the sleep mechanism. For the experiment I’ve done in Windows. Thread.yeild() or Thread.sleep(10L) won’t guarantee current thread give up the CPU. This is a hint to OS. This makes in windows remover and offer thread could wait to each other more more than other other operating system. This is also the one of the reason that can explain why we've seen this in windows only.

 

Regards

Tristan

 

 

 

发件人: Martin Buchholz [mailto:martinrb at google.com] 
发送时间: Monday, February 24, 2014 3:47 AM
收件人: Tristan Yan
抄送: core-libs-dev
主题: Re: RFR for JDK-8031374: java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails Intermittently

 

Hi Tristan,

 

(thanks for working on my old crappy tests, and apologies for always giving you a hard time)

 

I couldn't find the bug report.  Can you provide a URL?

 

Thread.stop is only called in case the test has already failed (presumed "hung"), as a last resort.

 

Perhaps the timeout used in the test (300 ms) is too small on some systems?

 

+            protected void giveupCPU(){
+                try {
+                    Thread.sleep(0L);
+                } catch (InterruptedException ignore) {}
             }

 

I know of no reason why Thread.sleep(0) should be any more effective than Thread.yield.  If it was more effective, then why wouldn't Thread.yield on that platform be fixed to be implemented exactly the same way?  IOW if this is a problem, fix the JDK!

 

             /** Polls occasionally for quitting time. */
             protected boolean quittingTime(long i) {
-                return (i % 1024) == 0 && quittingTime();
+                return stopRequest || quittingTime() && (i % 1024 == 0 || i < 1024);
+            }

 

You've completely broken the intent of this code, which was to poll ***occasionally*** for quitting time.

 

 

On Sun, Feb 23, 2014 at 1:40 AM, Tristan Yan <HYPERLINK "mailto:tristan.yan at oracle.com"tristan.yan at oracle.com> wrote:

Hi All
Could you please help to review fix for JDK-803174.

http://cr.openjdk.java.net/~tyan/JDK-8031374/webrev.00/

Description:
There are a couple of code change for this code.
1. Original code has used known problematic Thread.stop. Which could cause ThreadDeath as all we know. I change it to a customized stopThread method.
2. Test failed in windows, I analyze the failure by changing Thread.yield() in remover thread to Thread.sleep(50). This is a simulation for slow machine. Which shows exact same failures as we can see in bug description. By adding more debug info, we can see although we tried to give up CPU by using Thread.yield().(I have tried to use Thread.sleep(1L) as well, the result is same), there is no guarantee that os will pick up a new thread to execute. In Windows, this is more obvious. Because the execution is slow. even when the timeout happens, offer thread and remove thread won’t have chance to get the point that i % 1024 == 0. This will cause the failure like we see in the log. My fix here is when the timeout happens, but i is still less than 1024. Stop offer thread and remover thread right away instead letting them continuously wait the point to i == 1024.
3. I replace Thread.yield to Thread.sleep(0L). I saw a couple of discussion that Thread.yield is not required to give up CPU.

Thank you
Tristan 

 



More information about the core-libs-dev mailing list