Re: 答复: RFR for JDK-8031374: java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails Intermittently

Martin Buchholz martinrb at google.com
Mon Feb 24 02:03:36 UTC 2014


I'm doing some independent cleanup and checking into jsr166 CVS:
This gets rid of the Thread.stop and introduces a Semaphore that can be
waited on instead of spinning.  Try to work from this new version.  If you
can never reproduce the failure to verify a fix, I suggest leaving the bug
unfixed until we can.  Presumably you have a Windows machine in a lab
somewhere that produced this originally?

http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/jtreg/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java?view=co

Index: OfferRemoveLoops.java
===================================================================
RCS file:
/export/home/jsr166/jsr166/jsr166/src/test/jtreg/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java,v
retrieving revision 1.7
diff -u -r1.7 OfferRemoveLoops.java
--- OfferRemoveLoops.java 18 Jan 2013 04:23:28 -0000 1.7
+++ OfferRemoveLoops.java 24 Feb 2014 01:57:03 -0000
@@ -32,6 +32,7 @@
 import java.util.*;
 import java.util.concurrent.*;
 import java.util.concurrent.atomic.*;
+import static java.util.concurrent.TimeUnit.*;

 @SuppressWarnings({"unchecked", "rawtypes", "deprecation"})
 public class OfferRemoveLoops {
@@ -71,9 +72,10 @@
         final long timeoutMillis = 10L * 1000L;
         final int maxChunkSize = 1042;
         final int maxQueueSize = 10 * maxChunkSize;
+        final CountDownLatch done = new CountDownLatch(3);

-        /** Poor man's bounded buffer. */
-        final AtomicLong approximateCount = new AtomicLong(0L);
+        /** Poor man's bounded buffer; prevents unbounded queue expansion.
*/
+        final Semaphore offers = new Semaphore(maxQueueSize);

         abstract class CheckedThread extends Thread {
             CheckedThread(String name) {
@@ -89,42 +91,44 @@
             protected boolean quittingTime(long i) {
                 return (i % 1024) == 0 && quittingTime();
             }
-            protected abstract void realRun();
+            protected abstract void realRun() throws Exception;
             public void run() {
                 try { realRun(); } catch (Throwable t) { unexpected(t); }
             }
         }

         Thread offerer = new CheckedThread("offerer") {
-            protected void realRun() {
-                final long chunkSize = getRandom().nextInt(maxChunkSize) +
2;
+            protected void realRun() throws InterruptedException {
+                final int chunkSize = getRandom().nextInt(maxChunkSize) +
20;
                 long c = 0;
-                for (long i = 0; ! quittingTime(i); i++) {
+                while (! quittingTime()) {
                     if (q.offer(Long.valueOf(c))) {
                         if ((++c % chunkSize) == 0) {
-                            approximateCount.getAndAdd(chunkSize);
-                            while (approximateCount.get() > maxQueueSize)
-                                Thread.yield();
+                            offers.acquire(chunkSize);
                         }
                     } else {
                         Thread.yield();
-                    }}}};
+                    }
+                }
+                done.countDown();
+            }};

         Thread remover = new CheckedThread("remover") {
             protected void realRun() {
-                final long chunkSize = getRandom().nextInt(maxChunkSize) +
2;
+                final int chunkSize = getRandom().nextInt(maxChunkSize) +
20;
                 long c = 0;
-                for (long i = 0; ! quittingTime(i); i++) {
+                while (! quittingTime()) {
                     if (q.remove(Long.valueOf(c))) {
                         if ((++c % chunkSize) == 0) {
-                            approximateCount.getAndAdd(-chunkSize);
+                            offers.release(chunkSize);
                         }
                     } else {
                         Thread.yield();
                     }
                 }
                 q.clear();
-                approximateCount.set(0); // Releases waiting offerer thread
+                offers.release(1<<30);  // Releases waiting offerer thread
+                done.countDown();
             }};

         Thread scanner = new CheckedThread("scanner") {
@@ -139,18 +143,19 @@
                         break;
                     }
                     Thread.yield();
-                }}};
+                }
+                done.countDown();
+            }};

-        for (Thread thread : new Thread[] { offerer, remover, scanner }) {
-            thread.join(timeoutMillis + testDurationMillis);
-            if (thread.isAlive()) {
-                System.err.printf("Hung thread: %s%n", thread.getName());
-                failed++;
-                for (StackTraceElement e : thread.getStackTrace())
-                    System.err.println(e);
-                // Kludge alert
-                thread.stop();
-                thread.join(timeoutMillis);
+        if (! done.await(timeoutMillis + testDurationMillis,
MILLISECONDS)) {
+            for (Thread thread : new Thread[] { offerer, remover, scanner
}) {
+                if (thread.isAlive()) {
+                    System.err.printf("Hung thread: %s%n",
thread.getName());
+                    failed++;
+                    for (StackTraceElement e : thread.getStackTrace())
+                        System.err.println(e);
+                    thread.interrupt();
+                }
             }
         }
     }



On Sun, Feb 23, 2014 at 4:23 PM, Tristan Yan <tristan.yan at oracle.com> wrote:

> Thank you for reviewing this. Martin
>
>
>
> The original bug report is here:
> https://bugs.openjdk.java.net/browse/JDK-8031374.
>
>
>
> I haven’t reproduced this bug but I can simulate to reproduce this failure
> by changing  yield() in remover thread to Thread.sleep(20).
>
> You have commented “You've completely broken the intent of this code,
> which was to poll ***occasionally*** for quitting time”. I tried to not
> changed the logic for the test. This failure comes when only 1strounds(1024 loop) wasn’t finished in given quitting time(before timeout).
> Which was 300ms. One solution is increasing default timeout as you
> suggested. But question is how big should we increase. When the test is
> really slow and could not finish 1st round(1024 loop) before timeout, I
> couldn’t figure out what’s the reason timeout value. Also this definitely
> will slow down the tests when it run in fast machine. Which is the case
> most of time. So I took other step that skip wait if test doesn't finish 1
> st round(1024 loop) before timeout. I won’t say I completely broke the
> intent of the code here because it’s rare case.(Only if the machine is slow
> and slow enough that the test doesn't finish 1st round before timeout).
>
>
>
> The reason that replace Thread.yield to Thread.sleep(0L) because David
> Holmes point out in the bug “sleep will certainly give up the CPU,
> whereas yield is not required to.” Also in other mail, David pointed I
> should use Thread.sleep(10L) instead of Thread.sleep(0L) preferably a bit
> longer as we don't know how the OS will behave if the sleep time requested
> is less than the natural resolution of the sleep mechanism. For the
> experiment I’ve done in Windows. Thread.yeild() or Thread.sleep(10L) won’t
> guarantee current thread give up the CPU. This is a hint to OS. This makes
> in windows remover and offer thread could wait to each other more more than
> other other operating system. This is also the one of the reason that can
> explain why we've seen this in windows only.
>
>
>
> Regards
>
> Tristan
>
>
>
>
>
>
>
> *发件人**:* Martin Buchholz [mailto:martinrb at google.com]
> *发送时间:* Monday, February 24, 2014 3:47 AM
> *收件人:* Tristan Yan
> *抄送:* core-libs-dev
> *主题:* Re: RFR for JDK-8031374:
> java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails
> Intermittently
>
>
>
> Hi Tristan,
>
>
>
> (thanks for working on my old crappy tests, and apologies for always
> giving you a hard time)
>
>
>
> I couldn't find the bug report.  Can you provide a URL?
>
>
>
> Thread.stop is only called in case the test has already failed (presumed
> "hung"), as a last resort.
>
>
>
> Perhaps the timeout used in the test (300 ms) is too small on some systems?
>
>
>
> +            protected void giveupCPU(){
>
> +                try {
>
> +                    Thread.sleep(0L);
>
> +                } catch (InterruptedException ignore) {}
>
>              }
>
>
>
> I know of no reason why Thread.sleep(0) should be any more effective than
> Thread.yield.  If it was more effective, then why wouldn't Thread.yield on
> that platform be fixed to be implemented exactly the same way?  IOW if this
> is a problem, fix the JDK!
>
>
>
>              /** Polls occasionally for quitting time. */
>
>              protected boolean quittingTime(long i) {
>
> -                return (i % 1024) == 0 && quittingTime();
>
> +                return stopRequest || quittingTime() && (i % 1024 == 0 || i < 1024);
>
> +            }
>
>
>
> You've completely broken the intent of this code, which was to poll
> ***occasionally*** for quitting time.
>
>
>
>
>
> On Sun, Feb 23, 2014 at 1:40 AM, Tristan Yan <tristan.yan at oracle.com>
> wrote:
>
> Hi All
> Could you please help to review fix for JDK-803174.
>
> http://cr.openjdk.java.net/~tyan/JDK-8031374/webrev.00/
>
> Description:
> There are a couple of code change for this code.
> 1. Original code has used known problematic Thread.stop. Which could cause
> ThreadDeath as all we know. I change it to a customized stopThread method.
> 2. Test failed in windows, I analyze the failure by changing
> Thread.yield() in remover thread to Thread.sleep(50). This is a simulation
> for slow machine. Which shows exact same failures as we can see in bug
> description. By adding more debug info, we can see although we tried to
> give up CPU by using Thread.yield().(I have tried to use Thread.sleep(1L)
> as well, the result is same), there is no guarantee that os will pick up a
> new thread to execute. In Windows, this is more obvious. Because the
> execution is slow. even when the timeout happens, offer thread and remove
> thread won’t have chance to get the point that i % 1024 == 0. This will
> cause the failure like we see in the log. My fix here is when the timeout
> happens, but i is still less than 1024. Stop offer thread and remover
> thread right away instead letting them continuously wait the point to i ==
> 1024.
> 3. I replace Thread.yield to Thread.sleep(0L). I saw a couple of
> discussion that Thread.yield is not required to give up CPU.
>
> Thank you
> Tristan
>
>
>



More information about the core-libs-dev mailing list