Re: 答复: RFR for JDK-8031374: java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails Intermittently
Martin Buchholz
martinrb at google.com
Mon Feb 24 02:03:36 UTC 2014
I'm doing some independent cleanup and checking into jsr166 CVS:
This gets rid of the Thread.stop and introduces a Semaphore that can be
waited on instead of spinning. Try to work from this new version. If you
can never reproduce the failure to verify a fix, I suggest leaving the bug
unfixed until we can. Presumably you have a Windows machine in a lab
somewhere that produced this originally?
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/jtreg/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java?view=co
Index: OfferRemoveLoops.java
===================================================================
RCS file:
/export/home/jsr166/jsr166/jsr166/src/test/jtreg/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java,v
retrieving revision 1.7
diff -u -r1.7 OfferRemoveLoops.java
--- OfferRemoveLoops.java 18 Jan 2013 04:23:28 -0000 1.7
+++ OfferRemoveLoops.java 24 Feb 2014 01:57:03 -0000
@@ -32,6 +32,7 @@
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
+import static java.util.concurrent.TimeUnit.*;
@SuppressWarnings({"unchecked", "rawtypes", "deprecation"})
public class OfferRemoveLoops {
@@ -71,9 +72,10 @@
final long timeoutMillis = 10L * 1000L;
final int maxChunkSize = 1042;
final int maxQueueSize = 10 * maxChunkSize;
+ final CountDownLatch done = new CountDownLatch(3);
- /** Poor man's bounded buffer. */
- final AtomicLong approximateCount = new AtomicLong(0L);
+ /** Poor man's bounded buffer; prevents unbounded queue expansion.
*/
+ final Semaphore offers = new Semaphore(maxQueueSize);
abstract class CheckedThread extends Thread {
CheckedThread(String name) {
@@ -89,42 +91,44 @@
protected boolean quittingTime(long i) {
return (i % 1024) == 0 && quittingTime();
}
- protected abstract void realRun();
+ protected abstract void realRun() throws Exception;
public void run() {
try { realRun(); } catch (Throwable t) { unexpected(t); }
}
}
Thread offerer = new CheckedThread("offerer") {
- protected void realRun() {
- final long chunkSize = getRandom().nextInt(maxChunkSize) +
2;
+ protected void realRun() throws InterruptedException {
+ final int chunkSize = getRandom().nextInt(maxChunkSize) +
20;
long c = 0;
- for (long i = 0; ! quittingTime(i); i++) {
+ while (! quittingTime()) {
if (q.offer(Long.valueOf(c))) {
if ((++c % chunkSize) == 0) {
- approximateCount.getAndAdd(chunkSize);
- while (approximateCount.get() > maxQueueSize)
- Thread.yield();
+ offers.acquire(chunkSize);
}
} else {
Thread.yield();
- }}}};
+ }
+ }
+ done.countDown();
+ }};
Thread remover = new CheckedThread("remover") {
protected void realRun() {
- final long chunkSize = getRandom().nextInt(maxChunkSize) +
2;
+ final int chunkSize = getRandom().nextInt(maxChunkSize) +
20;
long c = 0;
- for (long i = 0; ! quittingTime(i); i++) {
+ while (! quittingTime()) {
if (q.remove(Long.valueOf(c))) {
if ((++c % chunkSize) == 0) {
- approximateCount.getAndAdd(-chunkSize);
+ offers.release(chunkSize);
}
} else {
Thread.yield();
}
}
q.clear();
- approximateCount.set(0); // Releases waiting offerer thread
+ offers.release(1<<30); // Releases waiting offerer thread
+ done.countDown();
}};
Thread scanner = new CheckedThread("scanner") {
@@ -139,18 +143,19 @@
break;
}
Thread.yield();
- }}};
+ }
+ done.countDown();
+ }};
- for (Thread thread : new Thread[] { offerer, remover, scanner }) {
- thread.join(timeoutMillis + testDurationMillis);
- if (thread.isAlive()) {
- System.err.printf("Hung thread: %s%n", thread.getName());
- failed++;
- for (StackTraceElement e : thread.getStackTrace())
- System.err.println(e);
- // Kludge alert
- thread.stop();
- thread.join(timeoutMillis);
+ if (! done.await(timeoutMillis + testDurationMillis,
MILLISECONDS)) {
+ for (Thread thread : new Thread[] { offerer, remover, scanner
}) {
+ if (thread.isAlive()) {
+ System.err.printf("Hung thread: %s%n",
thread.getName());
+ failed++;
+ for (StackTraceElement e : thread.getStackTrace())
+ System.err.println(e);
+ thread.interrupt();
+ }
}
}
}
On Sun, Feb 23, 2014 at 4:23 PM, Tristan Yan <tristan.yan at oracle.com> wrote:
> Thank you for reviewing this. Martin
>
>
>
> The original bug report is here:
> https://bugs.openjdk.java.net/browse/JDK-8031374.
>
>
>
> I haven’t reproduced this bug but I can simulate to reproduce this failure
> by changing yield() in remover thread to Thread.sleep(20).
>
> You have commented “You've completely broken the intent of this code,
> which was to poll ***occasionally*** for quitting time”. I tried to not
> changed the logic for the test. This failure comes when only 1strounds(1024 loop) wasn’t finished in given quitting time(before timeout).
> Which was 300ms. One solution is increasing default timeout as you
> suggested. But question is how big should we increase. When the test is
> really slow and could not finish 1st round(1024 loop) before timeout, I
> couldn’t figure out what’s the reason timeout value. Also this definitely
> will slow down the tests when it run in fast machine. Which is the case
> most of time. So I took other step that skip wait if test doesn't finish 1
> st round(1024 loop) before timeout. I won’t say I completely broke the
> intent of the code here because it’s rare case.(Only if the machine is slow
> and slow enough that the test doesn't finish 1st round before timeout).
>
>
>
> The reason that replace Thread.yield to Thread.sleep(0L) because David
> Holmes point out in the bug “sleep will certainly give up the CPU,
> whereas yield is not required to.” Also in other mail, David pointed I
> should use Thread.sleep(10L) instead of Thread.sleep(0L) preferably a bit
> longer as we don't know how the OS will behave if the sleep time requested
> is less than the natural resolution of the sleep mechanism. For the
> experiment I’ve done in Windows. Thread.yeild() or Thread.sleep(10L) won’t
> guarantee current thread give up the CPU. This is a hint to OS. This makes
> in windows remover and offer thread could wait to each other more more than
> other other operating system. This is also the one of the reason that can
> explain why we've seen this in windows only.
>
>
>
> Regards
>
> Tristan
>
>
>
>
>
>
>
> *发件人**:* Martin Buchholz [mailto:martinrb at google.com]
> *发送时间:* Monday, February 24, 2014 3:47 AM
> *收件人:* Tristan Yan
> *抄送:* core-libs-dev
> *主题:* Re: RFR for JDK-8031374:
> java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails
> Intermittently
>
>
>
> Hi Tristan,
>
>
>
> (thanks for working on my old crappy tests, and apologies for always
> giving you a hard time)
>
>
>
> I couldn't find the bug report. Can you provide a URL?
>
>
>
> Thread.stop is only called in case the test has already failed (presumed
> "hung"), as a last resort.
>
>
>
> Perhaps the timeout used in the test (300 ms) is too small on some systems?
>
>
>
> + protected void giveupCPU(){
>
> + try {
>
> + Thread.sleep(0L);
>
> + } catch (InterruptedException ignore) {}
>
> }
>
>
>
> I know of no reason why Thread.sleep(0) should be any more effective than
> Thread.yield. If it was more effective, then why wouldn't Thread.yield on
> that platform be fixed to be implemented exactly the same way? IOW if this
> is a problem, fix the JDK!
>
>
>
> /** Polls occasionally for quitting time. */
>
> protected boolean quittingTime(long i) {
>
> - return (i % 1024) == 0 && quittingTime();
>
> + return stopRequest || quittingTime() && (i % 1024 == 0 || i < 1024);
>
> + }
>
>
>
> You've completely broken the intent of this code, which was to poll
> ***occasionally*** for quitting time.
>
>
>
>
>
> On Sun, Feb 23, 2014 at 1:40 AM, Tristan Yan <tristan.yan at oracle.com>
> wrote:
>
> Hi All
> Could you please help to review fix for JDK-803174.
>
> http://cr.openjdk.java.net/~tyan/JDK-8031374/webrev.00/
>
> Description:
> There are a couple of code change for this code.
> 1. Original code has used known problematic Thread.stop. Which could cause
> ThreadDeath as all we know. I change it to a customized stopThread method.
> 2. Test failed in windows, I analyze the failure by changing
> Thread.yield() in remover thread to Thread.sleep(50). This is a simulation
> for slow machine. Which shows exact same failures as we can see in bug
> description. By adding more debug info, we can see although we tried to
> give up CPU by using Thread.yield().(I have tried to use Thread.sleep(1L)
> as well, the result is same), there is no guarantee that os will pick up a
> new thread to execute. In Windows, this is more obvious. Because the
> execution is slow. even when the timeout happens, offer thread and remove
> thread won’t have chance to get the point that i % 1024 == 0. This will
> cause the failure like we see in the log. My fix here is when the timeout
> happens, but i is still less than 1024. Stop offer thread and remover
> thread right away instead letting them continuously wait the point to i ==
> 1024.
> 3. I replace Thread.yield to Thread.sleep(0L). I saw a couple of
> discussion that Thread.yield is not required to give up CPU.
>
> Thank you
> Tristan
>
>
>
More information about the core-libs-dev
mailing list