Potential bug in hotspot occasionally resulting in non-termination of parallel stream execution
Amy Lu
amy.lu at oracle.com
Thu Apr 23 05:19:39 UTC 2015
Here are the test results in detail.
We picked a 32-processor machine (Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz)
and a single stream test, ToArrayOpTest, for this testing. Normally this
test takes ~22 seconds to complete. We used a sufficiently long timeout, so
we believe each “timeout” seen in the testing is a real hang.
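For readers who want to try something similar outside jtreg, the repeat-with-timeout approach can be sketched roughly as below. This is only an illustration, not the harness used for the runs reported here: the class name RepeatUntilHang, its methods, the stream size and the timeout value are all hypothetical, and the real ToArrayOpTest exercises far more than a single toArray() call. The caller is a plain (non-pool) thread so that, as in the reported trace, it waits on the fork/join task from outside the pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.IntStream;

public class RepeatUntilHang {

    /** Stand-in for the test body: a parallel stream collected with toArray(). */
    static Integer[] parallelToArray(int size) {
        return IntStream.range(0, size).parallel().boxed().toArray(Integer[]::new);
    }

    /**
     * Runs the operation up to maxRuns times. Returns the 1-based number of the
     * first run that exceeds the timeout, or -1 if every run completes in time.
     */
    static int firstTimedOutRun(int maxRuns, int size, long timeoutSeconds) {
        // Daemon caller thread: a hung run must not keep the JVM alive.
        ExecutorService caller = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "caller");
            t.setDaemon(true);
            return t;
        });
        try {
            for (int run = 1; run <= maxRuns; run++) {
                Callable<Integer[]> task = () -> parallelToArray(size);
                Future<Integer[]> result = caller.submit(task);
                try {
                    Integer[] array = result.get(timeoutSeconds, TimeUnit.SECONDS);
                    if (array.length != size) {
                        throw new IllegalStateException("wrong length in run " + run);
                    }
                } catch (TimeoutException e) {
                    return run; // treated as a hang, given a generous timeout
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("run " + run + " failed", e);
                }
            }
            return -1;
        } finally {
            caller.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Assumed values: 3000 runs, 10_000 elements, 300 second timeout.
        int hung = firstTimedOutRun(3000, 10_000, 300);
        System.out.println(hung < 0 ? "all runs passed" : "timed out at run #" + hung);
    }
}
```

The timeout should be far larger than the normal run time, so that a reported timeout is unlikely to be just a slow run.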
* JDK9/b52: all 3000 runs passed.
* JDK9/b53: Reproduced the issue; the test timed out 4 times (runs #596,
#978, #988 and #1290) in 1568 total runs.
From the changesets that were integrated into b53 we identified
JDK-8061553 as a possible cause, and tested the latest dev build:
* Latest dev build: Reproduced the issue; the test timed out 4 times (runs
#48, #143, #1877 and #2231) in 3000 total runs.
* Same build with the 8061553 changeset backed out: all 3000 runs passed.
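As a rough sanity check on these numbers, one can ask how likely 3000 clean runs would be if the hang were still present at the observed rate. The sketch below assumes that runs are independent and that each run hangs with the same fixed probability; neither assumption comes from the testing itself, and the class name HangRateCheck is hypothetical.

```java
public class HangRateCheck {

    /** Probability that all runs pass when each run hangs independently with probability p. */
    static double probAllPass(double p, int runs) {
        return Math.pow(1.0 - p, runs);
    }

    public static void main(String[] args) {
        double b53Rate = 4.0 / 1568; // observed hang rate on JDK9/b53
        double devRate = 4.0 / 3000; // observed hang rate on the latest dev build
        System.out.printf("b53 rate %.5f: P(3000 clean runs) = %.6f%n",
                b53Rate, probAllPass(b53Rate, 3000));
        System.out.printf("dev rate %.5f: P(3000 clean runs) = %.6f%n",
                devRate, probAllPass(devRate, 3000));
    }
}
```

Under those assumptions, 3000 consecutive passes would be expected less than 2% of the time at the dev-build rate and less than 0.1% of the time at the b53 rate. That is consistent with the backout removing the hang, although it does not by itself explain the mechanism.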
The testing was done on two machines, Linux and Solaris, with similar
results. Before drilling down to b52/b53, we had also tested b55 and b59,
and both reproduced the issue.
Thanks,
Amy
On 4/23/15 12:31 AM, Paul Sandoz wrote:
> Hi,
>
> Amy and I think we have identified an issue in hotspot that only very occasionally results in non-termination of parallel stream execution. Specifically non-termination of stream fork/join tasks. Such failures, when running jtreg stream tests, manifest themselves as timeouts with jstack trace output like the following:
>
> "MainThread" #23 prio=5 os_prio=0 tid=0x00007f10a4183800 nid=0x5a6e in Object.wait() [0x00007f103e2a0000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334)
> - locked <0x00000000fc1c1aa8> (a java.util.stream.Nodes$SizedCollectorTask$OfRef)
> at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405)
> at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
> at java.util.stream.Nodes.collect(Nodes.java:325)
> at java.util.stream.ReferencePipeline.evaluateToNode(ReferencePipeline.java:109)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:564)
> at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:255)
> at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
> at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:444)
> at java.util.stream.StreamTestScenario$12._run(StreamTestScenario.java:144)
> at java.util.stream.StreamTestScenario.run(StreamTestScenario.java:220)
> at java.util.stream.OpTestCase$ExerciseDataStreamBuilder.exercise(OpTestCase.java:349)
> at java.util.stream.OpTestCase.exerciseOpsMulti(OpTestCase.java:114)
> at java.util.stream.OpTestCase.exerciseOpsInt(OpTestCase.java:136)
> at org.openjdk.tests.java.util.stream.MapOpTest.testOps(MapOpTest.java:74)
>
> i.e. a main f/j task is waiting for descendants to complete.
>
> Amy has been doing a lot of testing (since the failure happens very occasionally) and can provide more details on that and the results. I will provide some specific details below.
>
> By a process of elimination we could reproduce the failure in JDK 9 b53 but not in b52. From the changesets that were integrated into b53 we identified JDK-8061553 as a possible cause:
>
> Contended Locking fast enter bucket
> https://bugs.openjdk.java.net/browse/JDK-8061553
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/30137e7eef47
>
> We tested the latest dev build with (naturally) and without that changeset. So far we can reproduce the issue with the former, but not with the latter.
>
> This indicates the changeset for JDK-8061553 is the likely cause; however, I really don't know why this would be the case. Expert advice very much appreciated!
>
> --
>
> Separately there is another issue with Fork/Join:
>
> http://cs.oswego.edu/pipermail/concurrency-interest/2015-April/014240.html
>
> At the moment I don't think the two are connected (the latter issue has been present since 8u40), but perhaps there is a combination of factors here. So we will also run some tests with a workaround:
>
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/ForkJoinPool.java?r1=1.240&r2=1.241
>
> just to rule this out.
>
> Paul.