Potential bug in hotspot occasionally resulting in non-termination of parallel stream execution

Paul Sandoz paul.sandoz at oracle.com
Thu Apr 23 07:46:53 UTC 2015


Hi Amy,

Thanks for doing this testing work.

We have also observed such failures in same-binary runs (e.g. see those for b59) [*]. Failures are observed on Linux and Solaris, but not on Windows. I dunno if that is significant or not.

Paul.

[*] The jtreg feature of getting a jstack trace is most helpful in this case.
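
For anyone wanting to poke at this outside jtreg, here is a minimal sketch of a repeated-run probe, loosely following the "many runs, generous timeout" approach described in the quoted messages below. It is not the actual ToArrayOpTest and is not known to reproduce the hang. The class name ParallelToArrayProbe, its methods, the element count and the 60 second timeout are all assumptions made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.IntStream;

// Hypothetical probe, not part of the JDK test suite.
public class ParallelToArrayProbe {

    // One parallel map + toArray pipeline, similar in shape to the
    // ReferencePipeline.toArray call seen in the stack trace.
    static Object[] parallelToArray(int size) {
        return IntStream.range(0, size)
                .boxed()
                .parallel()
                .map(i -> i + 1)
                .toArray();
    }

    // Runs the pipeline `runs` times from a non-fork/join caller thread and
    // returns how many runs failed to complete within the timeout.
    static int countTimeouts(int runs, int size, long timeoutMillis) throws Exception {
        // Daemon threads, so a caller stuck in a hung run cannot keep the JVM alive.
        ExecutorService callers = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "probe-caller");
            t.setDaemon(true);
            return t;
        });
        int timeouts = 0;
        try {
            for (int run = 1; run <= runs; run++) {
                Callable<Object[]> task = () -> parallelToArray(size);
                Future<Object[]> result = callers.submit(task);
                try {
                    Object[] array = result.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    if (array.length != size) {
                        throw new AssertionError("unexpected length in run #" + run);
                    }
                } catch (TimeoutException e) {
                    // Treated as a suspected hang; take a jstack dump at this point.
                    timeouts++;
                    System.out.println("run #" + run + " timed out");
                }
            }
        } finally {
            callers.shutdownNow();
        }
        return timeouts;
    }

    public static void main(String[] args) throws Exception {
        int runs = 3000;
        int timeouts = countTimeouts(runs, 1000, 60_000);
        System.out.println(timeouts + " timeouts in " + runs + " runs");
    }
}
```

The timeout is deliberately far longer than a normal run, so a reported timeout is more likely to be a real hang than a slow run.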

On Apr 23, 2015, at 7:19 AM, Amy Lu <amy.lu at oracle.com> wrote:

> Here are the test results in detail.
> 
> We picked a 32-processor machine (Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz) and one stream test, ToArrayOpTest, for this testing. Normally this test takes ~22 seconds to complete. We used a long enough timeout that we believe each “timeout” shown in the testing is a real hang.
> 
> * JDK9/b52: 3000 runs all pass.
> * JDK9/b53: Reproduced the issue; the test timed out 4 times (runs #596, #978, #988, #1290) in 1568 total runs.
> 
> From the changesets that were integrated into b53 we identified JDK-8061553 as a possible cause, and tested the latest dev build:
> 
> * Latest dev build: Reproduced the issue; the test timed out 4 times (runs #48, #143, #1877, #2231) in 3000 total runs.
> * Same build with the 8061553 changeset backed out: 3000 runs all pass.
> 
> The testing was done on two machines, one Linux and one Solaris, with similar results. Before drilling down to b52/b53, we also tested b55 and b59, and both reproduced the issue.
> 
> Thanks,
> Amy
> 
> On 4/23/15 12:31 AM, Paul Sandoz wrote:
>> Hi,
>> 
>> Amy and I think we have identified an issue in hotspot that only very occasionally results in non-termination of parallel stream execution. Specifically non-termination of stream fork/join tasks. Such failures, when running jtreg stream tests, manifest themselves as timeouts with jstack trace output like the following:
>> 
>> "MainThread" #23 prio=5 os_prio=0 tid=0x00007f10a4183800 nid=0x5a6e in Object.wait() [0x00007f103e2a0000]
>>   java.lang.Thread.State: BLOCKED (on object monitor)
>> 	at java.lang.Object.wait(Native Method)
>> 	at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334)
>> 	- locked <0x00000000fc1c1aa8> (a java.util.stream.Nodes$SizedCollectorTask$OfRef)
>> 	at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405)
>> 	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
>> 	at java.util.stream.Nodes.collect(Nodes.java:325)
>> 	at java.util.stream.ReferencePipeline.evaluateToNode(ReferencePipeline.java:109)
>> 	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:564)
>> 	at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:255)
>> 	at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
>> 	at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:444)
>> 	at java.util.stream.StreamTestScenario$12._run(StreamTestScenario.java:144)
>> 	at java.util.stream.StreamTestScenario.run(StreamTestScenario.java:220)
>> 	at java.util.stream.OpTestCase$ExerciseDataStreamBuilder.exercise(OpTestCase.java:349)
>> 	at java.util.stream.OpTestCase.exerciseOpsMulti(OpTestCase.java:114)
>> 	at java.util.stream.OpTestCase.exerciseOpsInt(OpTestCase.java:136)
>> 	at org.openjdk.tests.java.util.stream.MapOpTest.testOps(MapOpTest.java:74)
>> 
>> i.e. a main f/j task is waiting for descendants to complete.
>> 
>> Amy has been doing a lot of testing (since the failure happens very occasionally) and can provide more details on that and the results. I will provide some specific details below.
>> 
>> By a process of elimination we could reproduce the failure in JDK 9 b53 but not in b52. From the changesets that were integrated into b53 we identified JDK-8061553 as a possible cause:
>> 
>>   Contended Locking fast enter bucket
>>   https://bugs.openjdk.java.net/browse/JDK-8061553
>>   http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/30137e7eef47
>> 
>> We tested the latest dev build with (naturally) and without that changeset. So far we can reproduce the issue with the former, but not with the latter.
>> 
>> This indicates the changeset for JDK-8061553 is the likely cause; however, I really don't know why this would be the case. Expert advice very much appreciated!
>> 
>> --
>> 
>> Separately there is another issue with Fork/Join:
>> 
>>  http://cs.oswego.edu/pipermail/concurrency-interest/2015-April/014240.html
>> 
>> At the moment I don't think the two are connected (the latter issue has been present since 8u40), but perhaps there is a combination of factors here. So we will also run some tests with a workaround:
>> 
>>   http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/ForkJoinPool.java?r1=1.240&r2=1.241
>> 
>> just to rule this out.
>> 
>> Paul.
> 


