RFR 8129120 Terminal operation properties should not be back-propagated to upstream operations
Paul Sandoz
paul.sandoz at oracle.com
Fri Jun 19 12:04:00 UTC 2015
Hi,
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8129120-flag-back-propagation/webrev/
I am a little red in the face (of the embarrassed kind) about this fix. We found an issue reported on stack overflow whereby a stream can report incorrect results. I may be wrong but this might be the first reported case in 8.
We thought we had a clever little way of back-propagating characteristics of the terminal operation upstream to optimise stream pipelines. More specifically to back-propafate the unordered characteristic up through the pipeline until a stateful short-circuiting operation was reached.
The general idea we had in mind, but never implemented, was to elide operations such as sort in the following case:
parallel().sorted().forEach(...)
There is no need to sort then parallel execution of the terminal operation does not care about order. Although one can reasonably argue it's a user issue in this case.
The current implementation approach is wrong and causes incorrect results in the following cases:
// http://stackoverflow.com/questions/30843279/stream-skip-behavior-with-unordered-terminal-operation
List<Integer> input = IntStream.rangeClosed(1, 20).collect(Collectors.toList());
Set<Integer> output = input.parallelStream().filter(x -> x > 0)
.skip(1)
.unordered()
.collect(Collectors.toSet());
// The output set may contain 1
List<Integer> input = IntStream.rangeClosed(1, 20).mapToObj(i -> new Integer(0)).
collect(Collectors.toList());
Integer v = input.get(0);
System.out.println(Integer.toHexString(System.identityHashCode(v)));
Collection<Integer> output = input.parallelStream()
.distinct()
.collect(Collectors.toSet());
Integer v1 = output.iterator().next();
System.out.println(Integer.toHexString(System.identityHashCode(v1)));
// Distinct may not be stable
Thankfully these are kind of edge cases so the damage is limited.
There are two main aspects to the webrev:
1) remove the back propagation logic, which simplifies the preparation of the pipeline for parallel execution and also cleans up a certain aspect of toArray optimisation that i had been meaning get around to (regarding how the pipeline is sliced in this case).
2) adjust the tests in UnorderedTest.
While i could of placed such tests in each corresponding op-specific test i thought it better to keep this focused, plus bulking out the op-specific tests in certain cases can increase the execution time (more specifically when certain HotSpot flags are used like -Xcomp).
Paul.
More information about the core-libs-dev
mailing list