Questions about Stream/Iterable/Files - and possibly the compiler
Paul Sandoz
paul.sandoz at oracle.com
Fri Nov 6 22:54:18 UTC 2015
> On 6 Nov 2015, at 18:17, Tagir F. Valeev <amaembo at gmail.com> wrote:
>
> Hello!
>
>>>> https://bugs.openjdk.java.net/browse/JDK-8141608
>>>
>>> Thanks to Remi and Paul for the complete explanation. Concerning JDK-8141608, I like Peter Levart's comment about making a specific Collector.
>
> PS> There is a problem with that approach. At the moment the
> PS> Collector does not get to control whether the stream is executed in parallel or sequentially.
>
I was wondering if someone might propose such a new Collector characteristic in conjunction with forEachOrdered :-)
I have a preference to first consider a Stream.foldLeft, and from that maybe consider a LEFT_FOLDING characteristic, with appropriate factories. But then people may ask for RIGHT_FOLDING, to which I will say, first we have to consider Stream.reverse, and then that pulls in a whole bunch of other stuff related to efficient reverse spliterators… and it goes on… :-)
For some of the SO examples you point out, such as indexed streams, we would really like value types to do this properly, to have a tuple of index + value. In other cases, e.g. those about preceding elements, a history-based wrapping spliterator could work (IIRC Jose Paumard has presented such examples), but we are currently lacking an SPI to plug in operations, so one needs to directly use the Stream.spliterator escape hatch.
Paul.
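As a rough illustration of the history-based wrapping spliterator idea mentioned above, here is a minimal sketch. It is not taken from Jose Paumard's material; the class name PairwiseSketch and the helper pairwise are hypothetical names chosen for this example. It goes through the Stream.spliterator escape hatch, remembers the preceding element, and emits (previous, current) pairs. It assumes sequential use only.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class PairwiseSketch {

    // Hypothetical helper (not a JDK API): wraps the source spliterator and
    // emits an entry (previous, current) for every adjacent pair of elements.
    static <T> Stream<Map.Entry<T, T>> pairwise(Stream<T> source) {
        Spliterator<T> src = source.spliterator();
        Spliterator<Map.Entry<T, T>> wrapped =
            new Spliterators.AbstractSpliterator<Map.Entry<T, T>>(
                    Long.MAX_VALUE, Spliterator.ORDERED) {
                private T previous;          // the one-element "history"
                private boolean hasPrevious;

                @Override
                public boolean tryAdvance(Consumer<? super Map.Entry<T, T>> action) {
                    if (!hasPrevious) {
                        // Prime the history with the first element.
                        if (!src.tryAdvance(t -> previous = t)) {
                            return false;
                        }
                        hasPrevious = true;
                    }
                    T before = previous;
                    if (!src.tryAdvance(t -> previous = t)) {
                        return false;
                    }
                    action.accept(new SimpleImmutableEntry<>(before, previous));
                    return true;
                }
            };
        // Sequential only: the wrapper keeps mutable state between elements.
        return StreamSupport.stream(wrapped, false).onClose(source::close);
    }

    public static void main(String[] args) {
        // Prints 1=2, 2=3 and 3=4 on separate lines.
        pairwise(Stream.of(1, 2, 3, 4)).forEach(System.out::println);
    }
}
```

Because the wrapper carries state from one element to the next, it is built as a sequential stream; splitting it for parallel execution would need the kind of extra machinery discussed above.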
> It would actually be nice to have a special characteristic for such
> cases, like Collector.Characteristics.SEQUENTIAL. This would signal that
> the combiner should never be used (it may throw
> UnsupportedOperationException). The implementation for such a case would
> look like this (ReferencePipeline::collect):
>
> public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
>     A container;
>     if (isParallel() &&
>             collector.characteristics().contains(Characteristics.SEQUENTIAL)) {
>         container = collector.supplier().get();
>         BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
>         forEachOrdered(u -> accumulator.accept(container, u));
>     } else ... // existing code follows
> }
>
> Special static methods could be added, like
> Collector.ofSequential(supplier, accumulator) and
> Collector.ofSequential(supplier, accumulator, finisher). Also, the existing
> Collectors::groupingBy/groupingByConcurrent/partitioningBy should be
> updated to support this characteristic in the downstream collector.
>
> This is somewhat similar to the proposed foldLeft feature
> (JDK-8133680). Quite often people write Collectors which don't support
> parallel collection: either their combiners throw some exception or
> (even worse) silently produce something incorrect (like (a, b) -> a).
> See, for example:
> https://github.com/poetix/protonpack/blob/48931db/src/main/java/com/codepoetics/protonpack/collectors/CollectorUtils.java#L108
>
> The library provides a special "convenient" static method to create such a
> combiner. I don't like this library at all, but people really use it.
> Such solutions are also sometimes posted on StackOverflow:
> http://stackoverflow.com/a/30094831/4856258
> Shame on me, I also did this:
> http://stackoverflow.com/a/32484173/4856258
>
> So, with a special characteristic, collectors with such parallel-hostile
> combiners would at least work correctly for parallel streams (and the user
> may still get some speedup if there are heavy upstream operations).
>
> Well, I doubt that the JDK guys would like this proposal, but the fact is
> that real-world developers rarely care about parallel processing and
> just want Streams to work in sequential mode. As a result, some ugly
> code is produced, like a bogus combiner parameter passed to the
> reduce/collect methods. The API should probably be more friendly to real
> user needs...
>
> With best regards,
> Tagir Valeev.
>
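To make the quoted proposal concrete without touching ReferencePipeline, the same effect can be approximated today from user code. The following is a minimal sketch under that assumption; SequentialCollectSketch and collectSequentially are hypothetical names, not JDK APIs. It accumulates through forEachOrdered into a single container, so no combiner is needed even when the stream is parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SequentialCollectSketch {

    // Hypothetical helper (not a JDK API): mirrors the quoted branch of
    // ReferencePipeline::collect, but as a standalone static method.
    static <T, A> A collectSequentially(Stream<T> stream,
                                        Supplier<A> supplier,
                                        BiConsumer<A, ? super T> accumulator) {
        A container = supplier.get();
        // forEachOrdered performs the action one element at a time, in
        // encounter order, even if upstream operations run in parallel.
        stream.forEachOrdered(t -> accumulator.accept(container, t));
        return container;
    }

    public static void main(String[] args) {
        List<Integer> result = collectSequentially(
                IntStream.range(0, 1000).boxed().parallel().map(i -> i * 2),
                ArrayList::new,
                List::add);
        System.out.println(result.size());   // prints 1000
        System.out.println(result.get(999)); // prints 1998
    }
}
```

The upstream map step may still run in parallel; only the accumulation is serialized, which matches the speedup caveat in the quoted message.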
More information about the core-libs-dev
mailing list