Questions about Stream/Iterable/Files - and possibly the compiler

Fri Nov 6 22:54:18 UTC 2015

> On 6 Nov 2015, at 18:17, Tagir F. Valeev <amaembo at gmail.com> wrote:
> 
> Hello!
> 
> >>>> https://bugs.openjdk.java.net/browse/JDK-8141608
>>> 
> >>> Thanks to Remi and Paul for the complete explanation. Concerning JDK-8141608, I like Peter Levart's comment about making a specific Collector.
> 
> PS> There is a problem with that approach. At the moment the
> PS> Collector does not get to control whether the stream is executed in parallel or sequentially.
> 

I was wondering if someone might propose such a new Collector characteristic in conjunction with forEachOrdered :-)

I would prefer to first consider a Stream.foldLeft, and from that perhaps a LEFT_FOLDING characteristic, with appropriate factories. But then people may ask for RIGHT_FOLDING, to which I will say that first we have to consider Stream.reverse, and that pulls in a whole bunch of other stuff related to efficient reverse spliterators… and it goes on… :-)
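As of JDK 8, Stream has no foldLeft, but the idea can be approximated at user level. The sketch below assumes a hypothetical static helper `foldLeft(stream, seed, folder)` (it is not the API proposed in JDK-8133680). It feeds elements through `forEachOrdered`, which runs the action in encounter order with a happens-before edge between successive elements, so the folding function needs neither associativity nor a combiner.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiFunction;
import java.util.stream.Stream;

public class FoldLeftSketch {
    // Hypothetical helper (not a JDK method): folds left-to-right in encounter order.
    static <T, U> U foldLeft(Stream<T> stream, U seed,
                             BiFunction<U, ? super T, U> folder) {
        // Mutable cell for the running result. forEachOrdered guarantees that
        // each action happens-before the next, even on a parallel stream.
        AtomicReference<U> acc = new AtomicReference<>(seed);
        stream.forEachOrdered(t -> acc.set(folder.apply(acc.get(), t)));
        return acc.get();
    }

    public static void main(String[] args) {
        // Subtraction is not associative, so reduce() would not be correct here.
        Integer r = foldLeft(Stream.of(1, 2, 3, 4).parallel(), 100, (a, b) -> a - b);
        System.out.println(r); // prints 90 (100 - 1 - 2 - 3 - 4)
    }
}
```

Upstream stages of a parallel stream can still run in parallel; only the fold itself is serialized.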

For some of the SO examples you point out, such as indexed streams, we would really like value types to do this properly, with a tuple of index + value. In other cases, e.g. those involving preceding elements, a history-based wrapping spliterator could work (IIRC Jose Paumard has presented such examples), but we currently lack an SPI to plug in operations, so one needs to use the Stream.spliterator escape directly.
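A history-based wrapping spliterator might be sketched like this, using the Stream.spliterator() escape. The helper name `withPrevious` and the use of `Map.Entry` as a stand-in for a (previous, current) tuple are assumptions for illustration; the first element is paired with null. The wrapper relies on the inherited `trySplit()` of `Spliterators.AbstractSpliterator`, which fills batches by calling `tryAdvance` sequentially, so the history should stay correct even when the resulting stream is parallel.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class PrecedingSketch {
    // Hypothetical helper: pairs each element with its predecessor
    // (the first element is paired with null).
    static <T> Stream<Map.Entry<T, T>> withPrevious(Stream<T> source) {
        Spliterator<T> src = source.spliterator(); // the "escape hatch"
        // Keep ORDERED if the source has it; SIZED is dropped for simplicity.
        int chars = src.characteristics() & Spliterator.ORDERED;
        Spliterator<Map.Entry<T, T>> wrapped =
            new Spliterators.AbstractSpliterator<Map.Entry<T, T>>(src.estimateSize(), chars) {
                private T previous; // history: the last element seen so far

                @Override
                public boolean tryAdvance(Consumer<? super Map.Entry<T, T>> action) {
                    return src.tryAdvance(current -> {
                        action.accept(new SimpleImmutableEntry<T, T>(previous, current));
                        previous = current;
                    });
                }
            };
        // Inherited trySplit() buffers batches via tryAdvance, preserving history.
        return StreamSupport.stream(wrapped, source.isParallel());
    }

    public static void main(String[] args) {
        List<String> out = withPrevious(Stream.of("a", "b", "c"))
            .map(e -> e.getKey() + "->" + e.getValue())
            .collect(Collectors.toList());
        System.out.println(out); // prints [null->a, a->b, b->c]
    }
}
```

Each pair is a boxed entry object, which is the kind of overhead a value-typed tuple would avoid.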

Paul.

> It would actually be nice to have a special characteristic for such a
> case, like Collector.Characteristics.SEQUENTIAL. This would signal that
> the combiner should never be used (it may throw
> UnsupportedOperationException). The implementation for such a case would
> look like this (ReferencePipeline::collect):
> 
> public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
>  A container;
>  if(isParallel() &&
>     collector.characteristics().contains(Characteristics.SEQUENTIAL)) {
>       container = collector.supplier().get();
>       BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
>       forEachOrdered(u -> accumulator.accept(container, u));
>  } else ... // existing code follows
> }
> 
> Special static methods could be added, like
> Collector.ofSequential(supplier, accumulator) and
> Collector.ofSequential(supplier, accumulator, finisher). Also, the existing
> Collectors::groupingBy/groupingByConcurrent/partitioningBy should be
> updated to support this characteristic of the downstream collector.
> 
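Until such a characteristic exists, its semantics can be approximated from outside the JDK. In this sketch, `ofSequential` and `collectSequentially` are hypothetical helpers, not JDK methods: the first builds a Collector whose combiner throws UnsupportedOperationException, and the second mirrors the SEQUENTIAL branch quoted above by feeding elements through `forEachOrdered`, so the combiner is never called while upstream stages of a parallel stream may still run in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SequentialCollectorSketch {
    // Hypothetical stand-in for the proposed Collector.ofSequential(supplier, accumulator):
    // the combiner must never be called, so it throws.
    static <T, R> Collector<T, R, R> ofSequential(Supplier<R> supplier,
                                                  BiConsumer<R, T> accumulator) {
        return Collector.of(supplier, accumulator, (left, right) -> {
            throw new UnsupportedOperationException("sequential-only collector");
        });
    }

    // User-level emulation of the proposed SEQUENTIAL branch of ReferencePipeline::collect.
    static <T, A, R> R collectSequentially(Stream<T> stream,
                                           Collector<? super T, A, R> collector) {
        A container = collector.supplier().get();
        BiConsumer<A, ? super T> accumulator = collector.accumulator();
        // Encounter order, one element at a time: the combiner is never needed.
        stream.forEachOrdered(t -> accumulator.accept(container, t));
        return collector.finisher().apply(container);
    }

    public static void main(String[] args) {
        Collector<Integer, List<Integer>, List<Integer>> toList =
            ofSequential(ArrayList::new, List::add);
        // The squaring stage may run in parallel; accumulation stays sequential.
        List<Integer> squares = collectSequentially(
            IntStream.range(0, 10).boxed().parallel().map(i -> i * i), toList);
        System.out.println(squares); // prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    }
}
```

Passing the same collector to `Stream.collect` on a parallel stream may invoke the combiner and throw, which is the failure mode the proposed characteristic is meant to rule out.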
> This is somewhat similar to the proposed foldLeft feature
> (JDK-8133680). Quite often people write Collectors which don't support
> parallel collection: either their combiners throw some exception or
> (even worse) silently produce something incorrect (like (a, b) -> a).
> See, for example:
> https://github.com/poetix/protonpack/blob/48931db/src/main/java/com/codepoetics/protonpack/collectors/CollectorUtils.java#L108
> 
> The library provides a special "convenient" static method to create such a
> combiner. I don't like this library at all, but people really use it.
> Such solutions are also sometimes posted on StackOverflow:
> http://stackoverflow.com/a/30094831/4856258
> Shame on me, I also did this:
> http://stackoverflow.com/a/32484173/4856258
> 
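To illustrate why a combiner like (a, b) -> a is silently wrong rather than merely inelegant, consider a hypothetical "last element" collector (not taken from protonpack or the linked answers). In the JDK's sequential implementation the combiner is not called, so both variants agree; in parallel, the bogus variant returns the last element of the leftmost chunk.

```java
import java.util.function.BinaryOperator;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class BogusCombinerSketch {
    // Hypothetical "last element" collector; the container is a one-slot array.
    static Collector<Integer, Integer[], Integer> last(boolean correctCombiner) {
        BinaryOperator<Integer[]> combiner = correctCombiner
            // The right partial result wins unless its chunk was empty.
            ? (left, right) -> right[0] != null ? right : left
            // The bogus combiner: always keeps the left partial result.
            : (left, right) -> left;
        return Collector.of(
            () -> new Integer[1],   // supplier: empty slot
            (box, x) -> box[0] = x, // accumulator: remember the latest element
            combiner,
            box -> box[0]);         // finisher: unwrap the slot
    }

    public static void main(String[] args) {
        System.out.println(IntStream.range(0, 100_000).boxed().collect(last(false)));            // prints 99999
        System.out.println(IntStream.range(0, 100_000).boxed().parallel().collect(last(true)));  // prints 99999
        // Typically NOT 99999: the value depends on how the stream was split.
        System.out.println(IntStream.range(0, 100_000).boxed().parallel().collect(last(false)));
    }
}
```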
> So with a special characteristic, such parallel-hostile collectors would
> at least work correctly for parallel streams (and the user may still get
> some speedup if there are heavy upstream operations).
> 
> Well, I doubt that the JDK guys would like this proposal, but the fact is
> that real-world developers rarely care about parallel processing and
> just want Streams to work in sequential mode. As a result, some ugly
> code is produced, like a bogus combiner parameter to the reduce/collect
> methods. Perhaps the API should be friendlier to real users' needs...
> 
> With best regards,
> Tagir Valeev.
>