Questions about Stream/Iterable/Files / FOLDING Collector
Tagir F. Valeev
amaembo at gmail.com
Sat Nov 7 04:49:15 UTC 2015
Hello!
PS> I have a preference to first consider a Stream.foldLeft, and from
PS> that maybe consider a LEFT_FOLDING characteristic, with
PS> appropriate factories. But then people may ask for RIGHT_FOLDING,
PS> to which I will say, first we have to consider Stream.reverse, and
PS> then that pulls in a whole bunch of other stuff related to
PS> efficient reverse spliterators… and it goes on… :-)
I strongly believe that adding foldLeft to the API does not mean that
foldRight must be added at the same time. After all, you already have
findFirst, but no findLast. In my experience, left-to-right ordering is
much more useful than right-to-left.
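Just to make the semantics concrete, here is a minimal sketch of a left
fold built on forEachOrdered (my own illustration, not a proposed API):

import java.util.function.BiFunction;
import java.util.stream.Stream;

static <T, U> U foldLeft(Stream<T> stream, U seed,
                         BiFunction<U, ? super T, U> accumulator) {
    // One-element array as a mutable holder: the lambda may only capture
    // effectively final variables.
    @SuppressWarnings("unchecked")
    U[] box = (U[]) new Object[] { seed };
    // forEachOrdered respects encounter order even for parallel streams,
    // so no combiner is ever needed.
    stream.forEachOrdered(t -> box[0] = accumulator.apply(box[0], t));
    return box[0];
}

E.g. foldLeft(Stream.of("a", "b", "c"), "", String::concat) produces
"abc" regardless of whether the stream is parallel.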
PS> For some SO examples you point out, such as indexed streams, we
PS> would really like value types to do this properly, to have a tuple
PS> of index + value.
Tuple-based solutions are of course popular and implemented in some
third-party libraries. That particular question is different: it asked
how to gather objects by their indices from an existing stream without
knowing its source or previous steps. If you create, for example, a
stream of index-value tuples and then filter it, your indices would
have gaps. That question assumes that after any filtering the stream
elements are renumbered without gaps.
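To illustrate the gap problem with plain JDK classes (SimpleEntry
standing in for a tuple; this is just a sketch of the problem, not a
solution):

import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

List<String> input = Arrays.asList("foo", "x", "bar", "y", "baz");
// Attach indices first, then filter: the surviving entries keep their
// original indices 0, 2, 4, so the numbering now has gaps.
List<SimpleEntry<Integer, String>> kept = IntStream.range(0, input.size())
        .mapToObj(i -> new SimpleEntry<Integer, String>(i, input.get(i)))
        .filter(e -> e.getValue().length() > 1)
        .collect(Collectors.toList());
// kept: [0=foo, 2=bar, 4=baz]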
PS> In other cases e.g. about preceding elements, a
PS> history-based wrapping spliterator could work (IIRC Jose Paumard
PS> has presented such examples), but we are currently lacking an SPI
PS> to plug in operations, so one needs to directly use the Stream.spliterator escape.
Actually I wrote a spliterator in my StreamEx library which allows
processing pairs of input elements (along with primitive
specializations), of course using the Stream.spliterator escape (which
is not that bad if you avoid the tryAdvance call):
https://github.com/amaembo/streamex/blob/master/src/main/java/javax/util/streamex/PairSpliterator.java
Although it's not lock-free, it parallelizes well and has good overall
performance, so I'm somewhat proud of it. It allows solving many
interesting problems.
Pairwise differences:
int[] diff = IntStreamEx.of(intArray).pairMap((a, b) -> b-a).toArray();
Skip last stream element:
Stream<T> stream = StreamEx.of(input).pairMap((a, b) -> a);
Check if input is sorted:
StreamEx.of(input).pairMap(Comparable::compareTo).allMatch(r -> r <= 0);
Find first misplaced element:
StreamEx.of(input).pairMap((a, b) -> a.compareTo(b) > 0 ? b : null).nonNull().findFirst();
A more complex example: sort a word list and add a header before each letter group:
StreamEx.of(words).sorted()
        .prepend(" ") // Stream.concat(Stream.of(" "), this)
        .pairMap((a, b) -> a.charAt(0) == b.charAt(0) ? Stream.of(b)
                : Stream.of("Words starting with letter " + b.substring(0, 1), b))
        .flatMap(Function.identity())
        .forEach(System.out::println);
And so on. It would be great to see such a feature in the JDK as well!
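For anybody who doesn't want to pull in the library, here is a minimal
sequential-only sketch of the same idea using the Stream.iterator()
escape hatch (unlike the PairSpliterator linked above, this one does
not parallelize):

import java.util.Iterator;
import java.util.Spliterators;
import java.util.function.BiFunction;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

static <T, R> Stream<R> pairMap(Stream<T> stream,
        BiFunction<? super T, ? super T, ? extends R> mapper) {
    Iterator<T> it = stream.iterator();
    if (!it.hasNext())
        return Stream.empty();
    T first = it.next();
    // Emits mapper(previous, current) for every pair of adjacent elements.
    Iterator<R> pairs = new Iterator<R>() {
        private T prev = first;
        public boolean hasNext() { return it.hasNext(); }
        public R next() {
            T cur = it.next();
            R res = mapper.apply(prev, cur);
            prev = cur;
            return res;
        }
    };
    return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(pairs, 0), false);
}

For example, pairMap(Stream.of(1, 3, 6, 10), (a, b) -> b - a) yields
2, 3, 4.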
PS> Paul.
>> It would actually be nice to have a special characteristic for such a
>> case, like Collector.Characteristics.SEQUENTIAL. This would signal that
>> the combiner should never be used (it may throw
>> UnsupportedOperationException). The implementation for such a case
>> would look like this (ReferencePipeline::collect):
>>
>> public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
>>     A container;
>>     if (isParallel() &&
>>             collector.characteristics().contains(Characteristics.SEQUENTIAL)) {
>>         container = collector.supplier().get();
>>         BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
>>         forEachOrdered(u -> accumulator.accept(container, u));
>>     } else ... // existing code follows
>> }
>>
>> Special static methods could be added, like
>> Collector.ofSequential(supplier, accumulator) and
>> Collector.ofSequential(supplier, accumulator, finisher). Also, the
>> existing Collectors::groupingBy/groupingByConcurrent/partitioningBy
>> should be updated to support this characteristic of the downstream
>> collector.
>>
>> This is somewhat similar to the proposed foldLeft feature
>> (JDK-8133680). Quite often people write Collectors which don't support
>> parallel collection: either their combiners throw some exception or
>> (even worse) silently produce something incorrect (like (a, b) -> a).
>> See, for example:
>> https://github.com/poetix/protonpack/blob/48931db/src/main/java/com/codepoetics/protonpack/collectors/CollectorUtils.java#L108
>>
>> The library provides a special "convenient" static method to create
>> such a combiner. I don't like this library at all, but people really
>> use it. Such solutions are also posted on Stack Overflow sometimes:
>> http://stackoverflow.com/a/30094831/4856258
>> Shame on me, I also did this:
>> http://stackoverflow.com/a/32484173/4856258
>>
>> So with a special characteristic, such parallel-hostile collectors
>> would at least work correctly for parallel streams (and the user may
>> still get some speedup if there are heavy upstream operations).
>>
>> Well, I doubt that the JDK guys would like this proposal, but the
>> fact is that real-world developers rarely care about parallel
>> processing and just want Streams to work in sequential mode. As a
>> result, some ugly code is produced, like a bogus combiner parameter
>> passed to the reduce/collect methods. Probably the API should be more
>> friendly to real user needs...
>>
>> With best regards,
>> Tagir Valeev.
>>
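For reference, the parallel-hostile pattern criticized in the quoted
message usually looks something like this (my own minimal sketch, not
the protonpack code); a SEQUENTIAL characteristic or a hypothetical
Collector.ofSequential factory would make the bogus combiner
unnecessary:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

// A sequential-only collector as people write it today: the combiner is
// a stub that throws because parallel collection is never intended.
static <T> Collector<T, List<T>, List<T>> toListSequentialOnly() {
    return Collector.of(
            ArrayList::new,
            List::add,
            (left, right) -> {
                throw new UnsupportedOperationException(
                        "parallel collection is not supported");
            });
}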