BiCollector

John Rose john.r.rose at oracle.com
Tue Jun 19 00:38:09 UTC 2018


On Jun 18, 2018, at 2:29 PM, Brian Goetz <Brian.Goetz at Oracle.COM> wrote:
> 
> "bisecting" sounds like it sends half the elements to one collector and half to the other …

The main bisection or splitting operation that's relevant to a stream is what
a spliterator does, so this is a concern.

Nobody has mentioned "unzipping" yet; this is a term of art which applies to streams
of tuples.  The image of a zipper is relatively clear and unambiguous, and the tradition
is pretty strong.  https://en.wikipedia.org/wiki/Convolution_(computer_science)

The thing we are looking at differs in two ways from classic "unzipping":  First, the
two collectors themselves convert the same T elements to whatever internal value
(T1, T2) is relevant.  Second, we are looking at a new terminal operation (a collector) which
consolidates the results from both streams (a notional Stream<T1> and Stream<T2>,
if you like), rather than delivering the streams as a pair of outputs.

The classic "unzip" operation applies "fst" and "snd" (or some other conventional
set of access functions) to each T-element of the input stream.  Since we don't
have a privileged 2-tuple type (like Pair<T1,T2>) in Java, the user would need
to nominate those two functions explicitly, either by folding them into a "mapping"
on each collector, or as a utility overloading like this:

   static <T, T1, T2, R1, R2, R> Collector<T, ?, R> unzipping(
		Function<? super T, T1> f1,  // defaults to identity
		Collector<? super T1, ?, R1> c1,
		Function<? super T, T2> f2,  // defaults to identity
		Collector<? super T2, ?, R2> c2,
		BiFunction<? super R1, ? super R2, ? extends R> finisher) {
     // pass the finisher through so the two results are actually merged
     return toBoth(mapping(f1, c1), mapping(f2, c2), finisher);
  }
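
For concreteness, here is a minimal runnable sketch of that overloading. It assumes "toBoth" takes two collectors plus a merging BiFunction; that signature, the Box holder class, and the class name UnzippingSketch are my own placeholders, not settled API.

```java
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class UnzippingSketch {
    // Hypothetical two-way collector: feeds every element to both downstream
    // collectors and merges their two results with the finisher.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> toBoth(
            Collector<? super T, A1, R1> c1,
            Collector<? super T, A2, R2> c2,
            BiFunction<? super R1, ? super R2, ? extends R> finisher) {
        // One mutable container per downstream collector.
        class Box {
            A1 a1 = c1.supplier().get();
            A2 a2 = c2.supplier().get();
        }
        return Collector.<T, Box, R>of(
            Box::new,
            (box, t) -> {
                c1.accumulator().accept(box.a1, t);
                c2.accumulator().accept(box.a2, t);
            },
            (x, y) -> {
                x.a1 = c1.combiner().apply(x.a1, y.a1);
                x.a2 = c2.combiner().apply(x.a2, y.a2);
                return x;
            },
            box -> finisher.apply(c1.finisher().apply(box.a1),
                                  c2.finisher().apply(box.a2)));
    }

    // The "unzipping" overloading: each side gets its own access function.
    static <T, T1, T2, R1, R2, R> Collector<T, ?, R> unzipping(
            Function<? super T, T1> f1, Collector<? super T1, ?, R1> c1,
            Function<? super T, T2> f2, Collector<? super T2, ?, R2> c2,
            BiFunction<? super R1, ? super R2, ? extends R> finisher) {
        Collector<T, ?, R1> m1 = Collectors.mapping(f1, c1);
        Collector<T, ?, R2> m2 = Collectors.mapping(f2, c2);
        return toBoth(m1, m2, finisher);
    }

    public static void main(String[] args) {
        String r = Stream.of("a", "bb", "ccc").collect(unzipping(
                String::length, Collectors.toList(),
                s -> s.toUpperCase(), Collectors.joining(","),
                (lens, joined) -> lens + ":" + joined));
        System.out.println(r);  // prints [1, 2, 3]:A,BB,CCC
    }
}
```

Here String::length and toUpperCase play the roles of "fst" and "snd"; nothing about the sketch depends on a tuple type.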


> "tee" might be a candidate, though it doesn't follow the `ing convention.  "teeing" sounds dumb.


"tee" sounds asymmetrical.  "diverting" or "detouring" are "*ing" words that might
express asymmetrical disposition of derivative streams.

An asymmetrical operation might be interesting if it could fork off a stream of
its own.  It would have to have a side-effecting void-producing terminal operation,
so the main (undiverted) stream could continue to progress at the top level of
the expression.

interface Stream<T> {
  default Stream<T> diverting(Consumer<Stream<T>> tee) { … }
}

values.stream().diverting(s2->s2.forEach(System.out::println)).filter(…).collect(…);
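
The shape of that call can be tried today with a static helper standing in for the notional Stream method. This is only a sketch under a big assumption: it buffers the whole source into a list, so it shows the API shape and not the fused, single-pass evaluation a real implementation would want. The class name DivertingSketch and the helper signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DivertingSketch {
    // Hypothetical stand-in for Stream.diverting: the tee consumes one copy of
    // the elements for its side effects, and the caller continues with another.
    static <T> Stream<T> diverting(Stream<T> source, Consumer<? super Stream<T>> tee) {
        List<T> buffered = source.collect(Collectors.toList());  // eager; not lazy
        tee.accept(buffered.stream());
        return buffered.stream();
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        List<Integer> evens = diverting(Arrays.asList(1, 2, 3, 4).stream(),
                                        s2 -> s2.forEach(seen::add))
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
        System.out.println(seen + " " + evens);  // prints [1, 2, 3, 4] [2, 4]
    }
}
```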

Or (and this might be a sweet spot) a symmetric stream-tee operation could
materialize two sibling streams and rejoin their results with a bifunction:

interface Stream<T> {
  default <R1, R2, R> R unzipping(
		Function<? super Stream<T>, R1> f1,
		Function<? super Stream<T>, R2> f2,
		BiFunction<? super R1, ? super R2, ? extends R> finisher) { … }
}

values.stream().unzipping(
		s1->s1.forEach(System.out::println),
		s2->s2.filter(…).collect(…),
		(void1, r2)->r2
		);

This would allow each "fork child" of the stream to continue to use the
Stream API instead of the more restrictive Collector operators.
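
A rough sketch of the symmetric version, again as a static helper and again assuming buffering in place of real simultaneous loop control. The names ForkSketch and the extra "source" parameter are placeholders of mine. Note that each fork function has to return a value, so a side-effecting child such as forEach would need to be wrapped to return something (null, say).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ForkSketch {
    // Hypothetical symmetric tee: each child gets a full Stream<T> and may use
    // any Stream operation; the finisher rejoins the two results.
    static <T, R1, R2, R> R unzipping(
            Stream<T> source,
            Function<? super Stream<T>, ? extends R1> f1,
            Function<? super Stream<T>, ? extends R2> f2,
            BiFunction<? super R1, ? super R2, ? extends R> finisher) {
        List<T> buffered = source.collect(Collectors.toList());  // eager copy
        R1 r1 = f1.apply(buffered.stream());
        R2 r2 = f2.apply(buffered.stream());
        return finisher.apply(r1, r2);
    }

    public static void main(String[] args) {
        String r = unzipping(Arrays.asList(3, 1, 2).stream(),
                s1 -> s1.count(),
                s2 -> s2.filter(n -> n > 1).collect(Collectors.toList()),
                (count, big) -> count + ":" + big);
        System.out.println(r);  // prints 3:[3, 2]
    }
}
```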

Optimal code generation for forked/unzipped/teed streams would be tricky,
requiring simultaneous loop control logic for each stream.
To me that's a feature, not a bug, since hand-writing ad hoc
simultaneous loops is a pain.

My $0.02.

— John

