Stream operations -- current set

Fri Sep 14 13:56:21 PDT 2012

Here's the current set of stream operations.

Intermediate / Lazy (Stateless)
-------------------------------

     Stream<T> filter(Predicate<? super T> predicate);

     <R> Stream<R> map(Mapper<? super T, ? extends R> mapper);

     <R> Stream<R> flatMap(FlatMapper<? super T, R> mapper);

     Stream<T> tee(Block<? super T> block);

     <U> MapStream<T, U> mapped(Mapper<? super T, ? extends U> mapper);

Of these, the only controversial one is flatMap, whose mapper takes a 
lambda into which the results are pushed.  Some people prefer something 
like

    flatMap(t -> Collection<T>)
or
    flatMap(t -> T[])

but I think these are mostly value-destroying.  If you don't already 
have an array or Collection lying around, it's a lot more code/work to 
construct one, and then it's more work to iterate it.  And if you do have 
a Collection lying around, you can just do:

   flatMap((b, t) -> findResult(t).forEach(b))

and so having the extra overload doesn't help you much.  The existing 
signature seems a better "primitive".
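
To make the comparison concrete, here is a minimal sketch of the 
push-style shape.  PushFlatMapper, FlatMapSketch and words() are 
hypothetical names invented for this illustration, java.util.function.Consumer 
stands in for Block, and an eager List stands in for the stream; none of 
this is the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for FlatMapper: the mapper pushes zero or more
// results for one element into a sink.
interface PushFlatMapper<T, R> {
    void flatMapInto(Consumer<? super R> sink, T element);
}

class FlatMapSketch {
    // Eager, list-based stand-in for Stream.flatMap (illustration only).
    static <T, R> List<R> flatMap(List<T> source,
                                  PushFlatMapper<? super T, R> mapper) {
        List<R> out = new ArrayList<>();
        for (T t : source) {
            mapper.flatMapInto(out::add, t);
        }
        return out;
    }

    // Hypothetical helper that already returns a Collection.
    static List<String> words(String line) {
        return Arrays.asList(line.split(" "));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "c");

        // Push style: no per-element collection is built.
        List<Character> letters = flatMap(lines, (sink, s) -> {
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) != ' ') {
                    sink.accept(s.charAt(i));
                }
            }
        });

        // A Collection-returning function adapts in one expression.
        List<String> ws = flatMap(lines, (sink, s) -> words(s).forEach(sink));

        System.out.println(letters + " " + ws);
    }
}
```

The second call is the same adaptation as the findResult(t).forEach(b) 
line above, which is the argument for not needing the extra overload.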

Intermediate / Lazy (Stateful)
------------------------------

     Stream<T> uniqueElements();

     Stream<T> sorted(Comparator<? super T> comparator);

     Stream<T> cumulate(BinaryOperator<T> operator);

     Stream<T> sequential();

Of these, we might want to add a sorted() which assumes natural ordering 
and takes no Comparator, and throws CCE if the elements are not 
Comparable (just like new TreeMap() does.)
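
A sketch of what the no-Comparator form might look like, assuming it 
simply casts to Comparable and lets the ClassCastException escape. 
NaturalSortSketch is a made-up name and a List stands in for the stream.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NaturalSortSketch {
    // Comparator-taking form, mirroring sorted(Comparator) above.
    static <T> List<T> sorted(List<T> source, Comparator<? super T> comparator) {
        List<T> copy = new ArrayList<>(source);
        copy.sort(comparator);
        return copy;
    }

    // No-arg form: assumes natural ordering.  The cast fails with
    // ClassCastException as soon as two non-Comparable elements are compared.
    @SuppressWarnings("unchecked")
    static <T> List<T> sorted(List<T> source) {
        return sorted(source, (a, b) -> ((Comparable<Object>) a).compareTo(b));
    }
}
```

One difference from TreeMap worth noting: in this sketch a zero- or 
one-element input never invokes the comparison, so the CCE only shows up 
once there are at least two elements.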

We might also want a version of cumulate that takes an explicit base, 
not just to deal with the "stream is empty" case (since that's easy with 
an intermediate operation), but so that you can resume an existing 
cumulation.
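
The "resume an existing cumulation" case can be sketched as follows. 
CumulateSketch is a hypothetical name, a List stands in for the stream, 
and the explicit-base overload is only one guess at the signature.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;

class CumulateSketch {
    // Running totals with no base: the first element seeds the accumulator.
    static <T> List<T> cumulate(List<T> source, BinaryOperator<T> op) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = source.iterator();
        if (!it.hasNext()) {
            return out;
        }
        T acc = it.next();
        out.add(acc);
        while (it.hasNext()) {
            acc = op.apply(acc, it.next());
            out.add(acc);
        }
        return out;
    }

    // Explicit base: every output already includes the base, so the last
    // value of an earlier cumulation can be passed in to resume it.
    static <T> List<T> cumulate(T base, List<T> source, BinaryOperator<T> op) {
        List<T> out = new ArrayList<>();
        T acc = base;
        for (T t : source) {
            acc = op.apply(acc, t);
            out.add(acc);
        }
        return out;
    }
}
```

Cumulating 1, 2, 3 with addition gives 1, 3, 6; resuming with base 6 
over 4, 5 gives 10, 15, which is the tail of cumulating 1..5 in one go.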

Terminal / Eager
----------------

     void forEach(Block<? super T> block);

     <A extends Destination<? super T>> A into(A target);

     Object[] toArray();

     <U> Map<U, Collection<T>> groupBy(
             Mapper<? super T, ? extends U> classifier);

     <U, W> Map<U, W> reduceBy(Mapper<? super T, ? extends U> classifier,
                               Factory<W> baseFactory,
                               Combiner<W, T, W> reducer);

     T reduce(T base, BinaryOperator<T> op);
     Optional<T> reduce(BinaryOperator<T> op);

     <U> U fold(Factory<U> baseFactory,
                Combiner<U, T, U> reducer,
                BinaryOperator<U> combiner);

     boolean anyMatch(Predicate<? super T> predicate);
     boolean allMatch(Predicate<? super T> predicate);
     boolean noneMatch(Predicate<? super T> predicate);

     Optional<T> findFirst();
     Optional<T> findAny();

Here there are a lot more options to consider.

For toArray, we might want to do

interface ArrayFactory<T> {
     T[] make(int size);
}

and have

     T[] toArray(ArrayFactory<T>)

(The two existing versions of toArray in Collection both stink: the 
no-arg one returns Object[], and the array-taking one uses reflection to 
instantiate the array.  Lambdas buy us out of that; we might even 
consider treating Foo[]::new as a syntax for array constructor refs.)
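
A small sketch of how the factory-taking toArray avoids both problems. 
ToArraySketch is a made-up name and a List stands in for the stream; the 
ArrayFactory interface is repeated only to keep the example 
self-contained.  The String[]::new line assumes a compiler that accepts 
array constructor references (Java 8 and later do).

```java
import java.util.Arrays;
import java.util.List;

// The interface proposed above, repeated for self-containment.
interface ArrayFactory<T> {
    T[] make(int size);
}

class ToArraySketch {
    // The caller supplies the array, so the result has the right runtime
    // type without reflection and without falling back to Object[].
    static <T> T[] toArray(List<? extends T> source, ArrayFactory<T> factory) {
        T[] result = factory.make(source.size());
        int i = 0;
        for (T t : source) {
            result[i++] = t;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b");
        String[] viaLambda = toArray(names, size -> new String[size]);
        String[] viaRef = toArray(names, String[]::new);
        System.out.println(Arrays.toString(viaLambda) + " " + Arrays.toString(viaRef));
    }
}
```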

The most controversial signature here is groupBy, because it is the only 
place in the Streams API that is tied to Collections.  The rationale is: 
you really can't implement groupBy without having an internal Map 
anyway, so why not just return that rather than making the user create a 
MapStream (which has an internal Map) and then dump the elements into a 
real Map with into().  But that leaves us tied to Collections, where 
I'd rather not be.

Don has suggested a multi-valued version of groupBy:

     <U> Map<U, Collection<T>> groupByMulti(
             FlatMapper<? super T, ? extends U> classifier);

which is easy to implement and makes sense to me.
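
Roughly, the multi-valued version differs from plain groupBy only in 
that the classifier may push several keys per element.  In this sketch 
KeyPusher and GroupByMultiSketch are hypothetical names, Consumer stands 
in for Block, and a List stands in for the stream.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical push-style classifier: emits zero or more keys per element.
interface KeyPusher<T, U> {
    void flatMapInto(Consumer<? super U> sink, T element);
}

class GroupByMultiSketch {
    static <T, U> Map<U, Collection<T>> groupByMulti(
            List<T> source, KeyPusher<? super T, U> classifier) {
        Map<U, Collection<T>> groups = new LinkedHashMap<>();
        for (T t : source) {
            // Every key the classifier pushes gets this element added to
            // its group, so one element can land in several groups.
            classifier.flatMapInto(
                key -> groups.computeIfAbsent(key, k -> new ArrayList<>()).add(t),
                t);
        }
        return groups;
    }
}
```

For example, classifying 2, 3, 6 by "even" and "triple" puts 6 in both 
groups.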

The reduceBy method is one of my favorites.  (Not sure if we have the 
signature quite right yet; it probably needs multiple versions.)  It is 
a combination of group-by and reduce-values.  So if you want to compute 
the highest score by person:

Map<Name, Integer> bestScoresByPerson =
   scores.reduceBy(s -> s.getName(),
                   () -> 0,
                   (sc, s) -> max(sc, s.getScore()));
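
For readers who want to see the group-then-reduce mechanics, here is a 
sequential sketch.  It assumes Function, Supplier and BiFunction as 
stand-ins for Mapper, Factory and Combiner; ReduceBySketch and Score are 
invented for the example and a List stands in for the stream.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

class ReduceBySketch {
    // Hypothetical element type for the example.
    static final class Score {
        private final String name;
        private final int score;
        Score(String name, int score) { this.name = name; this.score = score; }
        String getName() { return name; }
        int getScore() { return score; }
    }

    static <T, U, W> Map<U, W> reduceBy(List<T> source,
                                        Function<? super T, ? extends U> classifier,
                                        Supplier<W> baseFactory,
                                        BiFunction<W, ? super T, W> reducer) {
        Map<U, W> result = new HashMap<>();
        for (T t : source) {
            U key = classifier.apply(t);
            // First element for a key starts from a fresh base value.
            W acc = result.containsKey(key) ? result.get(key) : baseFactory.get();
            result.put(key, reducer.apply(acc, t));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Score> scores = Arrays.asList(
            new Score("ann", 3), new Score("bob", 5), new Score("ann", 7));
        Map<String, Integer> best = reduceBy(scores,
            s -> s.getName(),
            () -> 0,
            (sc, s) -> Math.max(sc, s.getScore()));
        System.out.println(best);
    }
}
```

Unlike groupBy, no per-key Collection is ever materialized; each key 
holds only its running reduction.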

The fold() method could use a better name, but it is a generalized 
parallel fold where the intermediate result could be mutable or 
immutable, and there are interesting use cases in both domains.
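
One way to read the three arguments: the factory gives each chunk of the 
input its own fresh accumulator, the reducer folds elements into it, and 
the combiner merges the per-chunk results.  The sketch below mimics that 
split-and-combine shape sequentially.  FoldSketch is a made-up name, 
Supplier and BiFunction stand in for Factory and Combiner, the chunk size 
of 2 is arbitrary, and a real implementation would run the halves in 
parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

class FoldSketch {
    static <T, U> U fold(List<T> source,
                         Supplier<U> baseFactory,
                         BiFunction<U, ? super T, U> reducer,
                         BinaryOperator<U> combiner) {
        if (source.size() <= 2) {
            // Leaf chunk: fresh accumulator, reduced left to right.
            U acc = baseFactory.get();
            for (T t : source) {
                acc = reducer.apply(acc, t);
            }
            return acc;
        }
        int mid = source.size() / 2;
        U left = fold(source.subList(0, mid), baseFactory, reducer, combiner);
        U right = fold(source.subList(mid, source.size()), baseFactory, reducer, combiner);
        return combiner.apply(left, right);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc", "d");

        // Immutable accumulator: total length of all words.
        Integer total = fold(words, () -> 0, (n, w) -> n + w.length(), Integer::sum);

        // Mutable accumulator: each chunk appends into its own builder.
        StringBuilder joined = fold(words,
            () -> new StringBuilder(),
            (sb, w) -> sb.append(w),
            (a, b) -> a.append(b));

        System.out.println(total + " " + joined);
    }
}
```

The mutable case only works because the factory is called once per 
chunk; sharing a single StringBuilder across chunks would not be safe 
once the halves run concurrently.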

There are a few others in the maybe-should-have list, including 
limit/skip/slice.  But I'd like to nail down the details of the 
must-haves first.