Stream operations -- current set
Brian Goetz
brian.goetz at oracle.com
Fri Sep 14 13:56:21 PDT 2012
Here's the current set of stream operations.
Intermediate / Lazy (Stateless)
-------------------------------
Stream<T> filter(Predicate<? super T> predicate);
<R> Stream<R> map(Mapper<? super T, ? extends R> mapper);
<R> Stream<R> flatMap(FlatMapper<? super T, R> mapper);
Stream<T> tee(Block<? super T> block);
<U> MapStream<T, U> mapped(Mapper<? super T, ? extends U> mapper);
Of these, the only real controversy is over the signature of flatMap,
where the mapper is handed a Block into which it pushes its results.
Some people prefer something like
flatMap(t -> Collection<T>)
or
flatMap(t -> T[])
but I think these are mostly value-destroying. If you don't already
have an array or Collection lying around, it's a lot more code/work to
construct one, and then it's more work to iterate it. And if you do have
a Collection lying around, you can just do:
flatMap((b, t) -> findResult(t).forEach(b))
and so having the extra overload doesn't help you much. The existing
signature seems a better "primitive".
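To make the comparison concrete, here is a rough sketch of the push-style
shape. It is not the draft implementation: Block and FlatMapper are
hypothetical stand-ins (the method names apply and flatMapInto are made
up), and a plain List stands in for a lazy Stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlatMapSketch {
    // Hypothetical shapes for the draft types; method names are assumptions.
    interface Block<T> { void apply(T t); }
    interface FlatMapper<T, R> { void flatMapInto(Block<? super R> sink, T element); }

    // Eager, List-based stand-in for the lazy Stream.flatMap.
    static <T, R> List<R> flatMap(List<T> source, FlatMapper<T, R> mapper) {
        List<R> out = new ArrayList<>();
        Block<R> sink = out::add;          // every pushed result lands in 'out'
        for (T t : source) {
            mapper.flatMapInto(sink, t);   // mapper pushes zero or more results
        }
        return out;
    }

    // No array or Collection is built per element; results are pushed directly.
    static List<Integer> divisors(List<Integer> numbers) {
        FlatMapper<Integer, Integer> pushDivisors = (sink, n) -> {
            for (int d = 1; d <= n; d++) {
                if (n % d == 0) sink.apply(d);
            }
        };
        return flatMap(numbers, pushDivisors);
    }

    // If a Collection already exists, forwarding it to the sink is a one-liner.
    static List<String> words(List<String> lines) {
        FlatMapper<String, String> pushWords =
            (sink, line) -> Arrays.asList(line.split(" ")).forEach(sink::apply);
        return flatMap(lines, pushWords);
    }

    public static void main(String[] args) {
        System.out.println(divisors(Arrays.asList(4, 3)));
        System.out.println(words(Arrays.asList("a b", "c")));
    }
}
```

The divisors case is the one a Collection-returning overload would make
more expensive, since each element would need its own temporary list.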
Intermediate / Lazy (Stateful)
------------------------------
Stream<T> uniqueElements();
Stream<T> sorted(Comparator<? super T> comparator);
Stream<T> cumulate(BinaryOperator<T> operator);
Stream<T> sequential();
Of these, we might want to add a sorted() which assumes natural ordering
and takes no Comparator, and throws CCE if the elements are not
Comparable (just like new TreeMap() does.)
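A minimal sketch of what the no-arg variant might do, assuming a List as
a stand-in for the stream and an unchecked cast to Comparable as the
"natural ordering" comparator (the class and method names are
illustrative only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NaturalSortSketch {
    // List-based stand-in for a no-arg sorted(): assumes natural ordering.
    static <T> List<T> sorted(List<T> source) {
        @SuppressWarnings("unchecked")
        Comparator<T> natural = (a, b) -> ((Comparable<T>) a).compareTo(b);
        List<T> copy = new ArrayList<>(source);
        copy.sort(natural);   // the cast throws CCE for non-Comparable elements
        return copy;
    }

    // Returns true if sorting plain Objects fails with ClassCastException.
    static boolean failsOnNonComparable() {
        try {
            sorted(Arrays.asList(new Object(), new Object()));
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("b", "c", "a")));
        System.out.println(failsOnNonComparable());
    }
}
```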
We might also want a version of cumulate that takes an explicit base,
not just to deal with the "stream is empty" case (since that's easy with
an intermediate operation), but so that you can resume an existing
cumulation.
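As an illustration of the "resume" case, here is a sketch of a
base-taking cumulate over a plain List. The three-argument shape is an
assumption, as is the choice not to emit the base itself as the first
element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class CumulateSketch {
    // Running combination of the elements, seeded with an explicit base.
    static <T> List<T> cumulate(List<T> source, T base, BinaryOperator<T> op) {
        List<T> out = new ArrayList<>();
        T running = base;
        for (T t : source) {
            running = op.apply(running, t);
            out.add(running);
        }
        return out;
    }

    public static void main(String[] args) {
        // First batch starts from the identity for addition.
        List<Integer> first = cumulate(Arrays.asList(1, 2, 3), 0, Integer::sum);
        // Second batch resumes from the last value of the first batch.
        int carried = first.get(first.size() - 1);
        List<Integer> second = cumulate(Arrays.asList(4, 5), carried, Integer::sum);
        System.out.println(first + " then " + second);
    }
}
```

Concatenating the two batches gives the same values as one cumulation
over all five elements, which is the point of carrying the base forward.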
Terminal / Eager
----------------
void forEach(Block<? super T> block);
<A extends Destination<? super T>> A into(A target);
Object[] toArray();
<U> Map<U, Collection<T>> groupBy(
    Mapper<? super T, ? extends U> classifier);
<U, W> Map<U, W> reduceBy(Mapper<? super T, ? extends U> classifier,
                          Factory<W> baseFactory,
                          Combiner<W, T, W> reducer);
T reduce(T base, BinaryOperator<T> op);
Optional<T> reduce(BinaryOperator<T> op);
<U> U fold(Factory<U> baseFactory,
           Combiner<U, T, U> reducer,
           BinaryOperator<U> combiner);
boolean anyMatch(Predicate<? super T> predicate);
boolean allMatch(Predicate<? super T> predicate);
boolean noneMatch(Predicate<? super T> predicate);
Optional<T> findFirst();
Optional<T> findAny();
Here there are a lot more options to consider.
For toArray, we might want to do
interface ArrayFactory<T> {
T[] make(int size);
}
and have
T[] toArray(ArrayFactory<T>)
(The two existing versions of toArray in Collection both stink; the
no-arg one returns Object[], and the array-taking one uses reflection to
instantiate the array. Lambdas buy us out of that; we might even
consider treating Foo[]::new as a syntax for array constructor refs.)
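A sketch of how the factory-taking version avoids both problems, again
over a plain List. It relies on array constructor references of the form
String[]::new, which the Java language as later released does accept;
the static toArray helper is illustrative only.

```java
import java.util.Arrays;
import java.util.List;

class ToArraySketch {
    // The factory interface proposed above.
    interface ArrayFactory<T> {
        T[] make(int size);
    }

    // The caller supplies the correctly typed array; no reflection, no Object[].
    static <T> T[] toArray(List<T> source, ArrayFactory<T> factory) {
        T[] result = factory.make(source.size());
        for (int i = 0; i < result.length; i++) {
            result[i] = source.get(i);
        }
        return result;
    }

    static String[] demo() {
        // Array constructor reference used as the factory.
        return toArray(Arrays.asList("a", "b"), String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```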
The most controversial signature here is groupBy, because it is the only
place in the Streams API that is tied to Collections. The rationale is:
you really can't implement groupBy without having an internal Map
anyway, so why not just return that rather than making the user create a
MapStream (which has an internal Map) and then dump the elements into a
real Map with into(). But that leaves us tied to Collections, where
I'd rather not be.
Don has suggested a multi-valued version of groupBy:
<U> Map<U, Collection<T>> groupByMulti(
    FlatMapper<? super T, ? extends U> classifier);
which is easy to implement and makes sense to me.
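Roughly, each element pushes zero or more keys and is filed under every
one of them. A sketch under the same assumptions as before: hypothetical
Block and FlatMapper shapes with made-up method names, a List standing in
for the stream, and the wildcards dropped from the classifier type for
brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByMultiSketch {
    // Hypothetical shapes for the draft types; method names are assumptions.
    interface Block<T> { void apply(T t); }
    interface FlatMapper<T, R> { void flatMapInto(Block<? super R> sink, T element); }

    static <T, U> Map<U, Collection<T>> groupByMulti(List<T> source,
                                                     FlatMapper<T, U> classifier) {
        Map<U, Collection<T>> groups = new LinkedHashMap<>();
        for (T t : source) {
            // Every key the classifier pushes gets this element added to its group.
            Block<U> sink = key -> groups.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
            classifier.flatMapInto(sink, t);
        }
        return groups;
    }

    // Example: file each number under each of 2, 3, 5 that divides it.
    static Map<Integer, Collection<Integer>> bySmallPrimeFactor(List<Integer> numbers) {
        FlatMapper<Integer, Integer> smallPrimeFactors = (sink, n) -> {
            for (int p : new int[] {2, 3, 5}) {
                if (n % p == 0) sink.apply(p);
            }
        };
        return groupByMulti(numbers, smallPrimeFactors);
    }

    public static void main(String[] args) {
        System.out.println(bySmallPrimeFactor(Arrays.asList(4, 5, 6, 7)));
    }
}
```

Here 6 lands in two groups and 7 in none, which the single-key groupBy
cannot express.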
The reduceBy method is one of my favorites. (Not sure if we have the
signature quite right yet, it probably needs multiple versions.) It is
a combination of group-by and reduce-values. So if you want to compute
the highest score by person:
Map<Name, Integer> bestScoresByPerson =
    scores.reduceBy(s -> s.getName(),
                    () -> 0,
                    (sc, s) -> max(sc, s.getScore()));
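For readers who want to try the idea, here is a self-contained sketch of
reduceBy over a plain List. Mapper, Factory and Combiner are hypothetical
stand-ins for the draft types (their method names are made up), Score is
an invented example class, and String replaces Name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReduceBySketch {
    // Hypothetical shapes for the draft types; method names are assumptions.
    interface Mapper<T, R> { R map(T t); }
    interface Factory<W> { W make(); }
    interface Combiner<A, B, R> { R combine(A a, B b); }

    // Group by key and reduce each group's values in a single pass.
    static <T, U, W> Map<U, W> reduceBy(List<T> source,
                                        Mapper<? super T, ? extends U> classifier,
                                        Factory<W> baseFactory,
                                        Combiner<W, T, W> reducer) {
        Map<U, W> result = new LinkedHashMap<>();
        for (T t : source) {
            U key = classifier.map(t);
            // Start each new key from a fresh base value.
            W soFar = result.containsKey(key) ? result.get(key) : baseFactory.make();
            result.put(key, reducer.combine(soFar, t));
        }
        return result;
    }

    // Invented example type.
    static final class Score {
        final String name;
        final int score;
        Score(String name, int score) { this.name = name; this.score = score; }
        String getName() { return name; }
        int getScore() { return score; }
    }

    static Map<String, Integer> bestScoresByPerson(List<Score> scores) {
        return reduceBy(scores,
                        s -> s.getName(),
                        () -> 0,
                        (best, s) -> Math.max(best, s.getScore()));
    }

    public static void main(String[] args) {
        System.out.println(bestScoresByPerson(Arrays.asList(
            new Score("alice", 3), new Score("bob", 5), new Score("alice", 7))));
    }
}
```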
The fold() method could use a better name, but it is a generalized
parallel fold where the intermediate result could be mutable or
immutable, and there are interesting use cases in both domains.
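One way to picture it, as a sketch only: split the input, fold each piece
from a fresh base, then merge the partial results with the combiner. The
version below does the two halves sequentially rather than in parallel,
uses a List in place of a stream, and assumes hypothetical Factory and
Combiner shapes with made-up method names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class FoldSketch {
    // Hypothetical shapes for the draft types; method names are assumptions.
    interface Factory<U> { U make(); }
    interface Combiner<A, B, R> { R combine(A a, B b); }

    // Sequential fold of one chunk, starting from a fresh base.
    static <T, U> U foldChunk(List<T> chunk, Factory<U> baseFactory,
                              Combiner<U, T, U> reducer) {
        U acc = baseFactory.make();
        for (T t : chunk) {
            acc = reducer.combine(acc, t);
        }
        return acc;
    }

    // Two-way split standing in for a parallel decomposition.
    static <T, U> U fold(List<T> source, Factory<U> baseFactory,
                         Combiner<U, T, U> reducer, BinaryOperator<U> combiner) {
        int mid = source.size() / 2;
        U left = foldChunk(source.subList(0, mid), baseFactory, reducer);
        U right = foldChunk(source.subList(mid, source.size()), baseFactory, reducer);
        return combiner.apply(left, right);
    }

    // Immutable intermediate result: an Integer total.
    static Integer totalLength(List<String> words) {
        return fold(words, () -> 0, (n, w) -> n + w.length(), (a, b) -> a + b);
    }

    // Mutable intermediate result: each chunk fills its own StringBuilder.
    static String joined(List<String> words) {
        StringBuilder sb = fold(words,
                                () -> new StringBuilder(),
                                (acc, w) -> acc.append(w),
                                (a, b) -> a.append(b));
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "c", "def");
        System.out.println(totalLength(words) + " " + joined(words));
    }
}
```

The factory matters in the mutable case: each chunk needs its own
StringBuilder, which a single shared base value could not provide.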
There are a few others in the maybe-should-have list, including
limit/skip/slice. But I'd like to nail down the details of the
must-haves first.