Implementation update: serial and parallel stream operations

Brian Goetz brian.goetz at oracle.com
Mon Oct 22 12:28:14 PDT 2012


We've been doing a bunch of refactoring in the streams implementation; 
here's an update on where we stand with the implementations of the 
Stream operations.

     Stream<T> filter(Predicate<? super T> predicate);
     <R> Stream<R> map(Mapper<? super T, ? extends R> mapper);
     <R> Stream<R> flatMap(FlatMapper<? super T, R> mapper);
     void forEach(Block<? super T> block);
     Stream<T> tee(Block<? super T> block);
     <A extends Destination<? super T>> A into(A target);
     T reduce(T base, BinaryOperator<T> op);
     Optional<T> reduce(BinaryOperator<T> op);
     <U> U fold(Factory<U> baseFactory,
                Combiner<U, T, U> reducer,
                BinaryOperator<U> combiner);
     boolean anyMatch(Predicate<? super T> predicate);
     boolean allMatch(Predicate<? super T> predicate);
     boolean noneMatch(Predicate<? super T> predicate);
     Stream<T> sequential();

Full serial and parallel implementations.
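For orientation, here is a sketch of how these operations compose, written 
against the API as it eventually shipped in java.util.stream (some of the 
prototype names above differ: Mapper became Function, Block became Consumer):

```java
import java.util.List;
import java.util.Optional;

public class PipelineSketch {
    // filter + map + reduce over a source list, serial
    static int sumOfDoubledEvens(List<Integer> nums) {
        return nums.stream()
                .filter(n -> n % 2 == 0)   // Predicate<? super T>
                .map(n -> n * 2)           // Mapper in the prototype, Function in the final API
                .reduce(0, Integer::sum);  // reduce(T base, BinaryOperator<T> op)
    }

    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4, 5);
        System.out.println(sumOfDoubledEvens(nums)); // 12

        // reduce without a base value yields an Optional
        Optional<Integer> max = nums.stream().reduce(Integer::max);
        System.out.println(max.get()); // 5

        // the match operations short-circuit
        System.out.println(nums.stream().anyMatch(n -> n > 4)); // true
    }
}
```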

     Stream<T> uniqueElements();

HashSet-based serial implementation.  This can be optimized in the case 
of known sorted streams, where we can maintain a single-element history 
buffer rather than a HashSet.
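The single-element history idea works because duplicates in a sorted 
stream are always adjacent. A plain-Java sketch of that optimization 
(uniqueSorted is a hypothetical helper, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class SortedDedup {
    // For sorted input, duplicates are adjacent, so comparing each element
    // to the previously emitted one replaces the HashSet with O(1) state.
    static <T> List<T> uniqueSorted(List<T> sorted) {
        List<T> out = new ArrayList<>();
        T prev = null;
        boolean first = true;
        for (T t : sorted) {
            if (first || !t.equals(prev)) {
                out.add(t);
                prev = t;
                first = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(uniqueSorted(List.of(1, 1, 2, 3, 3, 3))); // [1, 2, 3]
    }
}
```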

ConcurrentHashMap-based parallel implementation.  However, this has the 
potential defect that it does not maintain encounter order (for streams 
that have a defined encounter order).  This may or may not be a 
requirement.

     Stream<T> sorted(Comparator<? super T> comparator);

PriorityQueue-based serial implementation.  No parallel implementation.
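A sketch of the PriorityQueue approach in plain Java: buffer the whole 
input into the queue, then drain it in comparator order. This also shows 
why sorted() is a stateful (barrier) operation, since nothing can flow 
downstream until the input is exhausted:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PqSort {
    // Buffer everything, then poll: each poll() yields the current minimum
    // under the comparator, so the drain order is the sorted order.
    static <T> List<T> sorted(List<T> input, Comparator<? super T> cmp) {
        PriorityQueue<T> pq = new PriorityQueue<>(cmp);
        pq.addAll(input);
        List<T> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sorted(List.of(3, 1, 2), Comparator.naturalOrder())); // [1, 2, 3]
    }
}
```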

     Stream<T> cumulate(BinaryOperator<T> operator);

Full serial and parallel implementations.
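For readers unfamiliar with the operation: cumulate(op) replaces each 
element with the running reduction of everything up to and including it, 
i.e. a prefix scan. The same idea is available in the JDK today as 
Arrays.parallelPrefix, which makes for a compact illustration:

```java
import java.util.Arrays;

public class CumulateSketch {
    // Prefix scan with +: element i becomes the sum of elements 0..i.
    // parallelPrefix mutates in place, so work on a copy.
    static int[] runningSum(int[] a) {
        int[] c = a.clone();
        Arrays.parallelPrefix(c, Integer::sum);
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runningSum(new int[]{1, 2, 3, 4}))); // [1, 3, 6, 10]
    }
}
```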

     Stream<T> limit(int n);
     Stream<T> skip(int n);

Serial implementations only.
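The semantics are straightforward in the serial case: skip(n) discards 
the first n elements, limit(n) truncates after n. A sketch against the 
API as it later shipped (the parallel difficulty is the per-chunk 
position bookkeeping, not the semantics):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LimitSkipSketch {
    // skip then limit even tames an infinite stream: limit(3) stops the pull.
    static List<Integer> page(long skip, long limit) {
        return Stream.iterate(1, n -> n + 1)
                .skip(skip)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(page(2, 3)); // [3, 4, 5]
    }
}
```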

     Stream<T> concat(Stream<? extends T> other);

Design of concat() is still under discussion.

     Object[] toArray();

Serial and parallel implementations.  Still some work to be done to 
expose all the possible size-based optimizations in the parallel case, 
especially when fusing stateful operations with toArray (e.g., 
s.sorted().toArray()).

     <U> Map<U, Collection<T>> groupBy(Mapper<? super T, ? extends U> classifier);

Serial and parallel implementations.  Same questions about 
order-preservation as with uniqueElements.
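For illustration, the groupBy semantics in the shape they eventually took 
(Collectors.groupingBy in the final API): classify each element and 
collect per-key buckets.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupBySketch {
    // Bucket elements by the classifier's result; within each bucket,
    // a sequential stream preserves encounter order.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        System.out.println(byLength(List.of("a", "bb", "cc"))); // {1=[a], 2=[bb, cc]}
    }
}
```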

     <U, W> Map<U, W> reduceBy(Mapper<? super T, ? extends U> classifier,
                               Factory<W> baseFactory,
                               Combiner<W, T, W> reducer);

Serial implementation only.
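Reading the signature above, reduceBy classifies each element and folds 
it directly into a per-key accumulator seeded by the factory, so no 
intermediate Collection per key is built (unlike groupBy). A plain-Java 
sketch of that contract, using the standard functional interfaces in 
place of the prototype's Mapper/Factory/Combiner:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

public class ReduceBySketch {
    // For each element: classify, fetch (or create) the key's accumulator,
    // fold the element in, and store the new accumulator back.
    static <T, U, W> Map<U, W> reduceBy(List<T> input,
                                        Function<? super T, ? extends U> classifier,
                                        Supplier<W> baseFactory,
                                        BiFunction<W, T, W> reducer) {
        Map<U, W> result = new HashMap<>();
        for (T t : input) {
            U key = classifier.apply(t);
            W acc = result.containsKey(key) ? result.get(key) : baseFactory.get();
            result.put(key, reducer.apply(acc, t));
        }
        return result;
    }

    public static void main(String[] args) {
        // count words per length: accumulator is a running int count
        Map<Integer, Integer> counts = reduceBy(
                List.of("a", "bb", "cc"), String::length, () -> 0, (n, w) -> n + 1);
        System.out.println(counts); // {1=1, 2=2}
    }
}
```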

     Optional<T> findFirst();
     Optional<T> findAny();

Serial implementations only.
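The two differ mainly in their ordering contract, which is what makes 
the parallel case interesting: findFirst must honor encounter order (a 
chunk cannot report a result until every earlier chunk is known to be 
empty), while findAny may return any match. A sketch against the API as 
it later shipped:

```java
import java.util.List;
import java.util.Optional;

public class FindSketch {
    // findFirst: the first even number in encounter order, always 6 here.
    static Optional<Integer> firstEven(List<Integer> nums) {
        return nums.stream().filter(n -> n % 2 == 0).findFirst();
    }

    // findAny: any even number; in parallel this can be whichever chunk
    // finds a match first, which is why it parallelizes cheaply.
    static Optional<Integer> anyEven(List<Integer> nums) {
        return nums.parallelStream().filter(n -> n % 2 == 0).findAny();
    }

    public static void main(String[] args) {
        List<Integer> nums = List.of(5, 6, 7, 8);
        System.out.println(firstEven(nums).get()); // 6
        System.out.println(anyEven(nums).isPresent()); // true
    }
}
```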


So, implementation checklist:

  - Need parallel implementations for find{First,Any}, sorted(), limit, 
skip, reduceBy.
  - Further implementation work needed for sorted(), toArray(), 
uniqueElements().
  - Need semantic clarification regarding preservation of encounter 
order for uniqueElements, groupBy, reduceBy.


More information about the lambda-dev mailing list