Stream operations -- current set

Brian Goetz brian.goetz at oracle.com
Sun Oct 7 10:52:21 PDT 2012


>>     <U> MapStream<T, U> mapped(Mapper<? super T, ? extends U> mapper);
>  From my experience, the opposite one is much more useful:
>      <U> MapStream<U, T> mapped(Mapper<? super T, ? extends U> mapper);
> Are we implying that in this case one should use groupBy() (create many pointless collections) and then map() to throw the collections away?

That works, but cheaper is:

   foos.mapped(mapper).swap()

which transposes keys and values (probably needs a better name.)

>> Intermediate / Lazy (Stateful)
>> ------------------------------
>>
>>     Stream<T> uniqueElements();
>>
>>     Stream<T> sorted(Comparator<? super T> comparator);
>
> I don't see how these two operations fit in here: sort() has no chance of being lazy, AFAIU, and uniqueElements() needs misleadingly much state to be stored to be considered "reasonably lazy". If I am wrong, please correct me.

They are both lazy in the sense that no significant computation happens 
when the sort() method is called.  Unlike 
filter/map/flatMap/keys/swap/etc, they are both stateful.

Sort does not disgorge the first element until all the elements have 
been seen, but it is still possible to get some laziness benefit anyway; 
for example, if you use a heap to sort, then

   foo.sorted().findFirst()

can get the first element without having sorted the whole remainder of 
the stream.  So there is still *some* laziness to be extracted.

Duplicate removal is "even more lazy", in that it can operate in strict 
one-in, one-out fashion -- but has to accumulate a lot of state to do 
so.  (Unless we know the stream to already be sorted, in which case we 
can use a one-element history buffer instead of a set of seen elements.) 
  Alternately, we can fuse sort/uniq into a single operation if they 
appear contiguously.

Also, such operations cannot be lazy in parallel, so they force a 
separate parallel pass.  (However, such passes are often 
information-creating; for example, in 
foo.filter().sorted().map().toArray(), we don't know the size after 
filtering, but we do after sorting, meaning we can fuse mapping and 
array packing into one operation.)

>>     Stream<T> cumulate(BinaryOperator<T> operator);
> Could you explain what this does?

Also known as "prefix".  Given a sequence of elements E1..En, and an 
associative operator *, computes the sequence

   E1, E1*E2, E1*E2*E3, etc.

Perhaps surprisingly, it can be computed in parallel in O(log n) time. 
This shows up in all sorts of parallel algorithms, and is often the key 
to turning an n^2 algorithm into an n*log n algorithm.

>>     T[] toArray(ArrayFactory<T>)
> I support this and suggest to get rid of Object[] toArray()

Alternately, Don's suggestion of
   T[] toArray(Class<T>)

>> Don has suggested a multi-valued version of groupBy:
>>
>>     <U> Map<U, Collection<T>> groupByMulti(FlatMapper<? super T, ? extends U> classifier);
>>
>> which is easy to implement and makes sense to me.
> A philosophical question: what is our take on how minimal we want to keep the Stream API?

Right.  This is a good example of something in the grey area.



More information about the lambda-libs-spec-experts mailing list