Stream operations -- current set
Brian Goetz
brian.goetz at oracle.com
Sun Oct 7 10:52:21 PDT 2012
>> <U> MapStream<T, U> mapped(Mapper<? super T, ? extends U> mapper);
> From my experience, the opposite one is much more useful:
> <U> MapStream<U, T> mapped(Mapper<? super T, ? extends U> mapper);
> Are we implying that in this case one should use groupBy() (create many pointless collections) and then map() to throw the collections away?
That works, but cheaper is:
foos.mapped(mapper).swap()
which transposes keys and values (probably needs a better name.)
>> Intermediate / Lazy (Stateful)
>> ------------------------------
>>
>> Stream<T> uniqueElements();
>>
>> Stream<T> sorted(Comparator<? super T> comparator);
>
> I don't see how these two operations fit in here: sort() has no chance of being lazy, AFAIU, and uniqueElements() needs misleadingly much state to be stored to be considered "reasonably lazy". If I am wrong, please correct me.
They are both lazy in the sense that no significant computation happens
when the sort() method is called. Unlike
filter/map/flatMap/keys/swap/etc, they are both stateful.
Sort does not disgorge the first element until all the elements have
been seen, but it is still possible to get some laziness benefit anyway;
for example, if you use a heap to sort, then
foo.sorted().findFirst()
can get the first element without having sorted the whole remainder of
the stream. So there is still *some* laziness to be extracted.
Duplicate removal is "even more lazy", in that it can operate in strict
one-in, one-out fashion -- but has to accumulate a lot of state to do
so. (Unless we know the stream to already be sorted, in which case we
can use a one-element history buffer instead of a set of seen elements.)
Alternately, we can fuse sort/uniq into a single operation if they
appear contiguously.
Also, such operations cannot be lazy in parallel, so they force a
separate parallel pass. (However, such passes are often
information-creating; for example, in
foo.filter().sorted().map().toArray(), we don't know the size after
filtering, but we do after sorting, meaning we can fuse mapping and
array packing into one operation.)
>> Stream<T> cumulate(BinaryOperator<T> operator);
> Could you explain what this does?
Also known as "prefix". Given a sequence of elements E1..En, and an
associative operator *, computes the sequence
E1, E1*E2, E1*E2*E3, etc.
Perhaps surprisingly, it can be computed in parallel in O(log n) time.
This shows up in all sorts of parallel algorithms, and is often the key
to turning an n^2 algorithm into an n*log n algorithm.
>> T[] toArray(ArrayFactory<T>)
> I support this and suggest to get rid of Object[] toArray()
Alternately, Don's suggestion of
T[] toArray(Class<T>)
>> Don has suggested a multi-valued version of groupBy:
>>
>> <U> Map<U, Collection<T>> groupByMulti(FlatMapper<? super T, ? extends U> classifier);
>>
>> which is easy to implement and makes sense to me.
> A philosophical question: what is our take on how minimal we want to keep the Stream API?
Right. This is a good example of something in the grey area.
More information about the lambda-libs-spec-experts
mailing list