Stream operations -- current set
Brian Goetz
brian.goetz at oracle.com
Sun Oct 7 12:02:59 PDT 2012
> For sort() we have to spend O(N) time to save the rest of O(N*log(N)), which isn't enough gain to have this method at all. As of memory, we don't save anything.
Its not strictly about efficiency; its about presenting a programming
model that is consistent and effective for the users. Sorting shows up
in the middle of a pipeline frequently enough that moving it outside the
model makes it less useful.
In an earlier version, where the stream methods were on Iterable rather
than Stream, making sort a terminal operation would have been less
disruptive than it is now, since you would have been able to keep going.
Now, you'd have to say:
collection.stream().filter().sort().stream().map().forEach()
^
extra internal bun
The reality is that all the stateful ops are on a slippery slope between
intermediate and terminal; saying "sort is terminal but uniq/limit/slice
are not" is among the least consistent choices.
>> Duplicate removal is "even more lazy", in that it can operate in strict one-in, one-out fashion -- but has to accumulate a lot of state to do so. (Unless we know the stream to already be sorted, in which case we can use a one-element history buffer instead of a set of seen elements.) Alternately, we can fuse sort/uniq into a single operation if they appear contiguously.
> Could you explain "fuzing" a little more?
When we construct a pipeline of operations:
collection - filter - sorted - map - toArray
we build a linked-list representation of the stages, and evaluate it
when we know the user wants an answer. When we evaluate the pipeline,
we construct a chain of iterators or sinks and pull/push the data
through it. Many of these operations are amenable to being combined
into a single operation. For example, filter+map can easily be fused
into a single operation, even if we don't externally expose a filterMap
operation, and this shortens the chains of iterators/sinks. A more
effective fusing is sort+uniq, since it eliminating a lot of
intermediate state and computation.
>>>> T[] toArray(ArrayFactory<T>)
>>> I support this and suggest to get rid of Object[] toArray()
>>
>> Alternately, Don's suggestion of
>> T[] toArray(Class<T>)
> Is there a performance penalty for creating arrays through reflection?
We're evaluating whether this is always intrinsified or not.
> BTW, we could create a Destination that wraps an array and the users could retrieve it from there. This way we'd get rid of toArray() altogether, which seems appealing to me.
Yes, such wrappers exist in Arrays.
More information about the lambda-libs-spec-experts
mailing list