Stream operations -- current set

Brian Goetz brian.goetz at oracle.com
Sun Oct 7 12:02:59 PDT 2012


> For sort() we have to spend O(N) time to save the rest of O(N*log(N)), which isn't enough gain to have this method at all. As of memory, we don't save anything.

Its not strictly about efficiency; its about presenting a programming 
model that is consistent and effective for the users.  Sorting shows up 
in the middle of a pipeline frequently enough that moving it outside the 
model makes it less useful.

In an earlier version, where the stream methods were on Iterable rather 
than Stream, making sort a terminal operation would have been less 
disruptive than it is now, since you would have been able to keep going. 
  Now, you'd have to say:

   collection.stream().filter().sort().stream().map().forEach()
                                      ^
                                      extra internal bun

The reality is that all the stateful ops are on a slippery slope between 
intermediate and terminal; saying "sort is terminal but uniq/limit/slice 
are not" is among the least consistent choices.

>> Duplicate removal is "even more lazy", in that it can operate in strict one-in, one-out fashion -- but has to accumulate a lot of state to do so.  (Unless we know the stream to already be sorted, in which case we can use a one-element history buffer instead of a set of seen elements.)  Alternately, we can fuse sort/uniq into a single operation if they appear contiguously.
> Could you explain "fuzing" a little more?

When we construct a pipeline of operations:

   collection - filter - sorted - map - toArray

we build a linked-list representation of the stages, and evaluate it 
when we know the user wants an answer.  When we evaluate the pipeline, 
we construct a chain of iterators or sinks and pull/push the data 
through it.  Many of these operations are amenable to being combined 
into a single operation.  For example, filter+map can easily be fused 
into a single operation, even if we don't externally expose a filterMap 
operation, and this shortens the chains of iterators/sinks.  A more 
effective fusing is sort+uniq, since it eliminating a lot of 
intermediate state and computation.

>>>>     T[] toArray(ArrayFactory<T>)
>>> I support this and suggest to get rid of Object[] toArray()
>>
>> Alternately, Don's suggestion of
>>   T[] toArray(Class<T>)
> Is there a performance penalty for creating arrays through reflection?

We're evaluating whether this is always intrinsified or not.

> BTW, we could create a Destination that wraps an array and the users could retrieve it from there. This way we'd get rid of toArray() altogether, which seems appealing to me.

Yes, such wrappers exist in Arrays.



More information about the lambda-libs-spec-experts mailing list