Streams design strawman

Sun Apr 22 10:04:15 PDT 2012

The more general design principle that we were appealing to is: 
collections are all about storing values, and the set of operations you 
have to support on a collection is large.  But it is silly to use a 
collection as the intermediate value between every operation -- that is 
wasteful.  For example, we could have had filter and map return new 
collections, and written things like this:

   Collection<Name> filtered = names.filter(...);
   Collection<String> mapped = filtered.map(n -> n.getLastName());
   mapped.sort(...);

But creating the intermediate collections is usually wasteful.  So 
instead, filter/map return streams:

   SortedSet<String> result = names.filter(...)
                                   .map(Name::getLastName)
                                   .into(new TreeSet<>());

This gives the same final result, but more efficiently and (IMO) more 
cleanly.
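
To make the contrast concrete, here is a minimal runnable sketch of both 
styles.  It assumes the java.util.stream API that later shipped in Java 8 
(stream() / collect(...) in place of the strawman's filter-on-collection 
and into(...)); the Name class and the "starts with A" predicate are 
hypothetical stand-ins for the elided "...".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

class PipelineSketch {
    // Hypothetical stand-in for the Name type used in the examples above.
    static final class Name {
        private final String first;
        private final String last;
        Name(String first, String last) { this.first = first; this.last = last; }
        String getFirstName() { return first; }
        String getLastName() { return last; }
    }

    // Collection-per-step style: each step materializes a full intermediate list.
    static SortedSet<String> withIntermediates(List<Name> names) {
        List<Name> filtered = new ArrayList<>();
        for (Name n : names) {
            if (n.getFirstName().startsWith("A")) filtered.add(n);
        }
        List<String> mapped = new ArrayList<>();
        for (Name n : filtered) mapped.add(n.getLastName());
        return new TreeSet<>(mapped);
    }

    // Stream style: elements flow through filter and map one at a time;
    // only the final TreeSet is materialized.
    static SortedSet<String> withStream(List<Name> names) {
        return names.stream()
                    .filter(n -> n.getFirstName().startsWith("A"))
                    .map(Name::getLastName)
                    .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<Name> names = Arrays.asList(
            new Name("Ada", "Lovelace"),
            new Name("Alan", "Turing"),
            new Name("Grace", "Hopper"));
        System.out.println(withIntermediates(names)); // [Lovelace, Turing]
        System.out.println(withStream(names));        // [Lovelace, Turing]
    }
}
```

Both methods return the same sorted set; the second just never builds 
the two throwaway lists.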

The key observation is: most bulk operations on collections can be 
expressed in the form

   source - lazy - lazy - lazy - eager

where the "eager" operations are things like forEach, dumping the results 
into a collection, or some form of reduce.
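
As a rough illustration of the three kinds of "eager" ending, the sketch 
below finishes the same kind of pipeline with forEach, with a dump into a 
collection, and with a reduce.  It again assumes the Java 8 
java.util.stream API rather than the strawman's method names; the word 
list and the class name are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class EagerEndings {
    static final List<String> WORDS = Arrays.asList("stream", "of", "lazy", "values");

    // source - lazy(filter) - eager(forEach): consume each element for its side effect.
    static List<String> viaForEach() {
        List<String> seen = new ArrayList<>();
        WORDS.stream().filter(w -> w.length() > 2).forEach(seen::add);
        return seen;
    }

    // source - lazy(map) - eager(collect): dump the results into a collection.
    static List<Integer> viaCollect() {
        return WORDS.stream().map(String::length).collect(Collectors.toList());
    }

    // source - lazy(map) - eager(reduce): fold the results into a scalar.
    static int viaReduce() {
        return WORDS.stream().map(String::length).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(viaForEach()); // [stream, lazy, values]
        System.out.println(viaCollect()); // [6, 2, 4, 6]
        System.out.println(viaReduce());  // 18
    }
}
```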

Grouping might sometimes be the last step in the pipeline, but very 
often we want to keep going.  Expressing it as something that produces a 
stream makes it easier to keep going.  Grouping may benefit less from 
laziness than filtering does (the grouped values have to be buffered), 
but treating it as a lazy (stream-producing) operation keeps it 
composable with whatever comes next.

Our model is that the methods that produce new streams can be lazy, and 
those that produce concrete results (scalars, collections, etc) are eager.
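
One way to see the lazy/eager split is to count how often a filter 
predicate runs.  In the sketch below (Java 8 java.util.stream API 
assumed; class and method names are hypothetical), building the 
pipeline does no work, and the predicate only runs once the eager 
collect step pulls elements through.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazinessSketch {
    // Returns {predicate calls before the eager step, predicate calls after it}.
    static int[] predicateCalls() {
        AtomicInteger calls = new AtomicInteger();
        List<String> words = Arrays.asList("a", "bb", "ccc");

        // Stream-producing steps: this only describes the pipeline.
        Stream<Integer> lengths = words.stream()
            .filter(w -> { calls.incrementAndGet(); return w.length() > 1; })
            .map(String::length);
        int before = calls.get();

        // Eager step: produces a concrete List, so every element is pulled through.
        lengths.collect(Collectors.toList());
        int after = calls.get();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] c = predicateCalls();
        System.out.println(c[0] + " calls before, " + c[1] + " calls after");
        // 0 calls before, 3 calls after
    }
}
```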

On 4/22/2012 12:55 PM, Brian Goetz wrote:
>> So basically it's not a stream but something like this:
>>
>> interface Histogram<K,V> {
>> Iterable<K> keys();
>> Iterable<V> values();
>> Iterable<Entry<K,V>> entries();
>> }
>>
>> a kind of super type of a Map.
>
> It certainly could be, if we wanted to make it an eager
> (end-of-stream-pipeline) operation. But it seems more flexible to make
> it a BiStream-creating operation (even though the values need to be
> internally buffered, which I think is your underlying point), because
> then you can keep going with more transformations / reductions on the
> resulting BiStream. For example, the following produces a Map<Integer,
> String>, where the keys are word lengths and the values are strings of
> "word,word,word".
>
> words.groupBy(w -> w.length())
> .mapValues((length, words) -> String.join(",", words))
> .into(new HashMap<Integer, String>());
>
> The group-by operation is rarely the end of what you want to do; usually
> you want to count, post-process, etc.
>
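
For comparison, here is a runnable sketch of the same word-length 
example.  groupBy, mapValues and BiStream are the strawman's names; the 
sketch swaps in Collectors.groupingBy with a joining downstream 
collector from the Java 8 API, which folds the grouping and the 
per-group join into one eager step.  The second method shows "keep 
going" by streaming the resulting entries.  Class and method names are 
hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupBySketch {
    // Keys are word lengths, values are "word,word,word" strings.
    static Map<Integer, String> wordsByLength(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(String::length, Collectors.joining(",")));
    }

    // Post-processing after the grouping: the longest length that has any words.
    static int longestLength(List<String> words) {
        return wordsByLength(words).entrySet().stream()
            .map(Map.Entry::getKey)
            .reduce(0, Integer::max);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "d");
        Map<Integer, String> grouped = wordsByLength(words);
        System.out.println(grouped.get(1));       // a,d
        System.out.println(grouped.get(2));       // bb,cc
        System.out.println(longestLength(words)); // 2
    }
}
```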