Streams design strawman
Brian Goetz
brian.goetz at oracle.com
Sun Apr 22 10:04:15 PDT 2012
The more general design principle that we were appealing to is:
collections are all about storing values, and the set of operations you
have to support on a collection is large. But it is silly to use a
collection as the intermediate value between every operation -- that is
wasteful. For example, we could have had filter and map return new
collections, and written things like this:
    Collection<Name> filtered = names.filter(...);
    Collection<String> mapped = filtered.map(n -> n.getLastName());
    mapped.sort(...);
But creating the intermediate collections is usually wasteful. So
instead, filter/map return streams:
    SortedSet<String> result = names.filter(...)
                                    .map(Name::getLastName)
                                    .into(new TreeSet<>());
This gives the same final result, but more efficiently and (IMO) more
cleanly.
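For comparison, here is a minimal runnable sketch of the same pipeline written against the java.util.stream API that later shipped in Java 8, not the strawman API above. The Name class, the filter predicate, and the sample data are assumptions made for illustration; collect(Collectors.toCollection(...)) stands in for the strawman's into(...).

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class LastNames {
    // Hypothetical stand-in for the Name type mentioned in the message.
    static final class Name {
        private final String first;
        private final String last;

        Name(String first, String last) {
            this.first = first;
            this.last = last;
        }

        String getFirstName() { return first; }
        String getLastName() { return last; }
    }

    // filter and map only describe the pipeline; collect is the step that
    // pulls elements through it. No intermediate collection is built.
    static SortedSet<String> sortedLastNames(List<Name> names) {
        return names.stream()
                .filter(n -> !n.getFirstName().isEmpty()) // assumed predicate
                .map(Name::getLastName)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<Name> names = Arrays.asList(
                new Name("Ada", "Lovelace"),
                new Name("", "Anonymous"),
                new Name("Alan", "Turing"));
        System.out.println(sortedLastNames(names)); // prints [Lovelace, Turing]
    }
}
```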
The key observation is: most bulk operations on collections can be
expressed in the form
source - lazy - lazy - lazy - eager
where the "eager" operations are things like forEach, dumping the results
into a collection, or some form of reduce.
Grouping might sometimes be the last element in the processing, but very
often we want to keep going. Expressing it as something that produces a
stream makes it easier to keep going. Grouping may benefit less from
laziness than filtering, but treating it as a lazy (stream-producing)
operation also has benefits.
Our model is that the methods that produce new streams can be lazy, and
those that produce concrete results (scalars, collections, etc) are eager.
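The lazy/eager split can be observed directly. The sketch below again uses the java.util.stream API from Java 8 as a stand-in for the strawman; the class name, the log list, and the sample words are assumptions for illustration. It records when each lambda runs, showing that nothing executes until the eager operation at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    // Returns a log of the order in which the pipeline stages actually ran.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<String> words = Arrays.asList("a", "bb", "ccc");

        // Lazy: these calls only build the pipeline; no lambda runs yet.
        Stream<Integer> lengths = words.stream()
                .filter(w -> { log.add("filter " + w); return w.length() > 1; })
                .map(w -> { log.add("map " + w); return w.length(); });
        log.add("pipeline built");

        // Eager: collect pulls each element through filter and map in turn.
        List<Integer> result = lengths.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The first log entry is "pipeline built", followed by the filter and map entries interleaved per element, and finally "result [2, 3]".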
On 4/22/2012 12:55 PM, Brian Goetz wrote:
>> So basically it's not a stream but something like this:
>>
>> interface Histogram<K,V> {
>> Iterable<K> keys();
>> Iterable<V> values();
>> Iterable<Entry<K,V>> entries();
>> }
>>
>> a kind of super type of a Map.
>
> It certainly could be, if we wanted to make it an eager
> (end-of-stream-pipeline) operation. But it seems more flexible to make
> it a BiStream-creating operation (even though the values need to be
> internally buffered, which I think is your underlying point), because
> then you can keep going with more transformations / reductions on the
> resulting BiStream. For example, the following produces a Map<Integer,
> String>, where the keys are word lengths and the values are strings of
> "word,word,word".
>
>     words.groupBy(w -> w.length())
>          .mapValues((length, words) -> String.join(",", words))
>          .into(new HashMap<Integer, String>());
>
> The group-by operation is rarely the end of what you want to do; usually
> you want to count, post-process, etc.
>
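As a rough point of comparison, the word-length example can be written with the Collectors API that later shipped in Java 8. This is not the BiStream design discussed above: in the shipped API, grouping is a terminal collector, and the "keep going" step is expressed as a downstream collector (here Collectors.joining). The class and method names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordsByLength {
    // Groups words by length and joins each group into "word,word,word".
    static Map<Integer, String> joinByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        String::length,             // key: word length
                        Collectors.joining(",")));  // value: comma-joined words
    }

    public static void main(String[] args) {
        Map<Integer, String> byLength =
                joinByLength(Arrays.asList("to", "be", "or", "not"));
        System.out.println(byLength.get(2)); // prints to,be,or
        System.out.println(byLength.get(3)); // prints not
    }
}
```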