Design for collections upgrades (was: Re: lambda-dev Digest, Vol 15, Issue 20 [reduce method result type])
Brian Goetz
brian.goetz at oracle.com
Tue Mar 8 09:23:07 PST 2011
Since people are already discussing this based on an experimental
checkin, let me outline the big picture plan here.
The general idea is to add functional-like operations to collections --
filter, map, reduce, apply.
I see three sensible modes, with explicit choices of which you get.
1. Serial / Eager. This is the straight
collections-with-functional-style mode, and some samples have already
been checked in as proof of concept. Operations on collections yield
new collections, and you can chain the calls. It values ease of use
over performance (no new concepts like laziness), but the performance
model is still highly predictable. You get things like
Collection fooAbles = things.filter( #{ t -> t.isFoo() });
or, with method references:
Collection fooAbles = things.filter(#Thing.isFoo); // ooh, pretty
You can also chain calls together, though you pay a (predictable)
performance cost of intermediate collections, which for small
collections is unlikely to matter:
maxFooWeight = things.filter(#Thing.isFoo)
.map(#Thing.getWeight)
.max();
The benefit here is concision and clarity. The cost is some
performance, but maybe not so much that people freak out. If people
care, they move to the next model, which is:
2. Serial / Lazy. Here, the primary abstraction is Stream (name to be
chosen later, Remi used "lazy" in his example.) To transfer between
"eager world" and "lazy world", you use conversion methods (toStream /
toCollection). A typical call chain probably looks like:
collection.toStream / op / op / op / {toCollection,reduce,apply}
so the above example becomes
maxFooWeight = things.asStream()
.filter(#Thing.isFoo)
.map(#Thing.getWeight)
.max();
The return type of Collection.filter is different from the return type
of Stream.filter, so the choice and performance costs are reflected in
the static type system. This avoids the cost of the intermediate
collections, but is still serial. If you care about that, you move up
to the next model, which is:
3. Parallel / Lazy. Here, the primary abstraction is something like
ParallelStream or ParallelIterable. Let's call it ParallelFoo to avoid
bikeshedding for the moment. Now, the code looks like:
maxFooWeight = things.asParallelFoo()
.filter(#Thing.isFoo)
.map(#Thing.getWeight)
.max();
Again, the return type of ParallelFoo.filter is different from
Stream.filter or Collection.filter, so again the choice is reflected in
the static type system. But you don't have to rewrite your code.
The beauty here is twofold:
- The base model (serial/eager) is easy to understand and natural to
use as a way of expressing what the programmer wants to do, and
attractive enough to stand on its own -- just a little slow with big
collections.
- Switching between execution models is mostly a matter of adding an
explicit conversion or two in the call chain, as the models are similar
enough that the rest of the code should still work (and even mean the
same thing.)
On 3/8/2011 8:43 AM, Rémi Forax wrote:
> Le 08/03/2011 14:31, Jim Mayer a écrit :
>> // I can tolerate this one
>> long product(List<Integer> list) {
>> return list.map(#{x -> (long) x}).reduce(0L, #{sum, x -> sum * x});
>> }
>
> I prefer this one:
>
> long product(List<Integer> list) {
> return list.lazy().map(#{x -> (long) x}).reduce(0L, #{sum, x -> sum * x});
> }
>
> lazy() means, don't do map directly, but wait and do map and reduce in
> one iteration.
>
> Rémi
>
>
More information about the lambda-dev
mailing list