Design for collections upgrades (was: Re: lambda-dev Digest, Vol 15, Issue 20 [reduce method result type])

Tue Mar 8 09:23:07 PST 2011

Since people are already discussing this based on an experimental 
checkin, let me outline the big picture plan here.

The general idea is to add functional-like operations to collections -- 
filter, map, reduce, apply.

I see three sensible modes, with explicit choices of which you get.

1.  Serial / Eager.  This is the straight 
collections-with-functional-style mode, and some samples have already 
been checked in as proof of concept.  Operations on collections yield 
new collections, and you can chain the calls.  It values ease of use 
over performance (no new concepts like laziness), but the performance 
model is still highly predictable.  You get things like

      Collection fooAbles = things.filter( #{ t -> t.isFoo() });

or, with method references:

      Collection fooAbles = things.filter(#Thing.isFoo); // ooh, pretty

You can also chain calls together, though you pay a (predictable) 
performance cost of intermediate collections, which for small 
collections is unlikely to matter:

      maxFooWeight = things.filter(#Thing.isFoo)
                           .map(#Thing.getWeight)
                           .max();
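
To make the cost of those intermediate collections concrete, here is a minimal sketch of the eager model. It is written in Java 8 syntax (`->` lambdas, `::` method references) so that it compiles; the `#{ ... }` forms above are strawman syntax. The `EagerOps` class, its static `filter` and `map` helpers, and the `Thing` fields are all hypothetical stand-ins for illustration, not the proposed API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper class: eager filter/map as static methods so that
// every intermediate collection is visible in the calling code.
class EagerOps {
    // Hypothetical element type standing in for Thing in the examples above.
    static final class Thing {
        final boolean foo;
        final int weight;
        Thing(boolean foo, int weight) { this.foo = foo; this.weight = weight; }
        boolean isFoo() { return foo; }
        int getWeight() { return weight; }
    }

    // Eager filter: walks the whole input and returns a freshly allocated list.
    static <T> List<T> filter(List<T> in, Predicate<T> keep) {
        List<T> out = new ArrayList<>();
        for (T t : in) {
            if (keep.test(t)) out.add(t);
        }
        return out;
    }

    // Eager map: again a full pass and a new list.
    static <T, R> List<R> map(List<T> in, Function<T, R> f) {
        List<R> out = new ArrayList<>(in.size());
        for (T t : in) out.add(f.apply(t));
        return out;
    }

    static int maxFooWeight(List<Thing> things) {
        List<Thing> fooAbles = filter(things, Thing::isFoo);     // intermediate collection 1
        List<Integer> weights = map(fooAbles, Thing::getWeight); // intermediate collection 2
        return Collections.max(weights); // throws NoSuchElementException if nothing is foo
    }

    public static void main(String[] args) {
        List<Thing> things = Arrays.asList(
                new Thing(true, 3), new Thing(false, 9), new Thing(true, 7));
        System.out.println(maxFooWeight(things)); // prints 7
    }
}
```

Each step is a full pass plus an allocation proportional to its output, which is what makes the cost easy to predict.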

The benefit here is concision and clarity.  The cost is some 
performance, but maybe not so much that people freak out.  If people 
care, they move to the next model, which is:

2.  Serial / Lazy.  Here, the primary abstraction is Stream (name to be 
chosen later; Rémi used "lazy" in his example).  To transfer between 
"eager world" and "lazy world", you use conversion methods (asStream / 
toCollection).  A typical call chain probably looks like:

      collection.asStream / op / op / op / {toCollection,reduce,apply}

so the above example becomes

      maxFooWeight = things.asStream()
                           .filter(#Thing.isFoo)
                           .map(#Thing.getWeight)
                           .max();
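
One way such a lazy chain could work is sketched below, assuming the stream is just a wrapper around an Iterator. The `LazyStream` class and its `from`, `filter`, `map` and `reduce` methods are hypothetical names chosen for this example. It uses Java 8 lambda syntax and plain integers rather than Thing to stay short. The counter shows that no element is examined until the terminal operation runs, and that the source is then walked exactly once.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical lazy wrapper; the name and methods are illustrative only.
class LazyStream<T> {
    private final Iterator<T> source;

    private LazyStream(Iterator<T> source) { this.source = source; }

    // Entry point from "eager world": no elements are touched yet.
    static <T> LazyStream<T> from(Iterable<T> items) {
        return new LazyStream<>(items.iterator());
    }

    // Wraps the upstream iterator; non-matching elements are skipped on demand.
    LazyStream<T> filter(Predicate<T> keep) {
        Iterator<T> in = source;
        return new LazyStream<>(new Iterator<T>() {
            private T next;
            private boolean ready;
            public boolean hasNext() {
                while (!ready && in.hasNext()) {
                    T candidate = in.next();
                    if (keep.test(candidate)) { next = candidate; ready = true; }
                }
                return ready;
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return next;
            }
        });
    }

    // Wraps the upstream iterator; f is applied only when an element is pulled.
    <R> LazyStream<R> map(Function<T, R> f) {
        Iterator<T> in = source;
        return new LazyStream<>(new Iterator<R>() {
            public boolean hasNext() { return in.hasNext(); }
            public R next() { return f.apply(in.next()); }
        });
    }

    // Terminal operation: the single pass over the source happens here.
    T reduce(T identity, BinaryOperator<T> op) {
        T acc = identity;
        while (source.hasNext()) acc = op.apply(acc, source.next());
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> weights = Arrays.asList(3, 9, 7, 4);
        int[] calls = {0}; // counts predicate evaluations
        LazyStream<Integer> pipeline = LazyStream.from(weights)
                .filter(w -> { calls[0]++; return w % 2 == 1; })
                .map(w -> w * 10);
        System.out.println(calls[0]);                         // prints 0
        System.out.println(pipeline.reduce(0, Integer::max)); // prints 90
        System.out.println(calls[0]);                         // prints 4
    }
}
```

No list is allocated between the stages; each element flows through filter and map before the next one is pulled.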

The return type of Collection.filter is different from the return type 
of Stream.filter, so the choice and performance costs are reflected in 
the static type system.  This avoids the cost of the intermediate 
collections, but is still serial.  If you care about that, you move up 
to the next model, which is:

3.  Parallel / Lazy.  Here, the primary abstraction is something like 
ParallelStream or ParallelIterable.  Let's call it ParallelFoo to avoid 
bikeshedding for the moment.  Now, the code looks like:

      maxFooWeight = things.asParallelFoo()
                           .filter(#Thing.isFoo)
                           .map(#Thing.getWeight)
                           .max();

Again, the return type of ParallelFoo.filter is different from 
Stream.filter or Collection.filter, so again the choice is reflected in 
the static type system.  But you don't have to rewrite your code.
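
As a rough illustration of what a parallel/lazy implementation might do underneath, the sketch below splits the input into chunks, runs a fused filter/map/reduce pass over each chunk on a worker thread, and combines the partial results. Everything here is an assumption made for the example: the `ParallelSketch` class, the `filterMapReduce` signature, the fixed chunk count, and the use of a plain `ExecutorService`. It also assumes the combining function is associative and that `identity` is its neutral element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch; not the proposed ParallelFoo API.
class ParallelSketch {
    // Filters, maps and reduces each chunk in one pass on a worker thread,
    // then combines the per-chunk results on the calling thread.
    // Assumes chunks >= 1, combine is associative, identity is neutral for it.
    static <T, R> R filterMapReduce(List<T> items, int chunks,
                                    Predicate<T> keep, Function<T, R> f,
                                    R identity, BinaryOperator<R> combine) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<R>> parts = new ArrayList<>();
            int size = (items.size() + chunks - 1) / chunks; // ceiling division
            for (int start = 0; start < items.size(); start += size) {
                List<T> chunk = items.subList(start, Math.min(start + size, items.size()));
                parts.add(pool.submit(() -> {
                    R acc = identity;
                    for (T t : chunk) {
                        if (keep.test(t)) acc = combine.apply(acc, f.apply(t));
                    }
                    return acc;
                }));
            }
            R result = identity;
            for (Future<R> part : parts) result = combine.apply(result, part.get());
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> weights = Arrays.asList(3, 9, 7, 4, 12, 5);
        Integer max = filterMapReduce(weights, 3,
                w -> w % 2 == 1, w -> w * 10, 0, Integer::max);
        System.out.println(max); // prints 90
    }
}
```

The caller supplies the same three pieces as in the serial versions (a predicate, a mapping, a reduction); only the execution strategy changes.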

The beauty here is twofold:

  - The base model (serial/eager) is easy to understand and natural to 
use as a way of expressing what the programmer wants to do, and 
attractive enough to stand on its own -- just a little slow with big 
collections.
  - Switching between execution models is mostly a matter of adding an 
explicit conversion or two in the call chain, as the models are similar 
enough that the rest of the code should still work (and even mean the 
same thing.)

On 3/8/2011 8:43 AM, Rémi Forax wrote:
>    On 08/03/2011 14:31, Jim Mayer wrote:
>> // I can tolerate this one
>>       long product(List<Integer>   list) {
>>         return list.map(#{x ->   (long) x}).reduce(0L, #{sum, x ->   sum * x});
>>       }
>
> I prefer this one:
>
>     long product(List<Integer>   list) {
>         return list.lazy().map(#{x ->   (long) x}).reduce(0L, #{sum, x ->   sum * x});
>     }
>
> lazy() means, don't do map directly, but wait and do map and reduce in
> one iteration.
>
> Rémi
>
>