Design for collections upgrades

Tue Mar 8 11:54:25 PST 2011

Or just overload filter with a second argument: the collection you want the results added to. The nice thing about lazy collections is that they never generate intermediate collections, so chained operations are much cheaper. Perhaps make every operation lazy and require a .to() at the end if you want a Collection rather than a Stream?

> Collection<Thing> fooAbles = things.filter(#Thing.isFoo).to(new HashSet<>());
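
A minimal sketch of how a lazy filter with a terminal .to() could work. LazySeq and of() are hypothetical names, not part of any proposal in this thread, and the lambdas use the Java 8 syntax that eventually shipped rather than the #-syntax used above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical lazy view: filter() only wraps the source; nothing runs until to().
final class LazySeq<T> {
    private final Iterable<T> source;

    private LazySeq(Iterable<T> source) { this.source = source; }

    static <T> LazySeq<T> of(Iterable<T> source) { return new LazySeq<T>(source); }

    // Returns another lazy view; no intermediate collection is allocated.
    LazySeq<T> filter(Predicate<? super T> keep) {
        final Iterable<T> upstream = source;
        return new LazySeq<T>(() -> new Iterator<T>() {
            private final Iterator<T> it = upstream.iterator();
            private T pending;
            private boolean ready;

            @Override public boolean hasNext() {
                // Advance until an element passes the predicate or the source runs out.
                while (!ready && it.hasNext()) {
                    T candidate = it.next();
                    if (keep.test(candidate)) { pending = candidate; ready = true; }
                }
                return ready;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return pending;
            }
        });
    }

    // Terminal step: the caller chooses the concrete target collection.
    <C extends Collection<? super T>> C to(C target) {
        for (T item : source) target.add(item);
        return target;
    }

    public static void main(String[] args) {
        List<String> things = Arrays.asList("foo", "bar", "foo", "baz");
        Set<String> fooAbles = LazySeq.of(things)
                                      .filter(s -> s.startsWith("f"))
                                      .to(new HashSet<String>());
        System.out.println(fooAbles); // prints [foo]
    }
}
```

The only allocation is the HashSet the caller passes in, so the choice of collection type stays with the caller.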

Sam

On Mar 8, 2011, at 11:42 AM, Rémi Forax wrote:

> 
>     List<Thing> things = ...
>     Collection<Thing> fooAbles = things.filter(#Thing.isFoo); // ooh, pretty
> 
> 
> Not that pretty, because filter has to create a new collection, and
> there is no way to do that apart from hard-coding a new ArrayList somewhere.
> 
> In my opinion it's better to have a filterTo that takes the target
> collection as an argument.
> 
> Collection<Thing> fooAbles = things.filterTo(#Thing.isFoo, new HashSet<>());
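
The filterTo idea can be sketched as a static helper. This is only an illustration under assumptions: FilterToSketch is a hypothetical class, the method is static rather than a method on Collection, and it uses the Java 8 lambda syntax that shipped later.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class FilterToSketch {
    // Copies every element that passes the predicate into the caller-supplied
    // target, so the library never has to hard-code a collection type.
    static <T, C extends Collection<? super T>> C filterTo(
            Iterable<? extends T> source, Predicate<? super T> keep, C target) {
        for (T item : source) {
            if (keep.test(item)) target.add(item);
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> things = Arrays.asList("foo", "bar", "foo");
        Set<String> fooAbles = filterTo(things, s -> s.startsWith("f"), new HashSet<String>());
        System.out.println(fooAbles); // prints [foo]
    }
}
```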
> 
> 
> Rémi
> 
> On 03/08/2011 06:23 PM, Brian Goetz wrote:
>> Since people are already discussing this based on an experimental
>> checkin, let me outline the big picture plan here.
>> 
>> The general idea is to add functional-like operations to collections --
>> filter, map, reduce, apply.
>> 
>> I see three sensible modes, with explicit choices of which you get.
>> 
>> 1.  Serial / Eager.  This is the straight
>> collections-with-functional-style mode, and some samples have already
>> been checked in as proof of concept.  Operations on collections yield
>> new collections, and you can chain the calls.  It values ease of use
>> over performance (no new concepts like laziness), but the performance
>> model is still highly predictable.  You get things like
>> 
>>       Collection fooAbles = things.filter(#{ t -> t.isFoo() });
>> 
>> or, with method references:
>> 
>>       Collection fooAbles = things.filter(#Thing.isFoo); // ooh, pretty
>> 
>> You can also chain calls together, though you pay a (predictable)
>> performance cost of intermediate collections, which for small
>> collections is unlikely to matter:
>> 
>>       maxFooWeight = things.filter(#Thing.isFoo)
>>                            .map(#Thing.getWeight)
>>                            .max();
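
To make the cost of the eager chain concrete, here is a rough sketch in which every step materialises a new list. The Thing class and the static filter/map helpers are hypothetical stand-ins for the proposed Collection methods, written with Java 8 method references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class EagerChainSketch {
    // Hypothetical element type with the two methods used in the thread.
    static final class Thing {
        private final boolean foo;
        private final int weight;
        Thing(boolean foo, int weight) { this.foo = foo; this.weight = weight; }
        boolean isFoo() { return foo; }
        int getWeight() { return weight; }
    }

    // Eager filter: allocates and fills a new list immediately.
    static <T> List<T> filter(Collection<T> in, Predicate<? super T> keep) {
        List<T> out = new ArrayList<>();
        for (T t : in) if (keep.test(t)) out.add(t);
        return out;
    }

    // Eager map: allocates and fills another new list immediately.
    static <T, R> List<R> map(Collection<T> in, Function<? super T, ? extends R> f) {
        List<R> out = new ArrayList<>();
        for (T t : in) out.add(f.apply(t));
        return out;
    }

    static int maxFooWeight(Collection<Thing> things) {
        List<Thing> fooAbles = filter(things, Thing::isFoo);     // intermediate collection 1
        List<Integer> weights = map(fooAbles, Thing::getWeight); // intermediate collection 2
        return Collections.max(weights); // throws NoSuchElementException if nothing is foo
    }

    public static void main(String[] args) {
        List<Thing> things = Arrays.asList(new Thing(true, 3), new Thing(false, 9), new Thing(true, 7));
        System.out.println(maxFooWeight(things)); // prints 7
    }
}
```

Three passes and two throwaway lists for one answer: cheap for small inputs, and easy to reason about.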
>> 
>> The benefit here is concision and clarity.  The cost is some
>> performance, but maybe not so much that people freak out.  If people
>> care, they move to the next model, which is:
>> 
>> 2.  Serial / Lazy.  Here, the primary abstraction is Stream (name to be
>> chosen later; Rémi used "lazy" in his example).  To transfer between
>> "eager world" and "lazy world", you use conversion methods (toStream /
>> toCollection).  A typical call chain probably looks like:
>>    collection.toStream / op / op / op / {toCollection,reduce,apply}
>> 
>> so the above example becomes
>> 
>>       maxFooWeight = things.asStream()
>>                            .filter(#Thing.isFoo)
>>                            .map(#Thing.getWeight)
>>                            .max();
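
For comparison, a sketch of the same chain in the serial/lazy style. It assumes the java.util.stream API that later shipped in Java 8, where the entry point is stream() rather than asStream() and max() on an IntStream returns an OptionalInt; the Thing class is hypothetical, and the API discussed in this thread may have differed.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.OptionalInt;

final class LazyStreamSketch {
    // Hypothetical element type with the two methods used in the thread.
    static final class Thing {
        private final boolean foo;
        private final int weight;
        Thing(boolean foo, int weight) { this.foo = foo; this.weight = weight; }
        boolean isFoo() { return foo; }
        int getWeight() { return weight; }
    }

    static OptionalInt maxFooWeight(Collection<Thing> things) {
        return things.stream()                  // enter "lazy world"
                     .filter(Thing::isFoo)      // recorded, not yet executed
                     .mapToInt(Thing::getWeight)
                     .max();                    // terminal step drives a single pass
    }

    public static void main(String[] args) {
        List<Thing> things = Arrays.asList(new Thing(true, 3), new Thing(false, 9), new Thing(true, 7));
        System.out.println(maxFooWeight(things).getAsInt()); // prints 7
    }
}
```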
>> 
>> The return type of Collection.filter is different from the return type
>> of Stream.filter, so the choice and performance costs are reflected in
>> the static type system.  This avoids the cost of the intermediate
>> collections, but is still serial.  If you care about that, you move up
>> to the next model, which is:
>> 
>> 3.  Parallel / Lazy.  Here, the primary abstraction is something like
>> ParallelStream or ParallelIterable.  Let's call it ParallelFoo to avoid
>> bikeshedding for the moment.  Now, the code looks like:
>> 
>>       maxFooWeight = things.asParallelFoo()
>>                            .filter(#Thing.isFoo)
>>                            .map(#Thing.getWeight)
>>                            .max();
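
A sketch of the parallel/lazy variant, again assuming the java.util.stream API that shipped in Java 8: parallelStream() plays the role of the placeholder asParallelFoo(), and Thing is a hypothetical class. Only the entry point changes relative to a serial stream pipeline.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.OptionalInt;

final class ParallelStreamSketch {
    // Hypothetical element type with the two methods used in the thread.
    static final class Thing {
        private final boolean foo;
        private final int weight;
        Thing(boolean foo, int weight) { this.foo = foo; this.weight = weight; }
        boolean isFoo() { return foo; }
        int getWeight() { return weight; }
    }

    static OptionalInt maxFooWeight(Collection<Thing> things) {
        return things.parallelStream()          // the source may be split across threads
                     .filter(Thing::isFoo)
                     .mapToInt(Thing::getWeight)
                     .max();                    // max is associative, so partial results combine safely
    }

    public static void main(String[] args) {
        List<Thing> things = new ArrayList<>();
        for (int w = 1; w <= 100_000; w++) things.add(new Thing(w % 2 == 0, w));
        System.out.println(maxFooWeight(things).getAsInt()); // prints 100000
    }
}
```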
>> 
>> Again, the return type of ParallelFoo.filter is different from
>> Stream.filter or Collection.filter, so again the choice is reflected in
>> the static type system.  But you don't have to rewrite your code.
>> 
>> The beauty here is twofold:
>> 
>>   - The base model (serial/eager) is easy to understand and natural to
>> use as a way of expressing what the programmer wants to do, and
>> attractive enough to stand on its own -- just a little slow with big
>> collections.
>>   - Switching between execution models is mostly a matter of adding an
>> explicit conversion or two in the call chain, as the models are similar
>> enough that the rest of the code should still work (and even mean the
>> same thing.)
>> 
>> 
>> On 3/8/2011 8:43 AM, Rémi Forax wrote:
>>>    On 08/03/2011 14:31, Jim Mayer wrote:
>>>> // I can tolerate this one
>>>>       long product(List<Integer> list) {
>>>>         return list.map(#{x -> (long) x}).reduce(1L, #{acc, x -> acc * x});
>>>>       }
>>> I prefer this one:
>>> 
>>>     long product(List<Integer> list) {
>>>         return list.lazy().map(#{x -> (long) x}).reduce(1L, #{acc, x -> acc * x});
>>>     }
>>> 
>>> lazy() means: don't run map immediately; wait and perform map and reduce
>>> in a single iteration.
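
A sketch of what "map and reduce in one iteration" means in practice, assuming the java.util.stream API that shipped in Java 8 in place of the lazy() call discussed here. LazyProductSketch and its method names are hypothetical. Both versions seed the reduction with 1L, the identity for multiplication, because a product seeded with 0 is always 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LazyProductSketch {
    // Eager: map materialises a List<Long>, then reduce walks it (two passes).
    static long productEager(List<Integer> list) {
        List<Long> widened = new ArrayList<>();
        for (Integer x : list) widened.add(x.longValue());
        long product = 1L;
        for (Long x : widened) product *= x;
        return product;
    }

    // Lazy: each element is widened and multiplied in the same pass,
    // with no intermediate list.
    static long productLazy(List<Integer> list) {
        return list.stream()
                   .mapToLong(Integer::longValue)
                   .reduce(1L, (acc, x) -> acc * x);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(2, 3, 4);
        System.out.println(productEager(numbers)); // prints 24
        System.out.println(productLazy(numbers));  // prints 24
    }
}
```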
>>> 
>>> Rémi
>>> 