Design for collections upgrades

Thu Mar 10 03:57:48 PST 2011

The point I was trying to make is that to me the method name "filter"
means filter this collection in place, not return a new copy that is
filtered. Just like Collections.sort, shuffle or swap. Its the active
tense that implies it.

I also think that Java is a mutation based language in developers
minds, and if you asked around then that in-place change is what would
be expected. Now, the FP viewpoint suggests thats a bad idea, and the
presence of immutable collections makes it hard to implement, but that
doesn't change what I perceive developers would currently expect from
that method name and similar actve tense ones.

Finally, I'd argue against using "map" as a method name in Java, given
the strong connection to the entirely different concept of the Map
interface.

Stephen

2011/3/10 "Zdeněk Troníček" <tronicek at fit.cvut.cz>:
> To me it seems logical that filter() returns the same collection as was
> the original collection. For Set you do not have any other choice either:
>
> set.filter(predicate)
>
> cannot switch from HashSet to TreeSet or back.
>
> Z.
> --
> Zdenek Tronicek
> FIT CTU in Prague
>
>
> Rémi Forax napsal(a):
>>
>>      List<String>  things = ...
>>      Collection<String>  fooAbles = things.filter(#Thing.isFoo); // ooh,
>> pretty
>>
>>
>> Not that pretty because filter have to create a new collection and
>> there is no way to do that apart hard coding a new ArrayList somewhere.
>>
>> It's better in my opinion to have a filterTo that takes a collection
>> as argument.
>>
>> Collection<String>  fooAbles = things.filterTo(#Thing.isFoo, new
>> HashSet<>());
>>
>>
>> Rémi
>>
>> On 03/08/2011 06:23 PM, Brian Goetz wrote:
>>> Since people are already discussing this based on an experimental
>>> checkin, let me outline the big picture plan here.
>>>
>>> The general idea is to add functional-like operations to collections --
>>> filter, map, reduce, apply.
>>>
>>> I see three sensible modes, with explicit choices of which you get.
>>>
>>> 1.  Serial / Eager.  This is the straight
>>> collections-with-functional-style mode, and some samples have already
>>> been checked in as proof of concept.  Operations on collections yield
>>> new collections, and you can chain the calls.  It values ease of use
>>> over performance (no new concepts like laziness), but the performance
>>> model is still highly predictable.  You get things like
>>>
>>>        Collection fooAbles = things.filter( #{ t ->  t.isFoo() });
>>>
>>> or, with method references:
>>>
>>>        Collection fooAbles = things.filter(#Thing.isFoo); // ooh, pretty
>>>
>>> You can also chain calls together, though you pay a (predictable)
>>> performance cost of intermediate collections, which for small
>>> collections is unlikely to matter:
>>>
>>>        maxFooWeight = things.filter(#Thing.isFoo)
>>>                             .map(#Thing.getWeight)
>>>                             .max();
>>>
>>> The benefit here is concision and clarity.  The cost is some
>>> performance, but maybe not so much that people freak out.  If people
>>> care, they move to the next model, which is:
>>>
>>> 2.  Serial / Lazy.  Here, the primary abstraction is Stream (name to be
>>> chosen later, Remi used "lazy" in his example.)  To transfer between
>>> "eager world" and "lazy world", you use conversion methods (toStream /
>>> toCollection).  A typical call chain probably looks like:
>>>     collection.toStream / op / op / op / {toCollection,reduce,apply}
>>>
>>> so the above example becomes
>>>
>>>        maxFooWeight = things.asStream()
>>>                             .filter(#Thing.isFoo)
>>>                             .map(#Thing.getWeight)
>>>                             .max();
>>>
>>> The return type of Collection.filter is different from the return type
>>> of Stream.filter, so the choice and performance costs are reflected in
>>> the static type system.  This avoids the cost of the intermediate
>>> collections, but is still serial.  If you care about that, you move up
>>> to the next model, which is:
>>>
>>> 3.  Parallel / Lazy.  Here, the primary abstraction is something like
>>> ParallelStream or ParallelIterable.  Let's call it ParallelFoo to avoid
>>> bikeshedding for the moment.  Now, the code looks like:
>>>
>>>        maxFooWeight = things.asParallelFoo()
>>>                             .filter(#Thing.isFoo)
>>>                             .map(#Thing.getWeight)
>>>                             .max();
>>>
>>> Again, the return type of ParallelFoo.filter is different from
>>> Stream.filter or Collection.filter, so again the choice is reflected in
>>> the static type system.  But you don't have to rewrite your code.
>>>
>>> The beauty here is twofold:
>>>
>>>    - The base model (serial/eager) is easy to understand and natural to
>>> use as a way of expressing what the programmer wants to do, and
>>> attractive enough to stand on its own -- just a little slow with big
>>> collections.
>>>    - Switching between execution models is mostly a matter of adding an
>>> explicit conversion or two in the call chain, as the models are similar
>>> enough that the rest of the code should still work (and even mean the
>>> same thing.)
>>>
>>>
>>> On 3/8/2011 8:43 AM, Rémi Forax wrote:
>>>>     Le 08/03/2011 14:31, Jim Mayer a écrit :
>>>>> // I can tolerate this one
>>>>>        long product(List<Integer>    list) {
>>>>>          return list.map(#{x ->    (long) x}).reduce(0L, #{sum, x ->
>>>>>  sum * x});
>>>>>        }
>>>> I prefer this one:
>>>>
>>>>      long product(List<Integer>    list) {
>>>>          return list.lazy().map(#{x ->    (long) x}).reduce(0L, #{sum,
>>>> x ->    sum * x});
>>>>      }
>>>>
>>>> lazy() means, don't do map directly, but wait and do map and reduce in
>>>> one iteration.
>>>>
>>>> Rémi
>>>>
>>>>
>>
>>
>>
>
>
>