Design for collections upgrades

Seamus Sullivan ssullivan at aptima.com
Thu Mar 10 06:18:40 PST 2011


Regarding 'map' - this should be a very frequently used method; it seems likely that there will be a bit of confusion between the interface and the method in discussions. I would suggest using C++'s standard template library name for the similar function: 'transform' to avoid conflating the two.

Regarding altering the original collection: this seems like a bad idea.  From my experience using the STL's remove functions, I had to frequently make temporary copies of collections prior to filtering (remove_if) so that collections wouldn't be altered for further/different processing at later stages.  Yes, after a while I should have caught on and created my own versions of remove_* that did just that.

Seamus

-----Original Message-----
From: lambda-dev-bounces at openjdk.java.net [mailto:lambda-dev-bounces at openjdk.java.net] On Behalf Of Stephen Colebourne
Sent: Thursday, March 10, 2011 6:58 AM
To: lambda-dev
Subject: Re: Design for collections upgrades

The point I was trying to make is that to me the method name "filter"
means filter this collection in place, not return a new copy that is
filtered. Just like Collections.sort, shuffle or swap. Its the active
tense that implies it.

I also think that Java is a mutation based language in developers
minds, and if you asked around then that in-place change is what would
be expected. Now, the FP viewpoint suggests thats a bad idea, and the
presence of immutable collections makes it hard to implement, but that
doesn't change what I perceive developers would currently expect from
that method name and similar actve tense ones.

Finally, I'd argue against using "map" as a method name in Java, given
the strong connection to the entirely different concept of the Map
interface.

Stephen


2011/3/10 "Zdeněk Troníček" <tronicek at fit.cvut.cz>:
> To me it seems logical that filter() returns the same collection as was
> the original collection. For Set you do not have any other choice either:
>
> set.filter(predicate)
>
> cannot switch from HashSet to TreeSet or back.
>
> Z.
> --
> Zdenek Tronicek
> FIT CTU in Prague
>
>
> Rémi Forax napsal(a):
>>
>>      List<String>  things = ...
>>      Collection<String>  fooAbles = things.filter(#Thing.isFoo); // ooh,
>> pretty
>>
>>
>> Not that pretty because filter have to create a new collection and
>> there is no way to do that apart hard coding a new ArrayList somewhere.
>>
>> It's better in my opinion to have a filterTo that takes a collection
>> as argument.
>>
>> Collection<String>  fooAbles = things.filterTo(#Thing.isFoo, new
>> HashSet<>());
>>
>>
>> Rémi
>>
>> On 03/08/2011 06:23 PM, Brian Goetz wrote:
>>> Since people are already discussing this based on an experimental
>>> checkin, let me outline the big picture plan here.
>>>
>>> The general idea is to add functional-like operations to collections --
>>> filter, map, reduce, apply.
>>>
>>> I see three sensible modes, with explicit choices of which you get.
>>>
>>> 1.  Serial / Eager.  This is the straight
>>> collections-with-functional-style mode, and some samples have already
>>> been checked in as proof of concept.  Operations on collections yield
>>> new collections, and you can chain the calls.  It values ease of use
>>> over performance (no new concepts like laziness), but the performance
>>> model is still highly predictable.  You get things like
>>>
>>>        Collection fooAbles = things.filter( #{ t ->  t.isFoo() });
>>>
>>> or, with method references:
>>>
>>>        Collection fooAbles = things.filter(#Thing.isFoo); // ooh, pretty
>>>
>>> You can also chain calls together, though you pay a (predictable)
>>> performance cost of intermediate collections, which for small
>>> collections is unlikely to matter:
>>>
>>>        maxFooWeight = things.filter(#Thing.isFoo)
>>>                             .map(#Thing.getWeight)
>>>                             .max();
>>>
>>> The benefit here is concision and clarity.  The cost is some
>>> performance, but maybe not so much that people freak out.  If people
>>> care, they move to the next model, which is:
>>>
>>> 2.  Serial / Lazy.  Here, the primary abstraction is Stream (name to be
>>> chosen later, Remi used "lazy" in his example.)  To transfer between
>>> "eager world" and "lazy world", you use conversion methods (toStream /
>>> toCollection).  A typical call chain probably looks like:
>>>     collection.toStream / op / op / op / {toCollection,reduce,apply}
>>>
>>> so the above example becomes
>>>
>>>        maxFooWeight = things.asStream()
>>>                             .filter(#Thing.isFoo)
>>>                             .map(#Thing.getWeight)
>>>                             .max();
>>>
>>> The return type of Collection.filter is different from the return type
>>> of Stream.filter, so the choice and performance costs are reflected in
>>> the static type system.  This avoids the cost of the intermediate
>>> collections, but is still serial.  If you care about that, you move up
>>> to the next model, which is:
>>>
>>> 3.  Parallel / Lazy.  Here, the primary abstraction is something like
>>> ParallelStream or ParallelIterable.  Let's call it ParallelFoo to avoid
>>> bikeshedding for the moment.  Now, the code looks like:
>>>
>>>        maxFooWeight = things.asParallelFoo()
>>>                             .filter(#Thing.isFoo)
>>>                             .map(#Thing.getWeight)
>>>                             .max();
>>>
>>> Again, the return type of ParallelFoo.filter is different from
>>> Stream.filter or Collection.filter, so again the choice is reflected in
>>> the static type system.  But you don't have to rewrite your code.
>>>
>>> The beauty here is twofold:
>>>
>>>    - The base model (serial/eager) is easy to understand and natural to
>>> use as a way of expressing what the programmer wants to do, and
>>> attractive enough to stand on its own -- just a little slow with big
>>> collections.
>>>    - Switching between execution models is mostly a matter of adding an
>>> explicit conversion or two in the call chain, as the models are similar
>>> enough that the rest of the code should still work (and even mean the
>>> same thing.)
>>>
>>>
>>> On 3/8/2011 8:43 AM, Rémi Forax wrote:
>>>>     Le 08/03/2011 14:31, Jim Mayer a écrit :
>>>>> // I can tolerate this one
>>>>>        long product(List<Integer>    list) {
>>>>>          return list.map(#{x ->    (long) x}).reduce(0L, #{sum, x ->
>>>>>  sum * x});
>>>>>        }
>>>> I prefer this one:
>>>>
>>>>      long product(List<Integer>    list) {
>>>>          return list.lazy().map(#{x ->    (long) x}).reduce(0L, #{sum,
>>>> x ->    sum * x});
>>>>      }
>>>>
>>>> lazy() means, don't do map directly, but wait and do map and reduce in
>>>> one iteration.
>>>>
>>>> Rémi
>>>>
>>>>
>>
>>
>>
>
>
>



The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.


More information about the lambda-dev mailing list