Design for collections upgrades

Thu Mar 10 04:19:53 PST 2011

On 03/10/11, Zdeněk Troníček wrote:
> To me it seems logical that filter() returns the same collection as was
> the original collection.

But that is rarely what you want by default. The eager result you get from filter() is usually just an intermediate representation of some data. Consider this, for example:

ConcurrentMap cMap = ...;

Set someKeys = cMap.keys().filter(...);

Would you usually want someKeys to be backed by a separate instance of ConcurrentHashMap?
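
A minimal sketch of that point, written against present-day Java rather than the 2011 prototype. The helper names filter and filterTo are hypothetical (the latter borrowed from Rémi's suggestion quoted further down), and java.util.function.Predicate stands in for the lambda type. The eager filter copies into a plain ArrayList whatever the source was, while filterTo lets the caller choose the target:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

public class FilterSketch {
    // Hypothetical eager filter: matches are always copied into a plain
    // ArrayList, regardless of what kind of collection the source is.
    public static <T> List<T> filter(Collection<T> source, Predicate<T> keep) {
        return filterTo(source, keep, new ArrayList<T>());
    }

    // Hypothetical filterTo: the caller supplies the target collection,
    // so nothing about the result type is hard-coded here.
    public static <T, C extends Collection<T>> C filterTo(
            Collection<T> source, Predicate<T> keep, C target) {
        for (T item : source) {
            if (keep.test(item)) {
                target.add(item);
            }
        }
        return target;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> cMap = new ConcurrentHashMap<>();
        cMap.put("apple", 1);
        cMap.put("avocado", 2);
        cMap.put("banana", 3);

        // The result lives in an ordinary HashSet, not in another
        // ConcurrentHashMap-backed set.
        Set<String> someKeys =
                filterTo(cMap.keySet(), k -> k.startsWith("a"), new HashSet<String>());
        System.out.println(someKeys);
    }
}
```

Here someKeys is a snapshot in an ordinary HashSet; nothing about the result depends on the source map being concurrent.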

Peter

> For Set you do not have any other choice either:
> 
> set.filter(predicate)
> 
> cannot switch from HashSet to TreeSet or back.
> 
> Z.
> >
> >      List<String>  things = ...
> >      Collection<String>  fooAbles = things.filter(#Thing.isFoo); // ooh, pretty
> >
> >
> > Not that pretty, because filter has to create a new collection and
> > there is no way to do that apart from hard-coding a new ArrayList somewhere.
> >
> > It's better in my opinion to have a filterTo that takes a collection
> > as argument.
> >
> > Collection<String>  fooAbles = things.filterTo(#Thing.isFoo, new HashSet<>());
> >
> >
> > Rémi
> >
> > On 03/08/2011 06:23 PM, Brian Goetz wrote:
> >> Since people are already discussing this based on an experimental
> >> checkin, let me outline the big picture plan here.
> >>
> >> The general idea is to add functional-like operations to collections --
> >> filter, map, reduce, apply.
> >>
> >> I see three sensible modes, with explicit choices of which you get.
> >>
> >> 1.  Serial / Eager.  This is the straight
> >> collections-with-functional-style mode, and some samples have already
> >> been checked in as proof of concept.  Operations on collections yield
> >> new collections, and you can chain the calls.  It values ease of use
> >> over performance (no new concepts like laziness), but the performance
> >> model is still highly predictable.  You get things like
> >>
> >>        Collection fooAbles = things.filter( #{ t ->  t.isFoo() });
> >>
> >> or, with method references:
> >>
> >>        Collection fooAbles = things.filter(#Thing.isFoo); // ooh, pretty
> >>
> >> You can also chain calls together, though you pay a (predictable)
> >> performance cost of intermediate collections, which for small
> >> collections is unlikely to matter:
> >>
> >>        maxFooWeight = things.filter(#Thing.isFoo)
> >>                             .map(#Thing.getWeight)
> >>                             .max();
> >>
> >> The benefit here is concision and clarity.  The cost is some
> >> performance, but maybe not so much that people freak out.  If people
> >> care, they move to the next model, which is:
> >>
> >> 2.  Serial / Lazy.  Here, the primary abstraction is Stream (name to be
> >> chosen later, Remi used "lazy" in his example.)  To transfer between
> >> "eager world" and "lazy world", you use conversion methods (toStream /
> >> toCollection).  A typical call chain probably looks like:
> >>     collection.toStream / op / op / op / {toCollection,reduce,apply}
> >>
> >> so the above example becomes
> >>
> >>        maxFooWeight = things.asStream()
> >>                             .filter(#Thing.isFoo)
> >>                             .map(#Thing.getWeight)
> >>                             .max();
> >>
> >> The return type of Collection.filter is different from the return type
> >> of Stream.filter, so the choice and performance costs are reflected in
> >> the static type system.  This avoids the cost of the intermediate
> >> collections, but is still serial.  If you care about that, you move up
> >> to the next model, which is:
> >>
> >> 3.  Parallel / Lazy.  Here, the primary abstraction is something like
> >> ParallelStream or ParallelIterable.  Let's call it ParallelFoo to avoid
> >> bikeshedding for the moment.  Now, the code looks like:
> >>
> >>        maxFooWeight = things.asParallelFoo()
> >>                             .filter(#Thing.isFoo)
> >>                             .map(#Thing.getWeight)
> >>                             .max();
> >>
> >> Again, the return type of ParallelFoo.filter is different from
> >> Stream.filter or Collection.filter, so again the choice is reflected in
> >> the static type system.  But you don't have to rewrite your code.
> >>
> >> The beauty here is twofold:
> >>
> >>    - The base model (serial/eager) is easy to understand and natural to
> >> use as a way of expressing what the programmer wants to do, and
> >> attractive enough to stand on its own -- just a little slow with big
> >> collections.
> >>    - Switching between execution models is mostly a matter of adding an
> >> explicit conversion or two in the call chain, as the models are similar
> >> enough that the rest of the code should still work (and even mean the
> >> same thing.)
> >>
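
For comparison, a sketch of the three modes in the Java that eventually shipped, where stream() and parallelStream() play the roles of asStream() and asParallelFoo(). Collection itself has no eager filter there, so the eager mode is only approximated by materializing each intermediate list. The Thing class and its fields are hypothetical, chosen to match the names used above:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ThreeModes {
    // Hypothetical element type matching the names used in the thread.
    public static final class Thing {
        private final boolean foo;
        private final int weight;

        public Thing(boolean foo, int weight) {
            this.foo = foo;
            this.weight = weight;
        }

        public boolean isFoo() { return foo; }
        public int getWeight() { return weight; }
    }

    // 1. Serial / eager (approximation): each step builds a full
    //    intermediate list before the next step starts.
    public static int eagerMax(List<Thing> things) {
        List<Thing> foos =
                things.stream().filter(Thing::isFoo).collect(Collectors.toList());
        List<Integer> weights =
                foos.stream().map(Thing::getWeight).collect(Collectors.toList());
        return Collections.max(weights); // throws if no Thing is foo
    }

    // 2. Serial / lazy: elements flow through filter and map one at a
    //    time; no intermediate collection is built.
    public static int lazyMax(List<Thing> things) {
        return things.stream()
                     .filter(Thing::isFoo)
                     .mapToInt(Thing::getWeight)
                     .max()
                     .getAsInt(); // throws if no Thing is foo
    }

    // 3. Parallel / lazy: same call chain, different entry point.
    public static int parallelMax(List<Thing> things) {
        return things.parallelStream()
                     .filter(Thing::isFoo)
                     .mapToInt(Thing::getWeight)
                     .max()
                     .getAsInt(); // throws if no Thing is foo
    }

    public static void main(String[] args) {
        List<Thing> things = Arrays.asList(
                new Thing(true, 3), new Thing(false, 9), new Thing(true, 7));
        System.out.println(eagerMax(things));
        System.out.println(lazyMax(things));
        System.out.println(parallelMax(things));
    }
}
```

Only the entry point differs between the lazy and parallel versions, which is the "adding an explicit conversion or two" property described above.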
> >>
> >> On 3/8/2011 8:43 AM, Rémi Forax wrote:
> >>>     On 08/03/2011 14:31, Jim Mayer wrote:
> >>>> // I can tolerate this one
> >>>>        long product(List<Integer>    list) {
> >>>>          return list.map(#{x ->    (long) x}).reduce(0L, #{sum, x -> sum * x});
> >>>>        }
> >>> I prefer this one:
> >>>
> >>>      long product(List<Integer>    list) {
> >>>          return list.lazy().map(#{x ->    (long) x}).reduce(0L, #{sum, x ->    sum * x});
> >>>      }
> >>>
> >>> lazy() means, don't do map directly, but wait and do map and reduce in
> >>> one iteration.
> >>>
> >>> Rémi
> >>>
> >>>
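
To make the "one iteration" point concrete, here is a sketch in present-day Java, where stream() stands in for the prototype's lazy(). Two assumptions to flag: the class name is hypothetical, and the identity is 1L rather than the 0L in the quoted snippets, since a product seeded with zero is always zero. The counter exists only to show that the mapping runs during the reduction, not in a separate earlier pass:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.LongStream;

public class LazyProduct {
    // Map and reduce fused into a single traversal of the list.
    public static long product(List<Integer> list) {
        return list.stream()
                   .mapToLong(Integer::longValue)
                   .reduce(1L, (acc, x) -> acc * x); // 1L is the identity for *
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(2, 3, 7);
        AtomicInteger mapCalls = new AtomicInteger();

        // Building the pipeline does not run the mapping function.
        LongStream pipeline = list.stream().mapToLong(x -> {
            mapCalls.incrementAndGet();
            return x.longValue();
        });
        System.out.println(mapCalls.get()); // prints 0

        // The terminal reduce pulls each element through the map step.
        long result = pipeline.reduce(1L, (acc, x) -> acc * x);
        System.out.println(result);         // prints 42
        System.out.println(mapCalls.get()); // prints 3
    }
}
```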