Design for collections upgrades

Sat Mar 12 14:01:44 PST 2011

>From personal experience, students are completely bewildered that
someList.map(someClosure) has absolutely nothing at all to do with maps (in
the java.util.Map sense). 'transform' is a better name. I'm not sure which
SQL term is a good fit for map. "select" doesn't really seem to cover it,
though I'm guessing that's what Llewellyn intended.

Immutable maps are not part of java's type system (at least, rt.jar). Java's
type system leans heavily towards mutable collections (i.e. the basic Map,
Set, Collection and List include mutating methods, and none of them have
methods that return new collection instances based on old instances. For
example, no such interface has an 'append' method, or a not-in-place
retainAll/removeAll).

Thus, as Stephen Colebourne said, consistency says that these methods should
run in-place, even if that doesn't fit the expectation of those who are used
to functional programming. It wouldn't make any sense for List to have NO
.append() method, which returns a new list containing the same elements as
this list plus a new element added at the end, while it DOES have a
.filter() method which returns a new list containing all elements that pass
the supplied filter closure. This does mean that i.e. .transform() HAS to
transform from one type to the same type.

.filter() is easily named: retainAll and removeAll are already in the API
and need merely be overloaded with the signature that takes a closure
instead of another collection. Just like retain and remove, these would
definitely be in-place. That leaves 'filter' free for some future immutable
collection API that returns a new instance.

.map() / .transform() is the true problem here: Not only is naming this one
difficult, its also annoying to do in-place, as an in-place transform
requires the mapping to go from ? super E to ? extends E; you can't map from
String to Integer with such a construct. Nevertheless, including a
non-in-place version of transform() as a defender method just doesn't feel
consistent, but adding the full set of immutable 'mutators' (append,
prepend, a non-in-place take on removeAll/retainAll/addAll, a non-in-place
variant of removeAll/retainAll, in-place transform and non-in-place
transform, and more) - that's a huge undertaking and adds a large bunch of
new methods to the collections API, all of which has to be written with
defender methods so as to stay backwards compatible, and they all suffer
from the issue of not returning the same type of collection as the starting
type. For some types this is irrelevant, but i.e. a LinkedHashMap returning
a plain HashMap is definitely sub-optimal. Letting each collection type use
i.e. .newInstance() to help with this also doesn't sound like a good idea;
for starters not every conceivable subclass of Map/List/etc can be
effectively cloned by re-filling it in the same order as its iterator()
method supplies values, and it would create a large timespan in which some
collections do return 'Self' for .newInstance(), but others fall back on
HashMap. If this idea is persued, .clone() or more likely a newly introduced
and defenderized .copy() should be used instead of .newInstance().

Thus, I don't think adding non-in-place versions of anything is a feasible
idea for the existing Collections API; there are far too many issues.

Instead, I suggest (perhaps for JDK8) an entirely new type tree is made for
immutable collections, including the complete set of immutable not-in-place
versions of retain/remove/appendAll/append/prepend/etc. As it would be new
API none of these methods would need to be defender methods and thus each
implementation can be responsible for returning the appropriate type.

In the mean time, a utility method in Collections which filters / transforms
existing collections into a new collection of a preset known type (HashSet,
ArrayList, and LinkedHashMap seem like the most obvious choices here) can be
added to at least let programmers do a String->Integer mapping using just
the classes in JDK7's rt.jar. Such a method would have to be named
appropriately to minimize confusion (as Collections.sort() would do an
in-place sort, but Collections.transform() would produce a new ArrayList,
for example. Possibly a new utility class needs to be created to separate
these two modes.

 --Reinier Zwitserloot

On Thu, Mar 10, 2011 at 3:18 PM, Seamus Sullivan <ssullivan at aptima.com>wrote:

> Regarding 'map' - this should be a very frequently used method; it seems
> likely that there will be a bit of confusion between the interface and the
> method in discussions. I would suggest using C++'s standard template library
> name for the similar function: 'transform' to avoid conflating the two.
>
> Regarding altering the original collection: this seems like a bad idea.
>  From my experience using the STL's remove functions, I had to frequently
> make temporary copies of collections prior to filtering (remove_if) so that
> collections wouldn't be altered for further/different processing at later
> stages.  Yes, after a while I should have caught on and created my own
> versions of remove_* that did just that.
>
> Seamus
>
> -----Original Message-----
> From: lambda-dev-bounces at openjdk.java.net [mailto:
> lambda-dev-bounces at openjdk.java.net] On Behalf Of Stephen Colebourne
> Sent: Thursday, March 10, 2011 6:58 AM
> To: lambda-dev
> Subject: Re: Design for collections upgrades
>
> The point I was trying to make is that to me the method name "filter"
> means filter this collection in place, not return a new copy that is
> filtered. Just like Collections.sort, shuffle or swap. Its the active
> tense that implies it.
>
> I also think that Java is a mutation based language in developers
> minds, and if you asked around then that in-place change is what would
> be expected. Now, the FP viewpoint suggests thats a bad idea, and the
> presence of immutable collections makes it hard to implement, but that
> doesn't change what I perceive developers would currently expect from
> that method name and similar actve tense ones.
>
> Finally, I'd argue against using "map" as a method name in Java, given
> the strong connection to the entirely different concept of the Map
> interface.
>
> Stephen
>
>
> 2011/3/10 "Zdeněk Troníček" <tronicek at fit.cvut.cz>:
> > To me it seems logical that filter() returns the same collection as was
> > the original collection. For Set you do not have any other choice either:
> >
> > set.filter(predicate)
> >
> > cannot switch from HashSet to TreeSet or back.
> >
> > Z.
> > --
> > Zdenek Tronicek
> > FIT CTU in Prague
> >
> >
> > Rémi Forax napsal(a):
> >>
> >>      List<String>  things = ...
> >>      Collection<String>  fooAbles = things.filter(#Thing.isFoo); // ooh,
> >> pretty
> >>
> >>
> >> Not that pretty because filter have to create a new collection and
> >> there is no way to do that apart hard coding a new ArrayList somewhere.
> >>
> >> It's better in my opinion to have a filterTo that takes a collection
> >> as argument.
> >>
> >> Collection<String>  fooAbles = things.filterTo(#Thing.isFoo, new
> >> HashSet<>());
> >>
> >>
> >> Rémi
> >>
> >> On 03/08/2011 06:23 PM, Brian Goetz wrote:
> >>> Since people are already discussing this based on an experimental
> >>> checkin, let me outline the big picture plan here.
> >>>
> >>> The general idea is to add functional-like operations to collections --
> >>> filter, map, reduce, apply.
> >>>
> >>> I see three sensible modes, with explicit choices of which you get.
> >>>
> >>> 1.  Serial / Eager.  This is the straight
> >>> collections-with-functional-style mode, and some samples have already
> >>> been checked in as proof of concept.  Operations on collections yield
> >>> new collections, and you can chain the calls.  It values ease of use
> >>> over performance (no new concepts like laziness), but the performance
> >>> model is still highly predictable.  You get things like
> >>>
> >>>        Collection fooAbles = things.filter( #{ t ->  t.isFoo() });
> >>>
> >>> or, with method references:
> >>>
> >>>        Collection fooAbles = things.filter(#Thing.isFoo); // ooh,
> pretty
> >>>
> >>> You can also chain calls together, though you pay a (predictable)
> >>> performance cost of intermediate collections, which for small
> >>> collections is unlikely to matter:
> >>>
> >>>        maxFooWeight = things.filter(#Thing.isFoo)
> >>>                             .map(#Thing.getWeight)
> >>>                             .max();
> >>>
> >>> The benefit here is concision and clarity.  The cost is some
> >>> performance, but maybe not so much that people freak out.  If people
> >>> care, they move to the next model, which is:
> >>>
> >>> 2.  Serial / Lazy.  Here, the primary abstraction is Stream (name to be
> >>> chosen later, Remi used "lazy" in his example.)  To transfer between
> >>> "eager world" and "lazy world", you use conversion methods (toStream /
> >>> toCollection).  A typical call chain probably looks like:
> >>>     collection.toStream / op / op / op / {toCollection,reduce,apply}
> >>>
> >>> so the above example becomes
> >>>
> >>>        maxFooWeight = things.asStream()
> >>>                             .filter(#Thing.isFoo)
> >>>                             .map(#Thing.getWeight)
> >>>                             .max();
> >>>
> >>> The return type of Collection.filter is different from the return type
> >>> of Stream.filter, so the choice and performance costs are reflected in
> >>> the static type system.  This avoids the cost of the intermediate
> >>> collections, but is still serial.  If you care about that, you move up
> >>> to the next model, which is:
> >>>
> >>> 3.  Parallel / Lazy.  Here, the primary abstraction is something like
> >>> ParallelStream or ParallelIterable.  Let's call it ParallelFoo to avoid
> >>> bikeshedding for the moment.  Now, the code looks like:
> >>>
> >>>        maxFooWeight = things.asParallelFoo()
> >>>                             .filter(#Thing.isFoo)
> >>>                             .map(#Thing.getWeight)
> >>>                             .max();
> >>>
> >>> Again, the return type of ParallelFoo.filter is different from
> >>> Stream.filter or Collection.filter, so again the choice is reflected in
> >>> the static type system.  But you don't have to rewrite your code.
> >>>
> >>> The beauty here is twofold:
> >>>
> >>>    - The base model (serial/eager) is easy to understand and natural to
> >>> use as a way of expressing what the programmer wants to do, and
> >>> attractive enough to stand on its own -- just a little slow with big
> >>> collections.
> >>>    - Switching between execution models is mostly a matter of adding an
> >>> explicit conversion or two in the call chain, as the models are similar
> >>> enough that the rest of the code should still work (and even mean the
> >>> same thing.)
> >>>
> >>>
> >>> On 3/8/2011 8:43 AM, Rémi Forax wrote:
> >>>>     Le 08/03/2011 14:31, Jim Mayer a écrit :
> >>>>> // I can tolerate this one
> >>>>>        long product(List<Integer>    list) {
> >>>>>          return list.map(#{x ->    (long) x}).reduce(0L, #{sum, x ->
> >>>>>  sum * x});
> >>>>>        }
> >>>> I prefer this one:
> >>>>
> >>>>      long product(List<Integer>    list) {
> >>>>          return list.lazy().map(#{x ->    (long) x}).reduce(0L, #{sum,
> >>>> x ->    sum * x});
> >>>>      }
> >>>>
> >>>> lazy() means, don't do map directly, but wait and do map and reduce in
> >>>> one iteration.
> >>>>
> >>>> Rémi
> >>>>
> >>>>
> >>
> >>
> >>
> >
> >
> >
>
>
>
> The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon this information by persons or
> entities other than the intended recipient is prohibited. If you received
> this in error, please contact the sender and delete the material from any
> computer.
>
>