flatMap
Raab, Donald
Donald.Raab at gs.com
Fri Nov 16 09:27:12 PST 2012
This is our signature in GS Collections:
<V> RichIterable<V> flatCollect(Function<? super T, ? extends Iterable<V>> function);
This is our lazy (stream) implementation:
https://github.com/goldmansachs/gs-collections/blob/master/collections/src/main/java/com/gs/collections/impl/lazy/AbstractLazyIterable.java
https://github.com/goldmansachs/gs-collections/blob/master/collections/src/main/java/com/gs/collections/impl/lazy/FlatCollectIterable.java
At the end, FlatCollectIterable (or any other lazy iterble) just delegates to forEach(), which is exactly what I am seeing all of the duplicate hand coded examples doing. Am I missing something? Can’t we just use Iterable instead of Stream or Collection and just rely on forEach()?
From: lambda-libs-spec-experts-bounces at openjdk.java.net [mailto:lambda-libs-spec-experts-bounces at openjdk.java.net] On Behalf Of Sam Pullara
Sent: Wednesday, November 14, 2012 5:57 PM
To: Joe Bowbeer
Cc: lambda-libs-spec-experts at openjdk.java.net
Subject: Re: flatMap
My last 2 cents of caring on this:
Just had a long conversation with Brian about the motivation for the current implementation and I understand it a lot better now. So the current thing in the lambda libraries has a shape more like this:
<R> Stream<R> flatMap(FlatMapper<? extends R, ? super T> mapper);
where FlatMapper is:
void flatMapInto(Block<? super R> sink, T element);
rather than the expected:
<R> Stream<R> flatMap(Mapper<T, Stream<R>> mapper);
The reason is that in many cases we won't have a stream available on the mapping side and will end up in a situation where we need to wrap each map, we always have to do work no matter what, etc. Especially since we don't have a way to easily yield a value and consume them from the outside. My suggestion is to change the the current flatMap's name to flatMapInto and implement flatMap as expected.
There are cases where the normal definition of flatMap will be more efficient (more so over time as people build APIs as streams where concatenation is cheap) and it will be the expected shape of the method for many.
Sam
On Nov 13, 2012, at 10:41 AM, Joe Bowbeer <joe.bowbeer at gmail.com<mailto:joe.bowbeer at gmail.com>> wrote:
A few notes related to analogous features:
I suspect that most Java programmers using these new features will be more familiar with similar features in Groovy than any other language.
Groovy has collect{} and flatten() but no flatCollect{} shorthand for collect{}.flatten().
One form of flatten takes a closure, by the way:
http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html#flatten(groovy.lang.Closure)
FWIW, here are examples of list flattening in many languages:
http://rosettacode.org/wiki/Flatten_a_list
--Joe
On Tue, Nov 13, 2012 at 10:07 AM, Brian Goetz <brian.goetz at oracle.com<mailto:brian.goetz at oracle.com>> wrote:
That's the obvious answer...like I said, it's just not the right answer here. (It can be a convenience layer built atop of a more flexible primitive, though.) Try writing something like "break a string into words" using the flatMap(Mapper<T,Stream<T>>), or more generally anything that naturally yields up elements one at a time.) Also the Mapper<T,Stream<T>> is going to be a lot less efficient. Like I said, it can be part of the answer, but it is absolutely the wrong primitive.
Java isn't Scala. What works in Scala libraries may well not work in Java libraries, because the language is different. So the "right" answer in one language is not necessarily the right answer in another. (Related example: if you have tuples, a method like "partition(Predicate)" is more natural than if you don't. So the partition method is a no-brainer in Scala but entails far more tradeoffs in Java, and might be on the other side of the line for that reason.)
On 11/13/2012 12:33 PM, Sam Pullara wrote:
Honestly I would have expected only a single shape for flatMap:
Stream<V> flatMap(Mapper<T, Stream<V>>)
Where the result is a concatenation of the Streams returned. Kind of
like doing a map from T to Stream<V> ending up with a Stream<Stream<V>>
and then calling flatten() on that Stream. Is this not the normal
definition? Here is the definition from Scala:
defflatMap[B](f: (A) ⇒ GenTraversableOnce
<http://www.scala-lang.org/archives/downloads/distrib/files/nightly/docs/library/scala/collection/GenTraversableOnce.html>[B]):
Traversable
<http://www.scala-lang.org/archives/downloads/distrib/files/nightly/docs/library/scala/collection/Traversable.html>[B]
[use case]
Builds a new collection by applying a function to all elements of this
traversable collection and using the elements of the resulting collections.
For example:
def getWords(lines:Seq[String]):Seq[String] = lines flatMap (line=> line split"\\W+<smb://W+> <smb://W+>")
The type of the resulting collection is guided by the static type of
traversable collection. This might cause unexpected results sometimes.
For example:
// lettersOf will return a Seq[Char] of likely repeated letters, instead of a Set
def lettersOf(words:Seq[String]) = words flatMap (word=> word.toSet)
// lettersOf will return a Set[Char], not a Seq
def lettersOf(words:Seq[String]) = words.toSet flatMap (word=> word.toSeq)
// xs will be a an Iterable[Int]
val xs =Map("a" ->List(11,111),"b" ->List(22,222)).flatMap(_._2)
// ys will be a Map[Int, Int]
val ys =Map("a" ->List(1 ->11,1 ->111),"b" ->List(2 ->22,2 ->222)).flatMap(_._2)
B
the element type of the returned collection.
f
the function to apply to each element.
returns
a new traversable collection resulting from applying the given
collection-valued function |f| to each element of this traversable
collection and concatenating the results.
Definition Classes
TraversableLike
<http://www.scala-lang.org/archives/downloads/distrib/files/nightly/docs/library/scala/collection/TraversableLike.html> →
GenTraversableLike
<http://www.scala-lang.org/archives/downloads/distrib/files/nightly/docs/library/scala/collection/GenTraversableLike.html> →
FilterMonadic
<http://www.scala-lang.org/archives/downloads/distrib/files/nightly/docs/library/scala/collection/generic/FilterMonadic.html>
Sam
On Nov 13, 2012, at 7:42 AM, Brian Goetz <brian.goetz at oracle.com<mailto:brian.goetz at oracle.com>
<mailto:brian.goetz at oracle.com<mailto:brian.goetz at oracle.com>>> wrote:
The things I found awkward using in the kata were flatMap
This is a complaint we received over and over again in the "Hack Day"
sessions -- it is pretty clear we are not there yet on flatMap.
1. It is not obvious flatMap is the best name, as it sets
expectations for Scala users that will not be met. Perhaps mapMulti?
explode?
2. The API form is definitely not there yet. But, the "obvious"
suggestion is clearly wrong. Assume for sake of argument that there
will be only one version of flatMap. What everyone thinks it should
look like (after a few seconds of thought) is:
Stream<T> flatMap(Mapper<T, Collection<T>>)
But, this is awful (except when you already happen to have a
Collection lying around, which will be sometimes but by no means all
of the time.) This forces (a) the client to create and populate a
Collection in the lambda (yuck!), usually an extra collection to be
created even when the element maps to nothing, and then forces an
iteration (with Iterator) on the other side. So, in the case where
there's no collection lying around, the client code is larger, uglier,
and in any case the whole thing is significantly slower. The current
version has a very nice property: when the element maps to nothing,
there's no work to do.
This is further challenging because of erasure. If we had:
Stream<T> flatMap(Mapper<T, Collection<T>>)
we might also want
Stream<T> flatMap(Mapper<T, T[]>)
or
Stream<T> flatMap(Mapper<T, Stream<T>>)
or
Stream<T> flatMap(Mapper<T, Streamable<T>>)
But, we can't overload these since their erasure is all "Mapper".
More information about the lambda-libs-spec-observers
mailing list