FlatMapper as GeneratorFunction

Brian Goetz brian.goetz at oracle.com
Mon Mar 4 16:09:55 PST 2013


This is a nice idea.  But, given the current state of the implementation, is going to fall short of the performance goals for the reasons you already cite.  A key goal here is that if an input element maps to no output elements, there should be no work required.  But currently we would create a generator object for each element under this approach.  



On Mar 4, 2013, at 2:55 PM, Ali Lahijani wrote:

> Dear Brian,
> 
> Here I want to describe an alternative approach to FlatMapper which I did not see being discussed on the list. Chances are that parts of it have been considered by the Experts and rejected for some reasons, but I am not aware of that.
> 
> So it seems the choice is:
> 
>  - Keep this tied to flatMap and keep it in JUS.  Advantage: makes the complicated flatMap(FlatMapper) operation easier to understand.
> 
>  - Abstract this into a general "map to multiple values and dump results into a Consumer" type, move to JUF, ane rename to something like "MultiFunction".  Advantage: more future flexibility; Disadvantage: mostly guessing about what we might want in the future.
> 
> I lean towards the first.
>  
> There is also a third option here, which I believe to have less learning overhead for programmers.
> Instead of a primitive notion of FlatMapper, one can take (Python-style) generators as primitive. This way, a FlatMapper will be nothing more than a function that returns a generator.
> 
> It has also the added benefit that no two dimensional explosion of FlatMapper.OfDoubleToLong etc. will be necessary.
> 
> A generator is anything that supports a forEach():
> 
> interface Generator<U> {
>       void forEach(Consumer<? super T> consumer);
> }
> 
> that includes Iterables and Streams, but obviously the concept is much more inclusive. (Actually it's just plain old continuation monad in disguise).
> 
> Observe that FlatMapper<T, U> is equivalent to
>     (T, Consumer<U>) -> void.
> Let's uncurry this, to get 
>     T -> (Consumer<U> -> void)
> Which is equivalently
>     T -> Generator<U>
> 
> Inside Steam, you would simply say:
>     <R> Stream<R> flatMap(Function<? super T, Generator<R>> mapper);
> 
> I feel the most compelling argument that this is the right thing to do is that it needs only one dimension of primitive specializations, IntGenerator, LongGenerator and DoubleGenerator.
> 
> But the uncurrying creates additional lambda objects which may be undesirable. The ultimate fix is automatic currying optimization, for which I am sure there is plenty of material in the literature. The JVM should ultimately be able to curry methods which immediately return a lambda. And also to transform an expression like m.map(e).forEach(sink) to something like m.map$forEach(e, sink).
> 
> Without the automatic optimization, one can create a subclass:
> 
> interface GeneratorFunction<T, R> extends Function<T, Generator<R>> {
>     default Generator<U> map(T element) {
>         return sink -> flattenInto(element, sink);
>     }
> 
>     void mapAndForEach(T element, Consumer<U> sink);
> }
> 
> Add an overload
>     <R> Stream<R> flatMap(GeneratorFunction<? super T, R> mapper);
> 
> people can choose between 
>     s.flatMap(e -> sink -> ...) and 
>     s.flatMap((e, sink) -> ...)
> 
> The latter being more performant.
> 
> This way, GeneratorFunction.mapAndForEach is essentially FlatMapper.flattenInto, only renamed to match a generalized point of view, so the 2D explosion will be back.
> Personally I would prefer Function<T, Generator<R>> over GeneratorFunction<T,R>/FlatMapper<T,R>, hoping a future JVM will fix the performance issue.
> 
> Best



More information about the lambda-dev mailing list