FlatMapper as GeneratorFunction
Brian Goetz
brian.goetz at oracle.com
Mon Mar 4 16:09:55 PST 2013
This is a nice idea. But, given the current state of the implementation, is going to fall short of the performance goals for the reasons you already cite. A key goal here is that if an input element maps to no output elements, there should be no work required. But currently we would create a generator object for each element under this approach.
On Mar 4, 2013, at 2:55 PM, Ali Lahijani wrote:
> Dear Brian,
>
> Here I want to describe an alternative approach to FlatMapper which I did not see being discussed on the list. Chances are that parts of it have been considered by the Experts and rejected for some reasons, but I am not aware of that.
>
> So it seems the choice is:
>
> - Keep this tied to flatMap and keep it in JUS. Advantage: makes the complicated flatMap(FlatMapper) operation easier to understand.
>
> - Abstract this into a general "map to multiple values and dump results into a Consumer" type, move to JUF, ane rename to something like "MultiFunction". Advantage: more future flexibility; Disadvantage: mostly guessing about what we might want in the future.
>
> I lean towards the first.
>
> There is also a third option here, which I believe to have less learning overhead for programmers.
> Instead of a primitive notion of FlatMapper, one can take (Python-style) generators as primitive. This way, a FlatMapper will be nothing more than a function that returns a generator.
>
> It has also the added benefit that no two dimensional explosion of FlatMapper.OfDoubleToLong etc. will be necessary.
>
> A generator is anything that supports a forEach():
>
> interface Generator<U> {
> void forEach(Consumer<? super T> consumer);
> }
>
> that includes Iterables and Streams, but obviously the concept is much more inclusive. (Actually it's just plain old continuation monad in disguise).
>
> Observe that FlatMapper<T, U> is equivalent to
> (T, Consumer<U>) -> void.
> Let's uncurry this, to get
> T -> (Consumer<U> -> void)
> Which is equivalently
> T -> Generator<U>
>
> Inside Steam, you would simply say:
> <R> Stream<R> flatMap(Function<? super T, Generator<R>> mapper);
>
> I feel the most compelling argument that this is the right thing to do is that it needs only one dimension of primitive specializations, IntGenerator, LongGenerator and DoubleGenerator.
>
> But the uncurrying creates additional lambda objects which may be undesirable. The ultimate fix is automatic currying optimization, for which I am sure there is plenty of material in the literature. The JVM should ultimately be able to curry methods which immediately return a lambda. And also to transform an expression like m.map(e).forEach(sink) to something like m.map$forEach(e, sink).
>
> Without the automatic optimization, one can create a subclass:
>
> interface GeneratorFunction<T, R> extends Function<T, Generator<R>> {
> default Generator<U> map(T element) {
> return sink -> flattenInto(element, sink);
> }
>
> void mapAndForEach(T element, Consumer<U> sink);
> }
>
> Add an overload
> <R> Stream<R> flatMap(GeneratorFunction<? super T, R> mapper);
>
> people can choose between
> s.flatMap(e -> sink -> ...) and
> s.flatMap((e, sink) -> ...)
>
> The latter being more performant.
>
> This way, GeneratorFunction.mapAndForEach is essentially FlatMapper.flattenInto, only renamed to match a generalized point of view, so the 2D explosion will be back.
> Personally I would prefer Function<T, Generator<R>> over GeneratorFunction<T,R>/FlatMapper<T,R>, hoping a future JVM will fix the performance issue.
>
> Best
More information about the lambda-dev
mailing list