FlatMapper as GeneratorFunction

Mon Mar 4 11:55:14 PST 2013

Dear Brian,

Here I want to describe an alternative approach to FlatMapper which I did
not see being discussed on the list. Chances are that parts of it have been
considered by the Experts and rejected for some reasons, but I am not aware
of that.

So it seems the choice is:
>
>  - Keep this tied to flatMap and keep it in JUS.  Advantage: makes the
> complicated flatMap(FlatMapper) operation easier to understand.
>
>  - Abstract this into a general "map to multiple values and dump results
> into a Consumer" type, move to JUF, ane rename to something like
> "MultiFunction".  Advantage: more future flexibility; Disadvantage: mostly
> guessing about what we might want in the future.
>
> I lean towards the first.
>

There is also a third option here, which I believe to have less learning
overhead for programmers.
Instead of a primitive notion of FlatMapper, one can take (Python-style)
generators as primitive. This way, a FlatMapper will be nothing more than a
function that returns a generator.

It has also the added benefit that no two dimensional explosion of
FlatMapper.OfDoubleToLong etc. will be necessary.

A generator is anything that supports a forEach():

interface Generator<U> {
      void forEach(Consumer<? super T> consumer);
}

that includes Iterables and Streams, but obviously the concept is much more
inclusive. (Actually it's just plain old continuation monad in disguise).

Observe that FlatMapper<T, U> is equivalent to
    (T, Consumer<U>) -> void.
Let's uncurry this, to get
    T -> (Consumer<U> -> void)
Which is equivalently
    T -> Generator<U>

Inside Steam, you would simply say:
    <R> Stream<R> flatMap(Function<? super T, Generator<R>> mapper);

I feel the most compelling argument that this is the right thing to do is
that it needs only one dimension of primitive specializations,
IntGenerator, LongGenerator and DoubleGenerator.

But the uncurrying creates additional lambda objects which may be
undesirable. The ultimate fix is automatic currying optimization, for which
I am sure there is plenty of material in the literature. The JVM should
ultimately be able to curry methods which immediately return a lambda. And
also to transform an expression like m.map(e).forEach(sink) to something
like m.map$forEach(e, sink).

Without the automatic optimization, one can create a subclass:

interface GeneratorFunction<T, R> extends Function<T, Generator<R>> {
    default Generator<U> map(T element) {
        return sink -> flattenInto(element, sink);
    }

    void mapAndForEach(T element, Consumer<U> sink);
}

Add an overload
    <R> Stream<R> flatMap(GeneratorFunction<? super T, R> mapper);

people can choose between
    s.flatMap(e -> sink -> ...) and
    s.flatMap((e, sink) -> ...)

The latter being more performant.

This way, GeneratorFunction.mapAndForEach is essentially
FlatMapper.flattenInto, only renamed to match a generalized point of view,
so the 2D explosion will be back.
Personally I would prefer Function<T, Generator<R>> over
GeneratorFunction<T,R>/FlatMapper<T,R>, hoping a future JVM will fix the
performance issue.

Best