Join (was: Re: hg: lambda/lambda/jdk: 3 new changesets)

David Conrad drconrad at gmail.com
Sun Jun 3 22:09:21 PDT 2012


On Wed, 30 May 2012 18:11:38 -0400, Brian Goetz <brian.goetz at oracle.com> wrote:

> > I don't like the class StringJoiner because despite the fact it's a
> > reduce operation,  it's implemented as a Fillable, so something
> > eager which will not work as is in the parallel world.
>
> That's where we started -- trying to treat string joining as a reduce.
> But because String is immutable, doing a sequential reduce with string
> concatenation becomes an O(n^2) operation, which is not good.
>
> (Originally we thought that the way to do joining was:
>    stream.interleaveWith(Streams.repeating(", "))
>          .reduce(String::concatenate)
> but the reduce step, while pretty, was inefficient.)
>
> The parallel case isn't much better.  Even if you do a number of string
> joins at the leaves of the tree, the top-level combine still has to copy
> all the string content, which loses most of the parallelism.
>
> But you can still do upstream ops in parallel:
>
>   collection.parallel()
>             .filter(...)
>             .map(...)
>             .sorted()
>             .sequential()
>             .into(new StringJoiner(", "));
>
> and all the upstream stuff will happen in parallel.
>
> > I think it's better to add join() on Iterable.
>
> Would like to, but we can't add methods that only apply to specific
> parameterizations.
>
>
Um, I've been using the following:

package in.digo.util;

import static in.digo.func.Take.first;
import static in.digo.func.Take.rest;
import static java.util.Iterables.isEmpty;

// Static utility class: no type parameter is needed and it is never instantiated.
public final class Join {
    private Join() {
        throw new AssertionError("no instances");
    }

    public static String join(Iterable<?> seq) {
        return join(", ", seq);
    }

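    // Eagerly joins the string forms of seq's elements, separated by separator.
    public static String join(String separator, Iterable<?> seq) {
        if (isEmpty(seq)) return "";
        // String.valueOf tolerates a null first element, as append(Object) does below.
        StringBuilder sb = new StringBuilder(String.valueOf(first(seq)));
        for (Object obj : rest(seq)) sb.append(separator).append(obj);
        return sb.toString();
    }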
}

Take::first and Take::rest do what you'd expect.
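
For readers without in.digo.func.Take at hand, here is a minimal sketch of
what the two helpers might look like. The class was not posted in this
thread, so the bodies below are an assumption, written only to match the
way join() uses them.

```java
import java.util.Iterator;

// Hypothetical stand-in for in.digo.func.Take; the original was not posted.
final class Take {
    private Take() {}

    // Returns the first element of seq.
    // Throws NoSuchElementException if seq is empty.
    static <T> T first(Iterable<T> seq) {
        return seq.iterator().next();
    }

    // Returns a lazy view of seq without its first element.
    // Each call to iterator() on the view starts a fresh traversal of seq.
    static <T> Iterable<T> rest(Iterable<T> seq) {
        return () -> {
            Iterator<T> it = seq.iterator();
            if (it.hasNext()) it.next(); // skip the head, if there is one
            return it;
        };
    }
}
```

With these definitions, first() on the list ("a", "b", "c") gives "a" and
iterating rest() of the same list gives "b" then "c". Note that both helpers
ask seq for a new iterator, so join() as written traverses the start of the
sequence more than once.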

This is eager, not lazy, and doesn't do anything for parallel,
but it's an order of magnitude simpler than StringJoiner.
String::join could call this, or this could just go in String.
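
The same eager join can also be written against a single Iterator, which
may matter if a sequence can only be traversed once. That is an assumption
on my part, not something stated in this thread, and the class name
SinglePassJoin is made up for the sketch.

```java
import java.util.Iterator;

// Hypothetical single-pass variant of Join.join; not part of the original post.
final class SinglePassJoin {
    private SinglePassJoin() {}

    // Joins the string forms of seq's elements using exactly one iterator.
    static String join(String separator, Iterable<?> seq) {
        Iterator<?> it = seq.iterator();
        if (!it.hasNext()) return "";
        // append(Object) renders a null element as "null" instead of throwing.
        StringBuilder sb = new StringBuilder().append(it.next());
        while (it.hasNext()) sb.append(separator).append(it.next());
        return sb.toString();
    }
}
```

It is still one StringBuilder and one loop, so the cost stays linear in the
total output length, and it needs no first/rest/isEmpty helpers.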

In the parallel case, you'd just do:

  join(collection.parallel()
            .filter(...)
            .map(...)
            .sorted()
            .sequential());

Or:

  join(";", collection.parallel()
            .filter(...)
            .map(...)
            .sorted()
            .sequential());

if you didn't want comma-space as the separator.

List::toString could just do the following:

public interface List<E> extends Collection<E> {
    ...

    public String toString() default {
        if (isEmpty()) return "[]";
        return "[" + join(this) + "]";
    }

    ...
}

Oops, no, can't do that because a default method
can't override an Object method. So stick it in
AbstractCollection instead.
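
To make the AbstractCollection idea concrete, here is a sketch using a
made-up subclass, since the real change would be an edit to
AbstractCollection itself. JoinedCollection and its constructor are
assumptions for illustration only; the joining loop is inlined so the
example stands alone.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical collection, used only to show where the override would live.
final class JoinedCollection<E> extends AbstractCollection<E> {
    private final List<E> items;

    @SafeVarargs
    JoinedCollection(E... items) {
        this.items = Arrays.asList(items);
    }

    @Override public Iterator<E> iterator() { return items.iterator(); }
    @Override public int size() { return items.size(); }

    // A class may override Object.toString, unlike an interface default.
    @Override public String toString() {
        if (isEmpty()) return "[]";
        Iterator<E> it = iterator();
        StringBuilder sb = new StringBuilder("[").append(it.next());
        while (it.hasNext()) sb.append(", ").append(it.next());
        return sb.append("]").toString();
    }
}
```

For ordinary elements this prints the same bracketed, comma-space separated
form that AbstractCollection.toString already produces.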

I am a bear of very little brain, and like to keep
things simple.

David

