Join (was: Re: hg: lambda/lambda/jdk: 3 new changesets)
David Conrad
drconrad at gmail.com
Sun Jun 3 22:09:21 PDT 2012
On Wed, 30 May 2012 18:11:38 -0400, Brian Goetz <brian.goetz at oracle.com> wrote:
> > I don't like the class StringJoiner because despite the fact it's a
> > reduce operation, it's implemented as a Fillable, so something
> > eager which will not work as is in the parallel world.
>
> That's where we started -- trying to treat string joining as a reduce.
> But because String is immutable, doing a sequential reduce with string
> concatenation becomes an O(n^2) operation, which is not good.
>
> (Originally we thought that the way to do joining was:
> stream.interleaveWith(Streams.repeating(", "))
> .reduce(String::concatenate)
> but the reduce step, while pretty, was inefficient.)
>
> The parallel case isn't much better. Even if you do a number of string
> joins at the leaves of the tree, the top-level combine still has to copy
> all the string content, which loses most of the parallelism.
>
> But you can still do upstream ops in parallel:
>
> collection.parallel()
> .filter(...)
> .map(...)
> .sorted()
> .sequential()
> .into(new StringJoiner(", "));
>
> and all the upstream stuff will happen in parallel.
>
> > I think it's better to add join() on Iterable.
>
> Would like to, but we can't add methods that only apply to specific
> parameterizations.
>
>
Um, I've been using the following:
    package in.digo.util;

    import static in.digo.func.Take.first;
    import static in.digo.func.Take.rest;
    import static java.util.Iterables.isEmpty;

    public class Join<T> {
        private Join() {
            throw new AssertionError("no instances");
        }

        public static String join(Iterable<?> seq) {
            return join(", ", seq);
        }

        public static String join(String separator, Iterable<?> seq) {
            if (isEmpty(seq)) return "";
            StringBuilder sb = new StringBuilder(first(seq).toString());
            for (Object obj : rest(seq)) sb.append(separator).append(obj);
            return sb.toString();
        }
    }
Take::first and Take::rest do what you'd expect.
This is eager, not lazy, and does nothing for the parallel case,
but it's an order of magnitude simpler than StringJoiner.
String::join could call this, or this could just go in String.
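For anyone without Take on the classpath, the same logic can be sketched with a plain Iterator. This is only an illustrative stand-in, not the in.digo code: the class name JoinSketch is made up here, and it renders a null first element as "null", where first(seq).toString() would presumably throw.

```java
import java.util.Iterator;

// Hypothetical stand-in for Join above; depends only on java.util.Iterator.
final class JoinSketch {
    private JoinSketch() {
        throw new AssertionError("no instances");
    }

    static String join(Iterable<?> seq) {
        return join(", ", seq);
    }

    static String join(String separator, Iterable<?> seq) {
        Iterator<?> it = seq.iterator();
        if (!it.hasNext()) return "";
        // The first element goes in without a leading separator.
        StringBuilder sb = new StringBuilder().append(it.next());
        // Every remaining element is preceded by the separator.
        while (it.hasNext()) sb.append(separator).append(it.next());
        return sb.toString();
    }
}
```

Because everything is appended to one StringBuilder, the work stays linear in the total output length, which is the property the String-concatenating reduce lacked.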
In the parallel case, you'd just do:
    join(collection.parallel()
        .filter(...)
        .map(...)
        .sorted()
        .sequential());
Or:
    join(";", collection.parallel()
        .filter(...)
        .map(...)
        .sorted()
        .sequential());
If you didn't want comma-space as the separator.
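For comparison, here is roughly how such a pipeline reads against the stream API as it was eventually released in Java 8, where Collectors.joining does the final join. This is a sketch only: the class and method names are hypothetical, and the filter and map steps are arbitrary placeholders for the elided (...) arguments above, not anything from the original code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; the filter/map steps are illustrative placeholders.
final class ParallelJoinSketch {
    private ParallelJoinSketch() {
        throw new AssertionError("no instances");
    }

    static String joinNonEmptyUpper(String separator, List<String> words) {
        return words.parallelStream()
                .filter(w -> !w.isEmpty())      // placeholder filter: drop empty strings
                .map(String::toUpperCase)       // placeholder map: upper-case each word
                .sorted()                       // natural ordering, kept by the collector
                .collect(Collectors.joining(separator));
    }
}
```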
List::toString could just do the following:
    public interface List<E> extends Collection<E> {
        ...
        public String toString() default {
            if (isEmpty()) return "[]";
            return "[" + join(this) + "]";
        }
        ...
    }
Oops, no, can't do that because a default method
can't override an Object method. So stick it in
AbstractCollection instead.
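A rough sketch of what that could look like, using a subclass since java.util.AbstractCollection itself can't be edited from outside the JDK. The names JoinedCollection and of are hypothetical, the join helper is inlined so the example stands alone, and it skips the self-reference guard that the real AbstractCollection.toString has.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.List;

// Hypothetical base class: toString() is expressed in terms of a join helper.
abstract class JoinedCollection<E> extends AbstractCollection<E> {
    @Override
    public String toString() {
        if (isEmpty()) return "[]";
        return "[" + join(", ", this) + "]";
    }

    // Inlined join helper so the sketch is self-contained.
    static String join(String separator, Iterable<?> seq) {
        Iterator<?> it = seq.iterator();
        if (!it.hasNext()) return "";
        StringBuilder sb = new StringBuilder().append(it.next());
        while (it.hasNext()) sb.append(separator).append(it.next());
        return sb.toString();
    }

    // Hypothetical factory: a read-only view over a list, just for trying it out.
    static <E> JoinedCollection<E> of(List<E> backing) {
        return new JoinedCollection<E>() {
            @Override public Iterator<E> iterator() { return backing.iterator(); }
            @Override public int size() { return backing.size(); }
        };
    }
}
```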
I am a bear of very little brain, and like to keep
things simple.
David