Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)

Tue Jan 15 15:56:11 PST 2013

I updated the Anagrams sample (bitbucket.org/joebowbeer/anagrams) to use
the ConcurrentCollectors that are available in the latest binary snapshot
(b73).

The parallel piece is now:

  public static Stream<Collection<String>> anagrams(Stream<String> words) {
    Map<String, Collection<String>> map = words.parallel().collectUnordered(
        ConcurrentCollectors.<String, String>groupBy(Anagrams::key));
    return map.values().parallelStream().filter(v -> v.size() > 1);
  }

And I'm now seeing some parallel speedup!

Notes:

1. "collectUnordered" is important.  If it is replaced by "collect" then
the program will appear to hang for many seconds before finally producing
output.

2. Even with -XDuseGraphInference, this will not compile without the extra "
ConcurrentCollectors.<String, String>".

In fact, the compiler throws an assertion error if one even attempts to get
by with merely a groupBy static import...

Joe

On Thu, Jan 10, 2013 at 10:28 AM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:

> Thanks for the suggestions Paul.
>
> After updating to jdk8lambda b72, the anagram collector is now expressed
> as:
>
>   map = r.lines().accumulate(Accumulators.<String,
> String>groupBy(Anagrams::key));
>
> Notes:
>
> 1. Adding .parallel() after lines() works, but in my tests on a dual-core
> laptop, it increased the accumulate time by an order of magnitude.
>
> 2. The Accumulators.<> type specifier is currently required, unfortunately.
>
> The complete source (all 31 lines) is at
> bitbucket.org/joebowbeer/anagrams/src/
>
> In my trials, I'm using the long list of single words from
> http://icon.shef.ac.uk/Moby/mwords.html
>
> Joe
>
> On Wed, Jan 9, 2013 at 1:15 AM, Paul Sandoz <paul.sandoz at oracle.com>wrote:
>
>>
>> On Jan 9, 2013, at 7:47 AM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
>>
>> > In the javadoc for Map.compute in b72, there is an error in the sample
>> > snippet:
>> >
>> > http://download.java.net/lambda/b72/docs/api/java/util/Map.html
>> >
>> > Attempts to compute a mapping for the specified key and its current
>> mapped
>> >> value (or null if there is no current mapping). For example, to either
>> >> create or append a String msg to a value mapping:
>> >> map.compute(key, (k, v, msg) -> (v == null) ? msg : v.concat(msg))
>> >
>> >
>> > Error: BiFunction does not accept three arguments.  In particular, msg
>> is
>> > extraneous.  It should be defined in the lexical scope?
>> >
>> >
>> > Btw, I pondered how to efficiently use compute() or merge() to simulate
>> a
>> > multi-map and I eventually gave up.
>> >
>> > I eventually wrote the following, which accumulates lists of anagrams,
>> > given a list of words:
>> >
>> > r.lines().forEach(s -> map.computeIfAbsent(key(s), k -> new
>> > ArrayList<>()).add(s));
>> >
>>
>> Have you tried using the latest groupBy functionality? e.g.:
>>
>>   Map<String, Collection<String>> anagrams =
>> r.lines().parallel().reduce(groupBy(s -> key(s)));
>>
>> It would be interesting to compare performance of reduce vs.
>> reduceUnordered and the concurrent groupBy leveraging CHM.
>>
>>
>> >
>> > Where key() returns the canonical key for a given word, and r is a
>> > BufferedReader for the dictionary file.
>> >
>> > The following line prints the lists of anagrams:
>> >
>> > map.values().stream().filter(v -> v.size() > 1).forEach(v ->
>> > System.out.println(v));
>> >
>>
>> Little style tip:
>>
>>   forEach(System.out::println)
>>
>> Paul.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130115/5456eab7/attachment.html