Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)

Thu Jan 10 16:24:41 PST 2013

Thanks Paul.

To investigate, I pre-read all the lines:

List<String> lines = Files.readAllLines(path, Charset.defaultCharset());

and then profiled the three variants below on a dual-core desktop:

1. map = lines.stream().accumulate(Accumulators.<String,
String>groupBy(Anagrams::key));
2. map = lines.parallelStream().accumulate(Accumulators.<String,
String>groupBy(Anagrams::key));
3. map = lines.stream().parallel().accumulate(Accumulators.<String,
String>groupBy(Anagrams::key));

Results:

1. stream

accumulate = 260ms
sort = 80ms
toLowercase = 20ms

2. parallelStream

accumulate = 1342ms
sort = 283ms
toLowercase = 108ms

3. stream().parallel()

accumulate = 1186ms
sort = 629ms
toLowercase = 826ms

This is a slow-down (5x more work), though not as dramatic as what happened
in the original case:

Originally, the expression was one of:

1. map = r.lines().accumulate(Accumulators.<String,
String>groupBy(Anagrams::key));
2. map = r.lines().parallel().accumulate(Accumulators.<String,
String>groupBy(Anagrams::key));

Results:

1. lines()

accumulate = 396ms
sort = 121ms
toLowercase = 76ms

2. lines().parallel()

accumulate = 20010ms!
sort = 801ms
toLowercase = ~ms

--Joe

On Thu, Jan 10, 2013 at 12:55 PM, Paul Sandoz <paul.sandoz at oracle.com>wrote:

>
> On Jan 10, 2013, at 7:28 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
>
> > Thanks for the suggestions Paul.
> >
> > After updating to jdk8lambda b72, the anagram collector is now expressed
> as:
> >
> >   map = r.lines().accumulate(Accumulators.<String,
> String>groupBy(Anagrams::key));
> >
> > Notes:
> >
> > 1. Adding .parallel() after lines() works, but in my tests on a
> dual-core laptop, it increased the accumulate time by an order of magnitude.
> >
>
> I would need to look at this more closely tomorrow but i suspect it is due
> to two things: 1) an terribly unbalanced tree is created (using a
> spliterator from an iterator since there is no size information); and 2)
> the map merging are both contributing to the slow down.
>
> Need to try the concurrent groupBy. Things are moving so fast i am
> forgetting if b72 has that, oh for the need of continuous builds!
>
> If there are 354,984 words then that will make a tree of at least
> 354,984/1024 leaf nodes with the ~ the same depth using the current
> spliterator from iterator technique.
>
> A simple verification is to load up the words into memory as a list and
> use that list as the source of words.
>
> Paul.
>
> > 2. The Accumulators.<> type specifier is currently required,
> unfortunately.
> >
> > The complete source (all 31 lines) is at
> bitbucket.org/joebowbeer/anagrams/src/
> >
> > In my trials, I'm using the long list of single words from
> http://icon.shef.ac.uk/Moby/mwords.html
> >
> > Joe
> >
> > On Wed, Jan 9, 2013 at 1:15 AM, Paul Sandoz <paul.sandoz at oracle.com>
> wrote:
> >
> > On Jan 9, 2013, at 7:47 AM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
> >
> > > In the javadoc for Map.compute in b72, there is an error in the sample
> > > snippet:
> > >
> > > http://download.java.net/lambda/b72/docs/api/java/util/Map.html
> > >
> > > Attempts to compute a mapping for the specified key and its current
> mapped
> > >> value (or null if there is no current mapping). For example, to either
> > >> create or append a String msg to a value mapping:
> > >> map.compute(key, (k, v, msg) -> (v == null) ? msg : v.concat(msg))
> > >
> > >
> > > Error: BiFunction does not accept three arguments.  In particular, msg
> is
> > > extraneous.  It should be defined in the lexical scope?
> > >
> > >
> > > Btw, I pondered how to efficiently use compute() or merge() to
> simulate a
> > > multi-map and I eventually gave up.
> > >
> > > I eventually wrote the following, which accumulates lists of anagrams,
> > > given a list of words:
> > >
> > > r.lines().forEach(s -> map.computeIfAbsent(key(s), k -> new
> > > ArrayList<>()).add(s));
> > >
> >
> > Have you tried using the latest groupBy functionality? e.g.:
> >
> >   Map<String, Collection<String>> anagrams =
> r.lines().parallel().reduce(groupBy(s -> key(s)));
> >
> > It would be interesting to compare performance of reduce vs.
> reduceUnordered and the concurrent groupBy leveraging CHM.
> >
> >
> > >
> > > Where key() returns the canonical key for a given word, and r is a
> > > BufferedReader for the dictionary file.
> > >
> > > The following line prints the lists of anagrams:
> > >
> > > map.values().stream().filter(v -> v.size() > 1).forEach(v ->
> > > System.out.println(v));
> > >
> >
> > Little style tip:
> >
> >   forEach(System.out::println)
> >
> > Paul.
> >
>
>