Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)

Paul Sandoz paul.sandoz at oracle.com
Thu Jan 10 12:55:37 PST 2013


On Jan 10, 2013, at 7:28 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:

> Thanks for the suggestions Paul.
> 
> After updating to jdk8lambda b72, the anagram collector is now expressed as:
> 
>   map = r.lines().accumulate(Accumulators.<String, String>groupBy(Anagrams::key));
> 
> Notes:
> 
> 1. Adding .parallel() after lines() works, but in my tests on a dual-core laptop, it increased the accumulate time by an order of magnitude.
> 

I would need to look at this more closely tomorrow but i suspect it is due to two things: 1) an terribly unbalanced tree is created (using a spliterator from an iterator since there is no size information); and 2) the map merging are both contributing to the slow down. 

Need to try the concurrent groupBy. Things are moving so fast i am forgetting if b72 has that, oh for the need of continuous builds!

If there are 354,984 words then that will make a tree of at least 354,984/1024 leaf nodes with the ~ the same depth using the current spliterator from iterator technique.

A simple verification is to load up the words into memory as a list and use that list as the source of words.

Paul.

> 2. The Accumulators.<> type specifier is currently required, unfortunately.
> 
> The complete source (all 31 lines) is at bitbucket.org/joebowbeer/anagrams/src/
> 
> In my trials, I'm using the long list of single words from http://icon.shef.ac.uk/Moby/mwords.html
> 
> Joe
> 
> On Wed, Jan 9, 2013 at 1:15 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> 
> On Jan 9, 2013, at 7:47 AM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
> 
> > In the javadoc for Map.compute in b72, there is an error in the sample
> > snippet:
> >
> > http://download.java.net/lambda/b72/docs/api/java/util/Map.html
> >
> > Attempts to compute a mapping for the specified key and its current mapped
> >> value (or null if there is no current mapping). For example, to either
> >> create or append a String msg to a value mapping:
> >> map.compute(key, (k, v, msg) -> (v == null) ? msg : v.concat(msg))
> >
> >
> > Error: BiFunction does not accept three arguments.  In particular, msg is
> > extraneous.  It should be defined in the lexical scope?
> >
> >
> > Btw, I pondered how to efficiently use compute() or merge() to simulate a
> > multi-map and I eventually gave up.
> >
> > I eventually wrote the following, which accumulates lists of anagrams,
> > given a list of words:
> >
> > r.lines().forEach(s -> map.computeIfAbsent(key(s), k -> new
> > ArrayList<>()).add(s));
> >
> 
> Have you tried using the latest groupBy functionality? e.g.:
> 
>   Map<String, Collection<String>> anagrams = r.lines().parallel().reduce(groupBy(s -> key(s)));
> 
> It would be interesting to compare performance of reduce vs. reduceUnordered and the concurrent groupBy leveraging CHM.
> 
> 
> >
> > Where key() returns the canonical key for a given word, and r is a
> > BufferedReader for the dictionary file.
> >
> > The following line prints the lists of anagrams:
> >
> > map.values().stream().filter(v -> v.size() > 1).forEach(v ->
> > System.out.println(v));
> >
> 
> Little style tip:
> 
>   forEach(System.out::println)
> 
> Paul.
> 



More information about the lambda-libs-spec-experts mailing list