Anagrams Kata (Was: ConcurrentHashMap/ConcurrentMap/Map.compute)
Joe Bowbeer
joe.bowbeer at gmail.com
Thu Jan 10 16:46:36 PST 2013
Note that the toLowercase/sort work is contained in the groupBy
classifier. The classifier groups each word according to its characters.
Even in the ideal parallel execution, all of the sorts are performed on
small arrays of characters.
On Thu, Jan 10, 2013 at 4:29 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> The sorts are all serial. So what is happening there is that instead of
> sorting one big thing, you are sorting lots of small things.
>
> The break-even on parallel groupBy may be high because of merging
> overhead, especially if we've got our split thresholds tuned too low, in
> which case we create lots of little maps and spend a lot of time merging
> them.
>
>
>
>
> On 1/10/2013 7:24 PM, Joe Bowbeer wrote:
>
>> Thanks Paul.
>>
>> To investigate, I pre-read all the lines:
>>
>> List<String> lines = Files.readAllLines(path, Charset.defaultCharset());
>>
>> and then profiled the three variants below on a dual-core desktop:
>>
>> 1. map = lines.stream().accumulate(**Accumulators.<String,
>> String>groupBy(Anagrams::key))**;
>> 2. map = lines.parallelStream().**accumulate(Accumulators.<**String,
>> String>groupBy(Anagrams::key))**;
>> 3. map = lines.stream().parallel().**accumulate(Accumulators.<**String,
>> String>groupBy(Anagrams::key))**;
>>
>> Results:
>>
>> 1. stream
>>
>> accumulate = 260ms
>> sort = 80ms
>> toLowercase = 20ms
>>
>> 2. parallelStream
>>
>> accumulate = 1342ms
>> sort = 283ms
>> toLowercase = 108ms
>>
>> 3. stream().parallel()
>>
>> accumulate = 1186ms
>> sort = 629ms
>> toLowercase = 826ms
>>
>>
>> This is a slow-down (5x more work), though not as dramatic as what
>> happened in the original case:
>>
>> Originally, the expression was one of:
>>
>> 1. map = r.lines().accumulate(**Accumulators.<String,
>> String>groupBy(Anagrams::key))**;
>> 2. map = r.lines().parallel().**accumulate(Accumulators.<**String,
>> String>groupBy(Anagrams::key))**;
>>
>> Results:
>>
>> 1. lines()
>>
>> accumulate = 396ms
>> sort = 121ms
>> toLowercase = 76ms
>>
>> 2. lines().parallel()
>>
>> accumulate = 20010ms!
>> sort = 801ms
>> toLowercase = ~ms
>>
>>
>> --Joe
>>
>>
>> On Thu, Jan 10, 2013 at 12:55 PM, Paul Sandoz <paul.sandoz at oracle.com
>> <mailto:paul.sandoz at oracle.com**>> wrote:
>>
>>
>> On Jan 10, 2013, at 7:28 PM, Joe Bowbeer <joe.bowbeer at gmail.com
>> <mailto:joe.bowbeer at gmail.com>**> wrote:
>>
>> > Thanks for the suggestions Paul.
>> >
>> > After updating to jdk8lambda b72, the anagram collector is now
>> expressed as:
>> >
>> > map = r.lines().accumulate(**Accumulators.<String,
>> String>groupBy(Anagrams::key))**;
>> >
>> > Notes:
>> >
>> > 1. Adding .parallel() after lines() works, but in my tests on a
>> dual-core laptop, it increased the accumulate time by an order of
>> magnitude.
>> >
>>
>> I would need to look at this more closely tomorrow but i suspect it
>> is due to two things: 1) an terribly unbalanced tree is created
>> (using a spliterator from an iterator since there is no size
>> information); and 2) the map merging are both contributing to the
>> slow down.
>>
>> Need to try the concurrent groupBy. Things are moving so fast i am
>> forgetting if b72 has that, oh for the need of continuous builds!
>>
>> If there are 354,984 words then that will make a tree of at least
>> 354,984/1024 leaf nodes with the ~ the same depth using the current
>> spliterator from iterator technique.
>>
>> A simple verification is to load up the words into memory as a list
>> and use that list as the source of words.
>>
>> Paul.
>>
>> > 2. The Accumulators.<> type specifier is currently required,
>> unfortunately.
>> >
>> > The complete source (all 31 lines) is at
>> bitbucket.org/joebowbeer/**anagrams/src/<http://bitbucket.org/joebowbeer/anagrams/src/>
>> <http://bitbucket.org/**joebowbeer/anagrams/src/<http://bitbucket.org/joebowbeer/anagrams/src/>
>> >
>>
>> >
>> > In my trials, I'm using the long list of single words from
>> http://icon.shef.ac.uk/Moby/**mwords.html<http://icon.shef.ac.uk/Moby/mwords.html>
>> >
>> > Joe
>> >
>> > On Wed, Jan 9, 2013 at 1:15 AM, Paul Sandoz
>> <paul.sandoz at oracle.com <mailto:paul.sandoz at oracle.com**>> wrote:
>> >
>> > On Jan 9, 2013, at 7:47 AM, Joe Bowbeer <joe.bowbeer at gmail.com
>> <mailto:joe.bowbeer at gmail.com>**> wrote:
>> >
>> > > In the javadoc for Map.compute in b72, there is an error in the
>> sample
>> > > snippet:
>> > >
>> > > http://download.java.net/**lambda/b72/docs/api/java/util/**
>> Map.html<http://download.java.net/lambda/b72/docs/api/java/util/Map.html>
>> > >
>> > > Attempts to compute a mapping for the specified key and its
>> current mapped
>> > >> value (or null if there is no current mapping). For example,
>> to either
>> > >> create or append a String msg to a value mapping:
>> > >> map.compute(key, (k, v, msg) -> (v == null) ? msg :
>> v.concat(msg))
>> > >
>> > >
>> > > Error: BiFunction does not accept three arguments. In
>> particular, msg is
>> > > extraneous. It should be defined in the lexical scope?
>> > >
>> > >
>> > > Btw, I pondered how to efficiently use compute() or merge() to
>> simulate a
>> > > multi-map and I eventually gave up.
>> > >
>> > > I eventually wrote the following, which accumulates lists of
>> anagrams,
>> > > given a list of words:
>> > >
>> > > r.lines().forEach(s -> map.computeIfAbsent(key(s), k -> new
>> > > ArrayList<>()).add(s));
>> > >
>> >
>> > Have you tried using the latest groupBy functionality? e.g.:
>> >
>> > Map<String, Collection<String>> anagrams =
>> r.lines().parallel().reduce(**groupBy(s -> key(s)));
>> >
>> > It would be interesting to compare performance of reduce vs.
>> reduceUnordered and the concurrent groupBy leveraging CHM.
>> >
>> >
>> > >
>> > > Where key() returns the canonical key for a given word, and r is
>> a
>> > > BufferedReader for the dictionary file.
>> > >
>> > > The following line prints the lists of anagrams:
>> > >
>> > > map.values().stream().filter(v -> v.size() > 1).forEach(v ->
>> > > System.out.println(v));
>> > >
>> >
>> > Little style tip:
>> >
>> > forEach(System.out::println)
>> >
>> > Paul.
>> >
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130110/24c28273/attachment-0001.html
More information about the lambda-libs-spec-experts
mailing list