London Lambdas Hackday: Performance and Parallelism

Wed Jul 4 07:33:02 PDT 2012

On 07/04/12 09:57, Richard Warburton wrote:
>>> 2. What is the decomposition model of parallel()?
>>>
>>> This isn't obvious from the documentation.  Is there a cost model for
>>> splitting things up?  Can people easily figure out a computational
>>> cost budget between different components and work out at what point
>>> there's a benefit to parallelism?

Different Collection/Map implementations exist because they make
different cost/functionality tradeoffs, and there is no single
winner. Under parallelism, amenability to partitioning will be yet another
tradeoff for users to consider when choosing a concrete class.
Luckily, extended versions of the two most common choices
(array-based and hash-based) are also almost always the best choices
for parallel operation. (In fact, we've had internal discussions
about whether to directly implement ONLY these forms and translate
everything else in and out of them. We can do slightly better than
that, though, so we probably will, while also offering access to the
underlying functionality of the array-based and hash-based classes.
See, for example, the announcements of preview releases of the planned
JDK8 ConcurrentHashMap:
http://cs.oswego.edu/pipermail/concurrency-interest/2012-July/009551.html)
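
As a rough illustration of why array-based collections partition
well, here is a minimal sketch that splits a collection's traversal
once and reports the two estimated sizes. It assumes the Spliterator
API as it later shipped in Java 8 (java.util.Spliterator and
Collection.spliterator()), which postdates this message. The names
SplitDemo and splitOnce are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    // Splits the collection's spliterator once and returns
    // { estimated size of the split-off prefix, estimated size of the rest }.
    // Returns { 0, n } if the spliterator declines to split.
    static long[] splitOnce(Collection<Integer> c) {
        Spliterator<Integer> rest = c.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < 100_000; i++) {
            array.add(i);
            linked.add(i);
        }
        long[] a = splitOnce(array);
        long[] l = splitOnce(linked);
        System.out.println("ArrayList:  " + a[0] + " / " + a[1]);
        System.out.println("LinkedList: " + l[0] + " / " + l[1]);
    }
}
```

An ArrayList of even size splits into two equal halves using index
arithmetic alone. A linked list has no random access, so whatever
split it offers requires walking nodes first; the sizes it reports
depend on the JDK's batching policy and are not assumed here.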

>>
>> Each collection should document something about its decomposition behavior,
>> just as it does its iteration behavior.
>
> Excellent.

It is challenging, though, to document these in a way that is
useful to people, beyond saying to use (updated) array-based or
hash-based collections/maps if you want predictably good parallel
performance.

The only utility of parallel operations for most other classes will
be when the per-element operations are so time consuming that they
outweigh the high time/space costs of partitioning.
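
A crude way to see this tradeoff is to run the same reduction over a
linked list with cheap and with expensive per-element work, both
sequentially and in parallel. The sketch below assumes the stream API
as it later shipped in Java 8. CostDemo, work, and sum are
hypothetical names, and the wall-clock timing is only indicative, not
a proper benchmark.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.function.IntUnaryOperator;

public class CostDemo {
    // Hypothetical per-element work: 'iterations' rounds of integer mixing.
    // With iterations == 0 this is the identity function.
    static IntUnaryOperator work(int iterations) {
        return x -> {
            int h = x;
            for (int i = 0; i < iterations; i++) {
                h = h * 31 + i;
            }
            return h;
        };
    }

    // Sums f(x) over the list, sequentially or in parallel.
    // The result is the same either way; only the elapsed time differs.
    static long sum(List<Integer> xs, IntUnaryOperator f, boolean parallel) {
        return (parallel ? xs.parallelStream() : xs.stream())
                .mapToInt(Integer::intValue)
                .map(f)
                .asLongStream()
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = new LinkedList<>();
        for (int i = 0; i < 2_000; i++) {
            xs.add(i);
        }
        for (int iterations : new int[] { 1, 50_000 }) {
            IntUnaryOperator f = work(iterations);
            for (boolean parallel : new boolean[] { false, true }) {
                long start = System.nanoTime();
                long result = sum(xs, f, parallel);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println(iterations + " iterations, parallel="
                        + parallel + ": " + micros + " us (sum " + result + ")");
            }
        }
    }
}
```

Any speedup from the parallel runs should show up only in the
expensive case, if at all; the actual numbers depend on the machine
and on JIT warmup.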

-Doug