Bulk decomposition: naive parallel vs. sequential test

Aleksey Shipilev aleksey.shipilev at oracle.com
Wed Sep 5 06:38:10 PDT 2012


Hi,

I had a few spare cycles to fill with a fun experiment, based on my
previous one [1]. The tl;dr version: a hand-crafted generator for longs
in (0; N], pushed through a simple filter (with variable cost Q) into an
empty Fillable/Sink/Destination. The question to answer: how does
performance change with growing N while keeping Q very low, in both
sequential and parallel modes?
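
For readers without the prototype sources handy, a minimal sketch of the
benchmark shape follows. It is written against the eventual
java.util.stream API rather than the prototype Fillable/Sink/Destination
pipeline, and the single-shot nanoTime timing is only illustrative, not
how the measurements were actually taken; the class name, filter body and
constants are made up here to stand in for the variable cost Q.

import java.util.stream.LongStream;

public class ParSeqSketch {
    // Simulated per-element filter cost Q: a short spin of busy work.
    static boolean cheapFilter(long v, int q) {
        long acc = v;
        for (int i = 0; i < q; i++) {
            acc = acc * 0x9E3779B97F4A7C15L + i;
        }
        return (acc & 1) == 0;
    }

    // Sequential pipeline: longs in (0; N], filtered, drained into a sink.
    static long runSequential(long n, int q) {
        return LongStream.rangeClosed(1, n)
                         .filter(v -> cheapFilter(v, q))
                         .count();   // count() stands in for the empty sink
    }

    // Same pipeline, but decomposed and run in parallel.
    static long runParallel(long n, int q) {
        return LongStream.rangeClosed(1, n)
                         .parallel()
                         .filter(v -> cheapFilter(v, q))
                         .count();
    }

    public static void main(String[] args) {
        for (long n = 10; n <= 1_000_000; n *= 10) {
            long t0 = System.nanoTime();
            runSequential(n, 1);
            long t1 = System.nanoTime();
            runParallel(n, 1);
            long t2 = System.nanoTime();
            System.out.printf("N=%d: seq=%d ns, par=%d ns%n",
                              n, t1 - t0, t2 - t1);
        }
    }
}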

On my small laptop, 2x2 i5-2520M 2.0 GHz, Xubuntu 12.04, JDK8 x86 (built
from the it2-bootstrap branch on 2012/09/05), and basic JDK options -Xmx2g
-Xms2g -XX:+ParallelOldGC -XX:-UseBiasedLocking, I got a few curves [2].

The result is consistent with my understanding of how our framework operates:
 - sequential execution time grows linearly with N
 - parallel has a static penalty at low N
 - parallel is able to exploit parallelism at decent N
 - the inflection point seems to be close to N=10^3
 - max(par/seq) > 2.5x, meaning a 2.5x improvement on my 2-core HT host

Note that sequential execution time (almost) directly describes the amount
of actual work to be done, so we can read off from this data that the
parallel version reaches equilibrium between its overheads and the gained
parallelism once there is a task of ~5*10^4 ns = 50 us worth of sequential
work ahead of us. This is actually a surprising result for me, because I
had assumed the overheads of parallel decomposition were much greater, and
that the inflection point would sit at a much, much larger N.
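
(Back-of-envelope check from the numbers above: ~5*10^4 ns of sequential
work spread over the ~10^3 elements at the inflection point works out to
roughly 50 ns per element, which is plausible for a very low-Q filter, so
the two figures are consistent with each other.)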

-Aleksey.

[1] http://mail.openjdk.java.net/pipermail/lambda-dev/2012-July/005333.html
[2] http://shipilev.net/pub/jdk/lambda/parseq-1/parseq.pdf
