Parallel decompositions, C/P/N/Q experiment, take 3

Fri Oct 5 05:37:08 PDT 2012

Hi,

The results for the decomposition benchmark on a 2x8x2 Xeon E5-2680
(SandyBridge) running Solaris 11 and the 20120925 lambda nightly, with
-d64 -XX:-TieredCompilation -XX:+UseParallelOldGC -XX:+UseNUMA
-XX:-UseBiasedLocking -XX:+UseCondCardMark, are here [1].

This run focused mostly on the C=1, P=32 plane. It might be more
convenient to look at the high-level charts [2]. Doug, we can infer the
break-even front (aka the par/seq = 1 isoline) from there. Raw data is
here [3].

Some observations:
 - the break-even front seems to fit N*Q = 2*10^5 in the high-Q/low-N
part, and N*Q = 5*10^5 in the low-Q/high-N part; that means for a very
light operation of just a few arithmetic operations, we need at least
10^5 elements in the stream to justify going for the parallel version.
 - usr% is predictably low for N < P
 - usr% is lower for lower Q, given the same N; this might highlight a
problem with parallel decomposition, and may explain the high break-even
constant.
 - lower usr% is accompanied by larger sys% and ctxsw ratio
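
To make the break-even arithmetic above concrete, here is a minimal
sketch of how the N*Q estimate could be turned into a "go parallel?"
check. The class and method names are hypothetical (nothing like this
is in the benchmark or in the lambda API), Q is assumed to be in the
same per-element cost units as in the charts, and the two constants are
just the rough fits quoted above for this one machine and build.

```java
public final class BreakEven {
    // Rough fits from the observations above; assumed, machine-specific.
    public static final long HIGH_Q_THRESHOLD = 200_000L; // N*Q ~ 2*10^5
    public static final long LOW_Q_THRESHOLD  = 500_000L; // N*Q ~ 5*10^5

    private BreakEven() {}

    // Smallest N for which N*Q reaches the threshold (ceiling division).
    public static long minElements(long q, long threshold) {
        if (q <= 0 || threshold <= 0) {
            throw new IllegalArgumentException("q and threshold must be positive");
        }
        return (threshold + q - 1) / q;
    }

    // True if the estimate says the parallel version should at least
    // break even. Compares N against the minimum instead of computing
    // N*Q directly, to avoid overflow for large N.
    public static boolean worthParallel(long n, long q, long threshold) {
        return n >= minElements(q, threshold);
    }

    public static void main(String[] args) {
        // Light operation, Q = 5, on the low-Q/high-N side of the front.
        System.out.println(minElements(5, LOW_Q_THRESHOLD));
        System.out.println(worthParallel(10_000, 5, LOW_Q_THRESHOLD));
        System.out.println(worthParallel(1_000_000, 5, LOW_Q_THRESHOLD));
    }
}
```

With Q = 5 and the 5*10^5 fit this gives a minimum of 10^5 elements,
which matches the figure in the first observation. It is only an
estimate of where the par/seq = 1 isoline sits, and it says nothing
about the low-usr% effects noted above.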

-Aleksey.

[1] http://shipilev.net/pub/jdk/lambda/bulk-fuzzy-20120925-snb-sol11-c1-p32/
[2]
http://shipilev.net/pub/jdk/lambda/bulk-fuzzy-20120925-snb-sol11-c1-p32/plane.pdf
[2]
http://shipilev.net/pub/jdk/lambda/bulk-fuzzy-20120925-snb-sol11-c1-p32/parseq.pdf