run-to-run variance on C/P/N/Q experiments

Tue Oct 9 02:18:11 PDT 2012

Hi,

I'm following up on the decomposition experiments, this time focusing
on run-to-run variance. I took one of the break-even points from the
previous experiment on the same machine [1] and executed it multiple
times.

For C=1, P=32, N=3000, Q=20 in the parallel case, we ran the tests in two modes:
  a. 10 iterations per JVM invocation, 1000 JVM runs [2]
  b. 100 iterations per JVM invocation, 10 JVM runs [3]

The bottom line for this experiment is that we see a huge run-to-run
variance, which we have triaged to JIT compilation jitter:
  - scores drift from run to run, while staying within bounds inside each run
  - -Xint mitigates the variance (with a huge penalty in scores)
  - -Xcomp -Xbatch mitigates the variance (but drops the scores)
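
To make the "drift between runs, stable within a run" observation concrete,
here is a minimal sketch of how the two spreads can be separated. It is not
the harness used for the experiments; the class name, method names, and the
scores in main() are made up for illustration. Each inner array stands for
the iteration scores of one JVM invocation.

```java
import java.util.Arrays;

public class RunToRunVariance {
    // Arithmetic mean of the scores.
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Sample standard deviation (n - 1 in the denominator).
    static double stddev(double[] xs) {
        if (xs.length < 2) return 0.0;
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Average of the per-invocation standard deviations:
    // how much the score moves inside a single JVM run.
    static double withinRunStddev(double[][] runs) {
        return Arrays.stream(runs)
                     .mapToDouble(RunToRunVariance::stddev)
                     .average().orElse(Double.NaN);
    }

    // Standard deviation of the per-invocation means:
    // how much the score drifts from one JVM run to the next.
    static double betweenRunStddev(double[][] runs) {
        double[] means = Arrays.stream(runs)
                               .mapToDouble(RunToRunVariance::mean)
                               .toArray();
        return stddev(means);
    }

    public static void main(String[] args) {
        // Hypothetical scores, NOT measured data: 3 JVM runs, 3 iterations each.
        double[][] runs = {
            {100, 101, 99},
            {140, 141, 139},
            {120, 121, 119},
        };
        System.out.printf("within-run stddev:  %.2f%n", withinRunStddev(runs));
        System.out.printf("between-run stddev: %.2f%n", betweenRunStddev(runs));
    }
}
```

A between-run figure much larger than the within-run figure is the pattern
described above: each JVM settles on its own score and stays there.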

That also means that our break-even experiments are off by roughly 30-50%
from the true value. We have found no reasonable way to lower the
run-to-run variance without a performance penalty, so the only option left
at this point is to run with multiple invocations.
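
As a rough sketch of what "run with multiple invocations" buys us: treat the
mean score of each JVM invocation as one sample, and report the mean of those
samples with an error estimate. The class and method names are hypothetical,
the input numbers are made up, and the 1.96 multiplier assumes a normal
approximation, which is only reasonable with many invocations.

```java
public class InvocationSummary {
    // Mean of the per-invocation mean scores.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample standard deviation across invocations (n - 1 denominator).
    static double sampleStddev(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Half-width of an approximate 95% confidence interval for the mean.
    // Assumption: normal approximation; too optimistic for a handful of runs.
    static double halfWidth95(double[] invocationMeans) {
        return 1.96 * sampleStddev(invocationMeans)
                    / Math.sqrt(invocationMeans.length);
    }

    public static void main(String[] args) {
        // Hypothetical per-invocation means, NOT measured data.
        double[] invocationMeans = {100, 140, 120};
        System.out.printf("score: %.2f +- %.2f%n",
                mean(invocationMeans), halfWidth95(invocationMeans));
    }
}
```

The error shrinks with the square root of the number of invocations, so
adding iterations inside one JVM does not help here; adding JVM runs does.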

The disassembly dumps captured for low-score and high-score runs are here
[4]. The integer there is the throughput we measured on that code. If
anyone can make sense of those logs alone, you are welcome to do so. The
entry point for the microbenchmark is the "testParallel" method. The inline
trees are somewhat different, but not different enough to readily explain
the performance difference.

-Aleksey.

[1]
http://mail.openjdk.java.net/pipermail/lambda-dev/2012-October/006088.html
[2] http://shipilev.net/pub/jdk/lambda/runtorun-variance/i10-f1000/
[3] http://shipilev.net/pub/jdk/lambda/runtorun-variance/i100-f10/
[4] http://shipilev.net/pub/jdk/lambda/runtorun-variance/i10-f1000/asms/