run-to-run variance on C/P/N/Q experiments
Aleksey Shipilev
aleksey.shipilev at oracle.com
Tue Oct 9 03:41:56 PDT 2012
Yes, turned off.
In fact, the configuration is the same as [1], namely:
decomposition benchmark on 2x8x2 Xeon E5-2680 (SandyBridge) running
Solaris 11, and 20120925 lambda nightly with -d64 -XX:-TieredCompilation
-XX:+UseParallelOldGC -XX:+UseNUMA -XX:-UseBiasedLocking
-XX:+UseCondCardMark.
And the difference is sustained throughout the run (even though that
could otherwise be explained by C1 collecting different profiles in
tiered mode, which is not enabled in this particular case).
-Aleksey.
On 10/09/2012 02:34 PM, Remi Forax wrote:
> Aleksey,
> is it with tiered compilation enabled or not?
>
> I've found that tiered compilation introduces more jitter than when the
> VM is configured to use only C2.
>
> Rémi
>
> On 10/09/2012 11:18 AM, Aleksey Shipilev wrote:
>> Hi,
>>
>> I'm following up on the decomposition experiments, and this time I
>> focus on run-to-run variance. I took one of the break-even points of
>> the previous experiment on the same machine [1] and executed it
>> multiple times.
>>
>> For C=1, P=32, N=3000, Q=20 in the parallel case, we ran the tests in two modes:
>> a. 10 iterations per JVM invocation, 1000 JVM runs [2]
>> b. 100 iterations per JVM invocation, 10 JVM runs [3]
>>
>> The bottom line for this experiment is that we experience huge
>> run-to-run variance, which we triaged to JIT compilation jitter:
>> - scores drift from run to run, while staying within tight bounds within a single run
>> - -Xint mitigates the variance (with a huge penalty in scores)
>> - -Xcomp -Xbatch mitigates the variance (but drops the scores)
>>
>> That also means our break-even experiments may be 30-50% off the
>> true value. We found no reasonable way to lower the run-to-run
>> variance without a performance penalty, so the only option left at
>> this point is to run with multiple JVM invocations.
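Running multiple JVM invocations only helps if the per-fork scores are then aggregated into a run-to-run estimate. A minimal sketch of that aggregation (the score values below are made-up placeholders, not measurements from this experiment):

```java
import java.util.Arrays;

public class ForkStats {
    // Mean of the per-fork throughput scores.
    static double mean(double[] scores) {
        return Arrays.stream(scores).average().orElse(Double.NaN);
    }

    // Sample standard deviation across forks: this captures the
    // run-to-run variance that iterations within one JVM cannot expose.
    static double stddev(double[] scores) {
        double m = mean(scores);
        double ss = Arrays.stream(scores).map(s -> (s - m) * (s - m)).sum();
        return Math.sqrt(ss / (scores.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical scores from five separate JVM invocations.
        double[] scores = {100.0, 140.0, 95.0, 150.0, 110.0};
        System.out.printf("mean=%.1f stddev=%.1f%n",
                mean(scores), stddev(scores));
        // prints "mean=119.0 stddev=24.6"
    }
}
```

With many forks, the standard deviation across forks (rather than across iterations within one fork) is the quantity that bounds how far the break-even estimates can drift.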
>>
>> The disassembly dumps captured for the low-score and high-score runs
>> are here [4]. The integer there is the throughput we measured for that
>> code. If anyone can make sense of those logs alone, you are welcome to
>> do so. The entry point for the microbenchmark is the "testParallel"
>> method. The inline trees are somewhat different, but not different
>> enough to readily explain the performance difference.
>>
>> -Aleksey.
>>
>> [1]
>> http://mail.openjdk.java.net/pipermail/lambda-dev/2012-October/006088.html
>> [2] http://shipilev.net/pub/jdk/lambda/runtorun-variance/i10-f1000/
>> [3] http://shipilev.net/pub/jdk/lambda/runtorun-variance/i100-f10/
>> [4] http://shipilev.net/pub/jdk/lambda/runtorun-variance/i10-f1000/asms/
>>
>
>