RFR(XS): JDK-8010941: MinJumpTableSize is set to 18, investigate if that's still optimal
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue May 21 11:17:27 PDT 2013
On 5/21/13 7:12 AM, Aleksey Shipilev wrote:
> On 05/21/2013 05:58 PM, Niclas Adlertz wrote:
>>> At very least:
>>> * mean includes warmup time
>> As I see it, and please correct me if I'm wrong, warmup time should only be the first iteration of NUMBER_OF_TEST_EXECUTIONS (that is when, with 90000000 iterations (NUMBER_OF_ITERATIONS), on-stack replacement triggers).
>> Would ignoring the test_time from that first iteration of NUMBER_OF_TEST_EXECUTIONS be enough?
>
> You can't tell beforehand how long the warmup is. Hence, it is
> insufficient to throw away just the first iteration. You should instead make a
> few iterations, identify the steady state (= where performance has settled),
> and derive the metrics from there.
We usually do about 20000 iterations and run with -Xbatch to make sure
the tested method is compiled before the time measurement starts.
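A minimal sketch of that warmup-then-measure pattern; the kernel and the iteration counts here are illustrative stand-ins, not taken from the Test.java under review:

```java
// Sketch of a warmup-then-measure harness: run enough iterations that C2
// compiles the method before the timed section. Kernel and counts are
// hypothetical, not from the patch under review.
public class WarmupSketch {
    static final int WARMUP_ITERATIONS = 20_000;    // enough for C2 with -Xbatch
    static final int TIMED_ITERATIONS  = 1_000_000;

    // Hypothetical stand-in for the method under test.
    static double kernel(int i) {
        return i * 1.5;
    }

    public static void main(String[] args) {
        double sink = 0.0;                          // accumulating into a sink defeats DCE
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            sink += kernel(i);                      // warmup: trigger compilation first
        }
        long start = System.nanoTime();
        for (int i = 0; i < TIMED_ITERATIONS; i++) {
            sink += kernel(i);                      // timed section now runs compiled code
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("ns/op ~ " + (double) elapsed / TIMED_ITERATIONS);
        System.out.println("sink = " + sink);       // publish the sink so it stays live
    }
}
```

Running it with java -Xbatch WarmupSketch makes compilation finish before the timed loop rather than racing with it.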
>
>>> * store to $dead enables us to compute only the last iteration, and
>>> remove all others
>> What if I add to the dead value in each iteration of NUMBER_OF_ITERATIONS?
>
> Seems to work to evade DCE. Another problem: different loop
> unrolling can skew the results, because the loop can be effectively
> pipelined (also, if we were not using doubles, we might have collapsed
> the calculations even more, but thanks to double non-associativity, that
> is not an issue here).
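The non-associativity point is easy to demonstrate; this standalone check (not code from the thread) shows why the JIT cannot legally re-associate or collapse a chain of double additions:

```java
// Floating-point addition is not associative, so the JIT may not reorder
// or collapse a chain of double accumulations the way it could with
// integers. Standalone illustration, not code from the thread.
public class NonAssociative {
    public static void main(String[] args) {
        double left  = (0.1 + 0.2) + 0.3;   // 0.6000000000000001
        double right = 0.1 + (0.2 + 0.3);   // 0.6
        System.out.println(left == right);  // prints false
    }
}
```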
>
> Also, I begin to wonder whether inlining multiply_by_power_of_ten
> starts to affect how far we unroll the loop, since the jump tables are
> getting larger.
You can avoid that by running with
-XX:CompileCommand=dontinline,Test::multiply_by_power_of_ten
You can't rely on the results if the method is inlined, because of the
problems Aleksey pointed out.
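Put together, an illustrative invocation combining the flags discussed in this thread might look like this (the class and method names follow the linked Test.java; the MinJumpTableSize value is the candidate under discussion, not a recommendation):

```shell
# Hypothetical command line, not from the thread: compile eagerly, keep the
# tested method out of line, and try the candidate jump-table threshold.
java -Xbatch \
     -XX:CompileCommand=dontinline,Test::multiply_by_power_of_ten \
     -XX:MinJumpTableSize=5 \
     Test
```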
Another problem I see is that different multiply values may take different
times (I am talking about hardware instruction execution; C2 does not
optimize double multiply). You also have additional int->double and
long->double conversion code, which will affect the results. You need the
same code size/latency in each branch to get a correct result.
Having said all that, the result you get (5) is about right. But I would
like to see the data from your experiment. And maybe we can use the same
value on all platforms (don't forget embedded).
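For reference, a hypothetical sketch of the kind of dense switch being measured (the real one is in the Test.java linked in this thread). Using a double constant in every case keeps each branch the same shape, one double multiply with no int->double or long->double conversion, which matches the same-code-size/latency-per-branch requirement:

```java
// Hypothetical sketch of a dense switch of the kind MinJumpTableSize
// governs; whether C2 emits a jump table for it depends on the threshold.
// Every case does the same-shaped work: exactly one double multiply.
public class JumpTableSketch {
    static double multiplyByPowerOfTen(double value, int power) {
        switch (power) {
            case 0: return value * 1.0;
            case 1: return value * 10.0;
            case 2: return value * 100.0;
            case 3: return value * 1_000.0;
            case 4: return value * 10_000.0;
            case 5: return value * 100_000.0;
            default: throw new IllegalArgumentException("power: " + power);
        }
    }
}
```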
thanks,
Vladimir
>
>> Please see the updated version of Test.java
>> http://cr.openjdk.java.net/~adlertz/JDK-8010941/update/Test.java
>
> Again, these are only the surface issues. I recommend you migrate to
> JMH and stop guessing where your benchmark might be wrong :) It is a good
> exercise anyway.
>
> -Aleksey.
>
>
More information about the hotspot-compiler-dev mailing list