Detecting range check elimination with PrintAssembly

Manohar Jonnalagedda manojo10386 at gmail.com
Mon Jan 16 01:39:49 PST 2012


Hi Kris, Vladimir,

thanks for both your responses.

Second, your two test methods are different so you can't directly compare
> them. method1() iterates over rows using middle loop index 'j' and
> method2() uses external loop index 'i'. Unless they are typos again.
>

You are right, these are indeed typos. As Kris suggested, I have the code
printed here: http://pastebin.com/xRFD1Nt1. The methods corresponding to
method1, and method2 are constructNearestClusterVector and
computeNewCentroids. Their PrintOptoAssembly outputs are respectively at
http://pastebin.com/1evN8b3K and http://pastebin.com/FxkVWTD5

Also, it seems I have not explained myself correctly. I am not trying to
compare the performance of method1 with respect to that of method2: method1
and method2 both run in the same program. What I am trying to compare is
their performance in two cases:
- when cols is a compile-time constant (much faster)
- when cols is a value determined at run-time



> If you are using jdk7 there are few flags you can use to print loop
> optimizations information. They need debug version of VM but it is not
> problem for you, I think, since you can use debug PrintOptoAssembly flag.
>
> -XX:+TraceLoopOpts prints all happened loop optimizations and loop tree
> after each round of loop opts,
> -XX:+TraceLoopPredicate prints RC information when it is moved from a loop,
> -XX:+TraceRangeLimitCheck prints additional information for RC elimination
> optimization.
>

Thanks for these, I will have a look at what they output.

Fourth, range check expression in your example is not what you think. RC
> expression should be next:
> (i*stride+offset) where 'i' is loop variable, 'stride' is constant and
> 'offset' is loop invariant.
>
> In your example 'offset' is (j * cols) since it is loop invariant, 'k' is
> loop variable and stride is '1' (one).
> In both your methods RC will be moved out of inner loop so the code for it
> will be the same. The only difference in these methods will be where and
> how (j * cols) and (i * cols) expressions are calculated.
>

I'd guess it's the difference in locality that made the difference in
>> performance in your two tests.
>>
>
 Thanks for the explanation. I understand from the above that the assembly
output in both cases mentioned above may not be different, because the
expressions are similar. The difference in runtime (due to cols being a
compile-time constant) will be visible elsewhere. Is that right? If so,
where would I be able to detect this?

Cheers,
Manohar


> In your PrintOptoAssembly output snippet, the instruction at 0x13e is a
>> LoadRange, which loads the range from the header
>> of an array:
>>
>> (from x86_64.ad <http://x86_64.ad>)
>>
>> // Load Range
>> instruct loadRange(rRegI dst, memory mem)
>> %{
>>   match(Set dst (LoadRange mem));
>>
>>   ins_cost(125); // XXX
>>   format %{ "movl    $dst, $mem\t# range" %}
>>   opcode(0x8B);
>>   ins_encode(REX_reg_mem(dst, mem), OpcP, reg_mem(dst, mem));
>>   ins_pipe(ialu_reg_mem);
>> %}
>>
>> That's not a range check just yet; the real check, if any, should come
>> after the null check, in the form of comparing
>> something else with RSI. But you didn't show what's after the null check,
>> how RSI is used, so it's hard to say what
>> you're seeing in your example.
>>
>> As for the two test examples, could you paste the entire source code,
>> with the PrintOptoAssembly output of method1() and
>> method2() ? The first example looks weird, maybe it's a typo but you're
>> using "j < cols" as the loop condition for the
>> inner loop.
>>
>>
>
>> - Kris
>>
>> On Mon, Jan 16, 2012 at 1:59 AM, Manohar Jonnalagedda <
>> manojo10386 at gmail.com <mailto:manojo10386 at gmail.com>> wrote:
>>
>>    Hello,
>>
>>    following this reference on Range Check Elimination done by the
>> Hotspot compiler [1], I was keen in knowing how I
>>    can detect whether range checks are taking place in loops by
>> inspecting output using the PrintAssembly flag; with
>>    the old PrintOptoAssembly flag, I have seen output such as the
>> following, which I assume to be range checks :
>>
>>    B11: #  B73 B12 &lt;- B10  Freq: 1.21365
>>    139     movq    RAX, [rsp + #24]  # spill
>>    13e     movl    RSI, [RAX + #12 (8-bit)]  # range
>>    141     NullCheck RAX
>>
>>    What is the equivalent with the new PrintAssembly flag (using hsdis)?
>>
>>    Moreover, as stated on the wiki page [1], loops are optimized if the
>> stride is a compile-time constant. I performed
>>    a few tests on a kmeans program, with 3 nested loops, having the
>> following (high-level) structure:
>>
>>    ===
>>    void method1(){
>>    //loop 1
>>    for(int i = 0; i< rows1; i++){
>>      //...
>>       for(int j = 0; j< rows2; j++){
>>       //...
>>        for(int k = 0; j < cols; k++){ array[j * cols + k] = //...}
>>       }
>>    }
>>    }
>>
>>    void method2(){
>>    //loop 2
>>    for(int i =0; i < rows1; i++){
>>       for(int j=0 ; i< rows2; j++){
>>         for(int k=0 ; k< cols; k++){
>>           array[i*cols+k] = //...
>>         }
>>       }
>>    }
>>    }
>>
>>    void main(){
>>
>>      do{
>>        method1(); method2();
>>      }while(!converged)
>>
>>    }
>>    ====
>>
>>    In the first test, cols is an int whose value is determined at runtime
>> (by reading a file), in the second test, it
>>    is given as a compile-time constant(3). In the second test, there is a
>> */significant*/ speed-up (around 40%).
>>
>>    However, when studying the diff of the output of PrintOptoAssembly for
>> both method1 and method2, there is no
>>    difference (apart from slight value changes in frequency). Would you
>> have any hints as to where I could look for
>>    differences?
>>
>>    Thanks a lot,
>>    Manohar
>>
>>    [1]
>> https://wikis.oracle.com/display/HotSpotInternals/RangeCheckElimination
>>
>>
>>
> As Kris pointed you need to fix your example:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20120116/65204a22/attachment.html 


More information about the hotspot-dev mailing list