Detecting range check elimination with PrintAssembly
Manohar Jonnalagedda
manojo10386 at gmail.com
Mon Jan 16 01:39:49 PST 2012
Hi Kris, Vladimir,
thanks for both your responses.
Second, your two test methods are different so you can't directly compare
> them. method1() iterates over rows using middle loop index 'j' and
> method2() uses external loop index 'i'. Unless they are typos again.
>
You are right, these are indeed typos. As Kris suggested, I have the code
printed here: http://pastebin.com/xRFD1Nt1. The methods corresponding to
method1, and method2 are constructNearestClusterVector and
computeNewCentroids. Their PrintOptoAssembly outputs are respectively at
http://pastebin.com/1evN8b3K and http://pastebin.com/FxkVWTD5
Also, it seems I have not explained myself correctly. I am not trying to
compare the performance of method1 with respect to that of method2: method1
and method2 both run in the same program. What I am trying to compare is
their performance in two cases:
- when cols is a compile-time constant (much faster)
- when cols is a value determined at run-time
> If you are using jdk7 there are few flags you can use to print loop
> optimizations information. They need debug version of VM but it is not
> problem for you, I think, since you can use debug PrintOptoAssembly flag.
>
> -XX:+TraceLoopOpts prints all happened loop optimizations and loop tree
> after each round of loop opts,
> -XX:+TraceLoopPredicate prints RC information when it is moved from a loop,
> -XX:+TraceRangeLimitCheck prints additional information for RC elimination
> optimization.
>
Thanks for these, I will have a look at what they output.
Fourth, range check expression in your example is not what you think. RC
> expression should be next:
> (i*stride+offset) where 'i' is loop variable, 'stride' is constant and
> 'offset' is loop invariant.
>
> In your example 'offset' is (j * cols) since it is loop invariant, 'k' is
> loop variable and stride is '1' (one).
> In both your methods RC will be moved out of inner loop so the code for it
> will be the same. The only difference in these methods will be where and
> how (j * cols) and (i * cols) expressions are calculated.
>
I'd guess it's the difference in locality that made the difference in
>> performance in your two tests.
>>
>
Thanks for the explanation. I understand from the above that the assembly
output in both cases mentioned above may not be different, because the
expressions are similar. The difference in runtime (due to cols being a
compile-time constant) will be visible elsewhere. Is that right? If so,
where would I be able to detect this?
Cheers,
Manohar
> In your PrintOptoAssembly output snippet, the instruction at 0x13e is a
>> LoadRange, which loads the range from the header
>> of an array:
>>
>> (from x86_64.ad <http://x86_64.ad>)
>>
>> // Load Range
>> instruct loadRange(rRegI dst, memory mem)
>> %{
>> match(Set dst (LoadRange mem));
>>
>> ins_cost(125); // XXX
>> format %{ "movl $dst, $mem\t# range" %}
>> opcode(0x8B);
>> ins_encode(REX_reg_mem(dst, mem), OpcP, reg_mem(dst, mem));
>> ins_pipe(ialu_reg_mem);
>> %}
>>
>> That's not a range check just yet; the real check, if any, should come
>> after the null check, in the form of comparing
>> something else with RSI. But you didn't show what's after the null check,
>> how RSI is used, so it's hard to say what
>> you're seeing in your example.
>>
>> As for the two test examples, could you paste the entire source code,
>> with the PrintOptoAssembly output of method1() and
>> method2() ? The first example looks weird, maybe it's a typo but you're
>> using "j < cols" as the loop condition for the
>> inner loop.
>>
>>
>
>> - Kris
>>
>> On Mon, Jan 16, 2012 at 1:59 AM, Manohar Jonnalagedda <
>> manojo10386 at gmail.com <mailto:manojo10386 at gmail.com>> wrote:
>>
>> Hello,
>>
>> following this reference on Range Check Elimination done by the
>> Hotspot compiler [1], I was keen in knowing how I
>> can detect whether range checks are taking place in loops by
>> inspecting output using the PrintAssembly flag; with
>> the old PrintOptoAssembly flag, I have seen output such as the
>> following, which I assume to be range checks :
>>
>> B11: # B73 B12 <- B10 Freq: 1.21365
>> 139 movq RAX, [rsp + #24] # spill
>> 13e movl RSI, [RAX + #12 (8-bit)] # range
>> 141 NullCheck RAX
>>
>> What is the equivalent with the new PrintAssembly flag (using hsdis)?
>>
>> Moreover, as stated on the wiki page [1], loops are optimized if the
>> stride is a compile-time constant. I performed
>> a few tests on a kmeans program, with 3 nested loops, having the
>> following (high-level) structure:
>>
>> ===
>> void method1(){
>> //loop 1
>> for(int i = 0; i< rows1; i++){
>> //...
>> for(int j = 0; j< rows2; j++){
>> //...
>> for(int k = 0; j < cols; k++){ array[j * cols + k] = //...}
>> }
>> }
>> }
>>
>> void method2(){
>> //loop 2
>> for(int i =0; i < rows1; i++){
>> for(int j=0 ; i< rows2; j++){
>> for(int k=0 ; k< cols; k++){
>> array[i*cols+k] = //...
>> }
>> }
>> }
>> }
>>
>> void main(){
>>
>> do{
>> method1(); method2();
>> }while(!converged)
>>
>> }
>> ====
>>
>> In the first test, cols is an int whose value is determined at runtime
>> (by reading a file), in the second test, it
>> is given as a compile-time constant(3). In the second test, there is a
>> */significant*/ speed-up (around 40%).
>>
>> However, when studying the diff of the output of PrintOptoAssembly for
>> both method1 and method2, there is no
>> difference (apart from slight value changes in frequency). Would you
>> have any hints as to where I could look for
>> differences?
>>
>> Thanks a lot,
>> Manohar
>>
>> [1]
>> https://wikis.oracle.com/display/HotSpotInternals/RangeCheckElimination
>>
>>
>>
> As Kris pointed you need to fix your example:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20120116/65204a22/attachment.html
More information about the hotspot-dev
mailing list