RFR(S): 8136820 Generate better code for some Unsafe addressing patterns
    Vladimir Kozlov 
    vladimir.kozlov at oracle.com
       
    Tue Sep 22 12:31:18 UTC 2015
    
    
  
So the main fix is the Op_AddI --> Op_AddX change in loopopts.cpp. I did not 
notice it among all the whitespace removal there.
And there is additional code in the matcher for the !off->is_Con() case. 
What is 'off' in your case?
What is the change in superword.cpp?
Thanks,
Vladimir
On 9/22/15 7:48 PM, Roland Westrelin wrote:
> Thanks for looking at this, Vladimir.
>
>> It would be nice if you described your changes (the RFE also has nothing), especially the changes in matcher.cpp. From what I can understand the changes are good, but please explain what you did.
>
> Sorry about that.
>
> The address of the unsafe access is: (array + ((((long) i) << 2) + base)) which my change reshapes into: (array + base) + (((long) i) << 2) so that array+base can be computed once, before the loop, because both array and base are loop invariant. The ad file change is there so that the entire new_base + (((long) i) << 2) address computation is folded into an x86 addressing mode (including the int to long cast). The matcher change ensures that if the same address computation is used by several memory accesses, the address is not computed once in a register and then shared, but rather embedded in each memory access instruction, as we do for other memory addressing modes.
>
> Roland.
>
>>
>> Thanks,
>> Vladimir
>>
>> On 9/21/15 6:00 PM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8136820/webrev.00/
>>>
>>> The main loop in:
>>>
>>>      static int[] array;
>>>      static long base;
>>>      static int test1() {
>>>          int res = 0;
>>>          for (int i = 0; i < 100; i++) {
>>>              long address = (((long) i) << 2) + base;
>>>              res += UNSAFE.getInt(array, address);
>>>          }
>>>          return res;
>>>      }
>>>
>>> is compiled as:
>>>
>>> 0b0 B2: # B2 B3 <- B1 B2 Loop: B2-B2 inner Freq: 101.492
>>> 0b0 movslq R9, R10 # i2l
>>> 0b3 salq R9, #2
>>> 0b7 addq R9, [R11 + #112 (8-bit)] # long
>>> 0bb addl RAX, [R8 + R9] # int
>>> 0bf incl R10 # int
>>> 0c2 cmpl R10, #100
>>> 0c6 jl,s B2 # loop end P=0.990147 C=6732.000000
>>>
>>> but could be compiled as:
>>>
>>> 0b2 B2: # B2 B3 <- B1 B2 Loop: B2-B2 inner Freq: 101.492
>>> 0b2 addl RAX, [R8 + pos R10 << #2] # int
>>> 0b6 incl R10 # int
>>> 0b9 cmpl R10, #100
>>> 0bd jl,s B2 # loop end P=0.990147 C=6732.000000
>>>
>>> base and array are loop invariant so array + base can be computed before the loop.
>>>
>>> Roland.
>>>
>
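
The reshaping described in the quoted message can be sketched in plain Java.
This is only an illustration of the arithmetic, not the C2 change itself: the
class and method names are hypothetical, and the array reference is modeled
as a plain long (an assumption; in the real code it is an object reference).
Because two's-complement long addition is associative, both shapes give the
same address, and the array + base sum can be hoisted out of the loop.

```java
// Hypothetical illustration of the address reshaping; not HotSpot code.
final class AddressReshape {
    // Original shape: array + ((((long) i) << 2) + base)
    static long original(long array, long base, int i) {
        return array + ((((long) i) << 2) + base);
    }

    // Reshaped form: (array + base) + (((long) i) << 2),
    // where arrayPlusBase is the loop-invariant part.
    static long reshaped(long arrayPlusBase, int i) {
        return arrayPlusBase + (((long) i) << 2);
    }

    // Checks that both shapes agree for every i in [0, limit).
    static boolean sameForLoop(long array, long base, int limit) {
        long hoisted = array + base; // computed once, before the loop
        for (int i = 0; i < limit; i++) {
            if (original(array, base, i) != reshaped(hoisted, i)) {
                return false;
            }
        }
        return true;
    }
}
```

With the invariant part hoisted, what remains per iteration is a base register
plus a scaled index, which is the form an x86 addressing mode can encode.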
    
    