Request for reviews (M): 7121648: Use 3-operands SIMD instructions on x86 with AVX

Mon Dec 19 14:02:02 PST 2011

Looks good to me.

tom

On Dec 15, 2011, at 5:22 PM, Vladimir Kozlov wrote:

> I did the renaming and updated the webrev.
> 
> http://cr.openjdk.java.net/~kvn/7121648/webrev
> 
> I also fixed match_into_reg() to fold a load into an arithmetic instruction inside a loop. The load was not folded because its control (the NULL check) is usually moved outside the loop and the loop's head is a Region. So I added a check for the control of the load's memory input (a memory phi), which stays inside the loop.
> 
> Before:
> 090   B11: #	B11 B12 <- B10 B11 Loop: B11-B11 inner main of N69 Freq: 999991
> 090   	movsd   XMM0, [R8 + #16 + RCX << #3]	# double
> 097   	movsd   XMM1, [R9 + #16 + RCX << #3]	# double
> 09e   	vaddsd  XMM0, XMM1, XMM0
> 0a2   	movsd   [R11 + #16 + RCX << #3], XMM0	# double
> 
> After:
> 090   B11: #	B11 B12 <- B10 B11 Loop: B11-B11 inner main of N69 Freq: 999991
> 090   	movsd   XMM0, [R8 + #16 + RCX << #3]	# double
> 097   	vaddsd  XMM0, XMM0, [R9 + #16 + RCX << #3]
> 09e   	movsd   [R11 + #16 + RCX << #3], XMM0	# double
> 
> Vladimir
> 
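For reference, a loop of roughly the following shape would be expected to produce listings like the Before/After ones above: two double[] loads, one add and one double[] store per iteration. This is only a sketch under assumptions. The actual test case is not shown in this thread, and the class name VectorAdd, the method add() and the array sizes are made up for illustration.

```java
public class VectorAdd {
    // Hypothetical kernel (not from the webrev): c[i] = a[i] + b[i].
    // Assumes a and b are at least as long as c.
    public static void add(double[] a, double[] b, double[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        double[] a = new double[1024];
        double[] b = new double[1024];
        double[] c = new double[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2.0 * i;
        }
        // Call the kernel many times so that the JIT has a chance to compile it.
        for (int iter = 0; iter < 20000; iter++) {
            add(a, b, c);
        }
        System.out.println(c[10]);
    }
}
```

On a debug build of the VM, running this with -XX:+PrintOptoAssembly should print the C2 code for add(). Whether the second load is folded into vaddsd depends on the build and on the CPU supporting AVX.
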
> Tom Rodriguez wrote:
>> On Dec 15, 2011, at 11:42 AM, Vladimir Kozlov wrote:
>>> Thank you, Tom
>>> 
>>> Tom Rodriguez wrote:
>>>> On Dec 15, 2011, at 10:36 AM, Vladimir Kozlov wrote:
>>>>> http://cr.openjdk.java.net/~kvn/7121648/webrev
>>>>> 
>>>>> 7121648: Use 3-operands SIMD instructions on x86 with AVX
>>>> adlc.make:
>>>> Can you use $(Platform_arch) instead of $(ARCH)?  I know it's the same but the names are more obviously parallel.
>>> Done.
>>> 
>>>>> The VEX prefix converts legacy SSE instructions into 3-operand instructions. Use such instructions in C2 generated code for machines with AVX:
>>>>> 
>>>>> vaddsd   XMM2, XMM0, [RSI + #8 + RCX << #3]
>>>>> 
>>>>> I also went ahead and created an x86.ad file to collect the mach instruction definitions common to 32- and 64-bit.
>>>> Instead of duplicating regX and regXD why not just rename regX and regXD to regF and regD and come up with a new name for the FPU regs?  Maybe regFPRF and regFPRD?
>>> I am fine with renaming the FPU registers to regFPRF and regFPRD but it will touch a lot of places in x86_32.ad. If you are fine with it I will do the renaming.
>> I just think if we're going to start down this path I want to make it more consistent instead of papering over the difference.  So I'm ok with the changes since for the most part it will be pure text replacement.  You'll probably need to rename things like immXF and immF to be consistent.
>> tom
>>>> Also can you correct the formatting in the new file to have the %{ on the same line?  And remove the // XXX's after the ins_cost.
>>> Done.
>>> 
>>>> Otherwise it looks fine.
>>> Thanks,
>>> Vladimir
>>> 
>>>> tom
>>>>> There is a slight improvement in performance on an AVX machine (full results are in the bug report):
>>>>> 
>>>>> Benchmark        Samples       Mean     Stdev   %Diff     P   Significant
>>>>> scimark_small         20    1031.87      4.68    2.85 0.000           Yes
>>>>>   LU                  20    1966.82     21.36    6.18 0.000           Yes
>>>>>   FFT                 20     658.28     10.33    4.91 0.000           Yes
>>>>>   Monte               20     545.31      2.93   -0.37 0.179             *
>>>>>   SOR                 20     994.96      0.67   -0.00 0.910             *
>>>>>   Sparse              20     993.99      1.49   -0.02 0.629             *
>>>>> 
>>>>> Thanks,
>>>>> Vladimir