RFR (M) 8222074: Enhance auto vectorization for x86
Viswanathan, Sandhya
sandhya.viswanathan at intel.com
Wed Apr 10 17:21:48 UTC 2019
Yes, good catch. In mul32B_reg_avx(), the last two instructions are the only place where dst is used:
__ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len);
__ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, vector_len);
Here dst can be the same as tmp2 or tmp1 in vpackuswb(), so the effect TEMP dst is not required.
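The aliasing argument for these two instructions can be checked with a small toy model. The sketch below is not HotSpot code: the register file, the indices 0/1/2 for tmp1/tmp2/a separate dst, and the helper names pack_tail() and sample_regs() are assumptions made for illustration. It models 256-bit vpackuswb (per 128-bit lane packing with unsigned saturation) and vpermq with immediate 0xD8, each reading its sources in full before writing its destination, and lets dst be mapped onto tmp1, tmp2, or a third register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of a 256-bit register: 32 bytes, words stored little-endian.
using Reg = std::array<uint8_t, 32>;
// Hypothetical register indices: 0 = tmp1, 1 = tmp2, 2 = a separate dst.
using RegFile = std::array<Reg, 3>;

// Read signed 16-bit word w (0..15) from a register.
static int16_t word_at(const Reg& r, int w) {
    return static_cast<int16_t>(r[2 * w] | (r[2 * w + 1] << 8));
}

// Saturate a signed word to an unsigned byte, as packuswb does.
static uint8_t sat_u8(int16_t v) {
    return v < 0 ? 0 : (v > 255 ? 255 : static_cast<uint8_t>(v));
}

// Model of 256-bit vpackuswb dst, a, b: within each 128-bit lane,
// pack the 8 words of a, then the 8 words of b.
void vpackuswb(RegFile& rf, int dst, int a, int b) {
    const Reg ra = rf[a], rb = rf[b];  // sources are read before dst is written
    Reg out{};
    for (int lane = 0; lane < 2; ++lane) {
        for (int i = 0; i < 8; ++i) {
            out[16 * lane + i]     = sat_u8(word_at(ra, 8 * lane + i));
            out[16 * lane + 8 + i] = sat_u8(word_at(rb, 8 * lane + i));
        }
    }
    rf[dst] = out;
}

// Model of vpermq dst, src, imm8: each 2-bit field of imm8 selects a quadword.
void vpermq(RegFile& rf, int dst, int src, int imm8) {
    const Reg in = rf[src];
    Reg out{};
    for (int q = 0; q < 4; ++q) {
        const int sel = (imm8 >> (2 * q)) & 3;
        for (int i = 0; i < 8; ++i) out[8 * q + i] = in[8 * sel + i];
    }
    rf[dst] = out;
}

// Run the two-instruction tail with dst mapped to register index `dst`.
Reg pack_tail(RegFile rf, int dst) {
    assert(dst >= 0 && dst < 3);
    vpackuswb(rf, dst, /*tmp2=*/1, /*tmp1=*/0);
    vpermq(rf, dst, dst, 0xD8);
    return rf[dst];
}

// Fill tmp1 and tmp2 with distinct word values.
RegFile sample_regs() {
    RegFile rf{};
    for (int w = 0; w < 16; ++w) {
        rf[0][2 * w] = static_cast<uint8_t>(w);        // tmp1 words: 0..15
        rf[1][2 * w] = static_cast<uint8_t>(100 + w);  // tmp2 words: 100..115
    }
    return rf;
}
```

In this model, pack_tail(sample_regs(), 0), pack_tail(sample_regs(), 1) and pack_tail(sample_regs(), 2) all return the same bytes, which is the property that makes a separate dst register unnecessary for this tail.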
Best Regards,
Sandhya
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Wednesday, April 10, 2019 9:59 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; B. Blaser <bsrbnd at gmail.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote:
> Hi Bernard,
>
> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src.
Yes, that is what we do. Or, in the mul4B_reg() case, we can use $dst instead of $tmp2 to avoid overwriting
$src2 before we get the value from it if $dst = $src2.
On the other hand, mul32B_reg_avx() and others have a 'TEMP dst' effect, but $dst is used only for the final result.
It is a bit of a mess, which may cause inefficient use of registers in compiled code.
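The difference between substituting $dst for $tmp and substituting it for $tmp2 can be sketched with a toy register model. This is a simplified illustration, not the actual mul4B_reg() sequence: only the two widening loads come from the quoted format string, while the multiply-and-truncate step, the register indices (0 = src1, 1 = src2, 2 = scratch) and the helper names are assumptions. The final move of the result into $dst is left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Reg = std::array<int, 4>;      // toy register: four lanes, one byte value per lane
using RegFile = std::array<Reg, 3>;  // hypothetical indices: 0 = src1, 1 = src2, 2 = scratch

// Stand-in for pmovsxbw: sign-extend the low byte of each lane of src into dst.
void sign_extend_bytes(RegFile& rf, int dst, int src) {
    const Reg in = rf[src];  // the source is read before dst is written
    for (int i = 0; i < 4; ++i) rf[dst][i] = static_cast<int8_t>(in[i]);
}

// Stand-in for the multiply and byte truncation: dst = (dst * src) & 0xFF per lane.
void mul_low_byte(RegFile& rf, int dst, int src) {
    const Reg a = rf[dst], b = rf[src];
    for (int i = 0; i < 4; ++i) rf[dst][i] = (a[i] * b[i]) & 0xFF;
}

// Simplified byte multiply: `first` receives the widened src1,
// `second` receives the widened src2. The product is left in `first`.
Reg mul_bytes(RegFile rf, int first, int second) {
    assert(first != second);
    sign_extend_bytes(rf, first, 0);   // written before src2 has been read
    sign_extend_bytes(rf, second, 1);  // reads src2, possibly in place
    mul_low_byte(rf, first, second);
    return rf[first];
}

// src1 = {2, 3, 4, 5}, src2 = {10, 20, 30, 40}, scratch zeroed.
RegFile sample_inputs() {
    return RegFile{{Reg{2, 3, 4, 5}, Reg{10, 20, 30, 40}, Reg{0, 0, 0, 0}}};
}
```

With $dst allocated to the same register as $src2 (index 1), mul_bytes(sample_inputs(), 2, 1) models using $dst in place of $tmp2 and yields the expected {20, 60, 120, 200}. mul_bytes(sample_inputs(), 1, 2) models using $dst in place of $tmp: the first write clobbers $src2, and the result is {4, 9, 16, 25}, i.e. src1 multiplied by itself.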
Thanks,
Vladimir
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: B. Blaser [mailto:bsrbnd at gmail.com]
> Sent: Wednesday, April 10, 2019 4:10 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Cc: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>
> Hi Sandhya and Vladimir K.,
>
> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya <sandhya.viswanathan at intel.com> wrote:
>>
>> Hi Vladimir,
>>
>> Yes, I missed the question below:
>>>> There are cases where we can use fewer `TEMP tmp` registers by using the 'dst' register, like in mul4B_reg(). Is it intentional to not use 'dst' there?
>>
>> No, it is not intentional; we can use the dst register in those cases and reduce the tmps.
>
> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg():
>
> format %{"pmovsxbw $tmp,$src1\n\t"
>          "pmovsxbw $tmp2,$src2\n\t"
>
> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think?
>
> Thanks,
> Bernard
>