RFR (M) 8222074: Enhance auto vectorization for x86

Viswanathan, Sandhya sandhya.viswanathan at intel.com
Wed May 1 01:14:53 UTC 2019


Hi VladimirK,

JBS: https://bugs.openjdk.java.net/browse/JDK-8222074

Please find updated webrev at:
http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/

With this webrev the ad file has only about 60 lines effectively added. 
Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%.
I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules. 
What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself.
The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE. 
 
In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool.
Also taken into account reducing the temporary by using TEMP dst for multiply rules. 

The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL.

Your review and feedback is welcome.

Best Regards,
Sandhya


-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Viswanathan, Sandhya
Sent: Wednesday, April 10, 2019 10:22 AM
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; B. Blaser <bsrbnd at gmail.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86

Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used:

    __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len);
    __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, vector_len);

Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required. 

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, April 10, 2019 9:59 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; B. Blaser <bsrbnd at gmail.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86

On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote:
> Hi Bernard,
> 
> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src.

Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead $tmp2 to avoid overwriting 
$src2 before we get value from it if $dst = $src2.

On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result.

It is a little mess which may cause ineffective use of registers in compiled code.

Thanks,
Vladimir

> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: B. Blaser [mailto:bsrbnd at gmail.com]
> Sent: Wednesday, April 10, 2019 4:10 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Cc: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
> 
> Hi Sandhya and Vladimir K.,
> 
> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya <sandhya.viswanathan at intel.com> wrote:
>>
>> Hi Vladimir,
>>
>> Yes, I missed the question below:
>>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there?
>>
>> No it is not intentional, we can use the dst register in those cases and reduced the tmps.
> 
> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg():
> 
> 7349   format %{"pmovsxbw  $tmp,$src1\n\t"
> 7350            "pmovsxbw  $tmp2,$src2\n\t"
> 
> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think?
> 
> Thanks,
> Bernard
> 


More information about the hotspot-compiler-dev mailing list