RFR(S): 8175096: Use Subword Analysis for set vector size
Deshpande, Vivek R
vivek.r.deshpande at intel.com
Tue Jul 18 18:50:36 UTC 2017
Hi Vladimir
I generated that code on my desktop.
The generated code on the skylake server looks like this,
and I have updated the bug with full main loop generated on skylake server and tests:
1d0 B31: # B31 B32 <- B30 B31 Loop: B31-B31 inner main of N161 Freq: 1024.97
1d0 movslq R11, RBP # i2l
1d3 vmovdqul XMM1 k0,[RBX + #16 + R11 << #2] ! load vector (64 bytes)
1de vpaddd XMM1,XMM1,[R9 + #16 + R11 << #2] ! add packed16I
1e9 movdq R10, XMM0 # spill
1ee vmovdqul XMM2 k0,[R10 + #16 + R11] ! load vector (64 bytes)
1f9 vpaddb XMM2,XMM2,[R13 + #16 + R11] ! add packed64B
204 vmovdqul [R14 + #16 + R11] k0,XMM2 ! store vector (64 bytes)
20f vmovdqul XMM2 k0,[RDX + #80 + R11 << #1] ! load vector (64 bytes)
21a vmovdqul XMM3 k0,[R10 + #80 + R11] ! load vector (64 bytes)
225 vpaddb XMM3,XMM3,[R13 + #80 + R11] ! add packed64B
230 vmovdqul [R14 + #80 + R11] k0,XMM3 ! store vector (64 bytes)
23b vmovdqul XMM3 k0,[RAX + #80 + R11 << #1] ! load vector (64 bytes)
246 vmovdqul XMM4 k0,[R10 + #144 + R11] ! load vector (64 bytes)
251 vpaddb XMM4,XMM4,[R13 + #144 + R11] ! add packed64B
25c vmovdqul [R14 + #144 + R11] k0,XMM4 ! store vector (64 bytes)
267 vmovdqul XMM4 k0,[RBX + #80 + R11 << #2] ! load vector (64 bytes)
272 vmovdqul XMM5 k0,[R10 + #208 + R11] ! load vector (64 bytes)
27d vpaddb XMM5,XMM5,[R13 + #208 + R11] ! add packed64B
288 vmovdqul [R14 + #208 + R11] k0,XMM5 ! store vector (64 bytes)
293 vmovdqul XMM5 k0,[RDX + #16 + R11 << #1] ! load vector (64 bytes)
29e vpaddw XMM5,XMM5,[RAX + #16 + R11 << #1] ! add packed32S
Please let me know your thoughts.
Regards,
Vivek
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Monday, July 17, 2017 12:44 PM
To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
Now I am confused. Did you changed machine?
Originally vector size in code was 512 for integer arrays:
20c vmovdqul XMM0 k0,[RCX + #16 + R10 << #2] ! load vector (64 bytes)
217 vpaddd XMM0,XMM0,[RBX + #16 + R10 << #2] ! add packed16I
222 vmovdqul [RDX + #16 + R10 << #2] k0,XMM0 ! store vector (64 bytes)
Latest code show that it is limited to 256:
143 vmovdqu XMM0,[R9 + #16 + R10 << #2] ! load vector (32 bytes) 14a vpaddd XMM0,XMM0,[R11 + #16 + R10 << #2] ! add packed8I
Why vectors are smaller?
And suggestion. Please, modify the test in the bug report to have separate test methods which only operates on byte[] and short[] arrays and their combination. It is to see the the change fixed limiting vector size for sub-integer types.
Thanks,
Vladimir
On 7/13/17 11:49 AM, Deshpande, Vivek R wrote:
> Hi Vladimir
>
> I have updated the patch.
> With this update the generated code for the main loop is unrolled and uses full vector width.
> Updated Webrev:
> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
> Would you please take a look and give your feedback/ suggestions.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: hotspot-compiler-dev
> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
> Deshpande, Vivek R
> Sent: Friday, May 19, 2017 1:28 PM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; Berg, Michael C
> <michael.c.berg at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Hi Vladimir
>
> Thanks for taking a look at the patch.
> I will take a look at your example and work on improving the patch.
> Also work on the suggestion of different vector elements counts for different types.
>
> Regards,
> Vivek
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, May 19, 2017 12:17 PM
> To: Deshpande, Vivek R; Berg, Michael C;
> hotspot-compiler-dev at openjdk.java.net
> Cc: Viswanathan, Sandhya
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.
>
> Regards,
> Vladimir
>
> On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:
>> Hi Vladimir
>>
>> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>>
>> Regards,
>> Vivek
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Friday, February 24, 2017 10:07 AM
>> To: Berg, Michael C; Deshpande, Vivek R;
>> hotspot-compiler-dev at openjdk.java.net
>> Cc: Viswanathan, Sandhya
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>> size
>>
>> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to. This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface. These subword types are only in the integer subtype domain.
>>
>> "optimal" has wide meaning :)
>>
>> Currently it means small number of unrolls in presence of wide (long,
>> double) values.
>>
>> And if it is optimal why we need this additional fix (8175096)?
>>
>> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Regards,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, February 22, 2017 12:09 PM
>>> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Berg,
>>> Michael C <michael.c.berg at intel.com>
>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> Hi Vivek,
>>>
>>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>>
>>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>>
>>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>>
>>> if (cur_max_vector < max_vector) {
>>> max_vector = cur_max_vector;
>>>
>>> And we start with smallest type:
>>>
>>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>>
>>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>>
>>> if (cur_max_vector > max_vector) {
>>> max_vector = cur_max_vector;
>>>
>>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>>
>>> Michael, you are author of this code. What do you think?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>>> Hi
>>>>
>>>>
>>>>
>>>> Currently subword types cannot use entire vector width using SLP.
>>>>
>>>> This fix analyzes the subword in the loop for possibility of
>>>> narrowing and sets the vector size accordingly.
>>>>
>>>>
>>>>
>>>> Webrev:
>>>>
>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>>
>>>> I have also updated the JBS entry.
>>>>
>>>> c
>>>>
>>>> Would you please review and sponsor it.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Vivek
>>>>
>>>>
>>>>
More information about the hotspot-compiler-dev
mailing list