RFR(S): 8175096: Use Subword Analysis for set vector size

Fri Feb 24 18:56:12 UTC 2017

Vladimir,

The cases in question are statement-level expressions of short and byte type, where the memory operations are sized to those types but the arithmetic components are not; they are int instead, which is what this check looks for:

if (!in->is_Mem() && in_bb(in) && in->bottom_type()->basic_type() == T_INT)

The check applies to the inputs of short- and byte-typed operations, since the node n is already constrained by is_subword_type(bt).

Here is a snippet of code that would show the issue:

    static final int NUM = 1024;
    static byte[] data =  new byte[NUM],
                  data2 = new byte[NUM],
                  data3 = new byte[NUM];
    static short[] data4 =  new short[NUM],
                  data5 = new short[NUM],
                  data6 = new short[NUM];
    static int[] data7 =  new int[NUM],
                  data8 = new int[NUM],
                  data9 = new int[NUM];

    public static double count(long X) {
        long time1, time0 = System.nanoTime();
        doit2(X);
        time1 = System.nanoTime();
        return 1f*X/(time1-time0)*1e9;
    }

    public static void doit2(long X) {
        while (X > 0) {
            for (int i = 0; i < NUM; i++) {
                data[i] = (byte)(data2[i] + data3[i]);
                data4[i] = (short)(data5[i] + data6[i]);
                data7[i] = data8[i] + data9[i];
            }
            X--;
        }
    }

Here main initializes the arrays to some set values for use above. You can even break this into two cases, one for each of the byte and short statement-level expressions. We can assert that this is always safe, because superword by design does not allow mixed-type statement-level expressions in which byte and short are intermingled at that level; the only int-typed pieces are artifacts of promoting the interior components, such as the arithmetic, to int.
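
As a minimal standalone sketch of the point above (not part of the webrev), the class below shows that the add inside a byte or short statement is an int operation in Java, and what that costs in lanes if the int type is allowed to decide the vector size. The class name, the helper names, and the 32-byte vector width are assumptions chosen for illustration only; the real width comes from the platform.

```java
public class SubwordPromotion {
    // Hypothetical vector width in bytes (for example a 256-bit register).
    // This is an assumption for illustration, not a value queried from the VM.
    static final int VECTOR_BYTES = 32;

    // Number of elements (lanes) of the given element size that fit in one vector.
    static int lanes(int elementBytes) {
        return VECTOR_BYTES / elementBytes;
    }

    // The loads and the store are byte sized, but the add itself is an int add.
    static byte addBytes(byte a, byte b) {
        int wide = a + b;      // binary numeric promotion widens both operands to int
        return (byte) wide;    // the cast narrows the result back for the store
    }

    // Same shape for short: short-sized memory operations around an int add.
    static short addShorts(short a, short b) {
        int wide = a + b;
        return (short) wide;
    }

    public static void main(String[] args) {
        System.out.println("byte add:  " + addBytes((byte) 100, (byte) 100));
        System.out.println("short add: " + addShorts((short) 30000, (short) 30000));

        // If the int-typed add decides the vector size, a byte loop is limited
        // to the int lane count instead of the byte lane count.
        System.out.println("byte lanes:  " + lanes(Byte.BYTES));
        System.out.println("short lanes: " + lanes(Short.BYTES));
        System.out.println("int lanes:   " + lanes(Integer.BYTES));
    }
}
```

With the assumed 32-byte width, the byte statement could use 32 lanes and the short statement 16, while the interior int add alone would suggest only 8.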

Regards,
Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, February 24, 2017 10:07 AM
To: Berg, Michael C <michael.c.berg at intel.com>; Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net
Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

On 2/23/17 9:03 AM, Berg, Michael C wrote:
> I think using the current approach is the best way: it finds the consistent shared vector size that is optimal for the loop for unrolling to act on.  This additional approach augments that with some testing of constraints, as the subword types are often occluded by sign- and zero-extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.

"optimal" has wide meaning :)

Currently it means a small number of unrolls in the presence of wide (long, double) values.

And if it is optimal, why do we need this additional fix (8175096)?

I still did not get an answer about why "Currently subword types cannot use entire vector width using SLP". In what cases does this happen?

Thanks,
Vladimir

>
> Regards,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, February 22, 2017 12:09 PM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>;
> hotspot-compiler-dev at openjdk.java.net
> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Berg,
> Michael C <michael.c.berg at intel.com>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Hi Vivek,
>
> This should go into jdk 10 since it is an enhancement. We will consider backporting it into a jdk 9 update release later.
>
> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>
> The only explanation I have is that if a loop has operations of mixed types then we narrow the unroll factor to fit the biggest type:
>
>          if (cur_max_vector < max_vector) {
>            max_vector = cur_max_vector;
>
> And we start with smallest type:
>
> int max_vector = Matcher::max_vector_size(T_BYTE);
>
> Should we do the opposite and start from T_LONG (small unroll factor) and widen it to the smallest type (big unroll factor)?:
>
>          if (cur_max_vector > max_vector) {
>            max_vector = cur_max_vector;
>
> Note, max_vector_size() returns the number of elements and not the size of the vector in bytes.
>
> Michael, you are the author of this code. What do you think?
>
> Thanks,
> Vladimir
>
> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>> Hi
>>
>>
>>
>> Currently subword types cannot use the entire vector width using SLP.
>>
>> This fix analyzes the subword operations in the loop for the
>> possibility of narrowing and sets the vector size accordingly.
>>
>>
>>
>> Webrev:
>>
>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>
>> I have also updated the JBS entry.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8175096
>>
>> Would you please review and sponsor it?
>>
>>
>>
>> Regards,
>>
>> Vivek
>>
>>
>>