Vector API Get Implementation patch

Fri Jun 22 20:30:31 UTC 2018

> http://cr.openjdk.java.net/~vdeshpande/webrevGetImplF/webrev.01/
extract1d and extract2f should use vecD as well. 
Otherwise, looks good. No need to send new webrev.
Best regards,
Vladimir Ivanov
> 
> 1) added extract1d, extract2f, extract1l
> 2) modified the source to vecD from vecX for the following instruct that use 64-bit vector operands as source. 
>   +instruct extract2i(rRegI dst, vecD src, immI idx) %{
>  +instruct extract4s(rRegI dst, vecD src, immI idx) %{
>  +instruct extract8b(rRegI dst, vecD src, immI idx) %{
> also extract1l uses vecD as source  // instruct extract1l (rRegI dst, vecD src, immI idx) 
> 
> regards,
> Rahul
> 
> -----Original Message-----
> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Kandu, Rahul
> Sent: Wednesday, June 20, 2018 6:08 PM
> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; panama-dev at openjdk.java.net
> Subject: RE: Vector API Get Implementation patch
> 
> Yes, it should be guarded by the "$dst$$reg != $src$$reg". 
> 
> 
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] 
> Sent: Wednesday, June 20, 2018 5:38 PM
> To: Kandu, Rahul <rahul.kandu at intel.com>; panama-dev at openjdk.java.net
> Subject: Re: Vector API Get Implementation patch
> 
> 
>> I left those out initially because extract1D/1L is going to be scalar, I can include them.
>> I think extract1D may look something like this:
> 
> Double64Vector/Long64Vector are still there and even though they are 1-element vectors, the operations have to be intrinsified to get reliable vectorbox elimination.
> 
> Yeah, a plain move should be enough. Shouldn't it be also guarded by "$dst$$reg != $src$$reg"?
> 
> Another observation: shouldn't 64-bit vector operands be represented by VecD and not VecX?
> 
> The same applies to:
> 
> +instruct extract2i(rRegI dst, vecX src, immI idx) %{
> 
> +instruct extract4s(rRegI dst, vecX src, immI idx) %{
> 
> +instruct extract8b(rRegI dst, vecX src, immI idx) %{
> 
> Best regards,
> Vladimir Ivanov
> 
>> 
>> instruct extract1d(regD dst, vecX src, immI idx) %{
>>   predicate(UseAVX > 0 && n->as_Vector()->length() == 1 && n->bottom_type()->basic_type() == T_DOUBLE);
>>   match(Set dst (ExtractD src idx));
>>   ins_encode %{
>>     int vector_len = 0;
>>     int midx = 0x1 & $idx$$constant;
>>     __ vmovdqu($dst$$XMMRegister, $src$$XMMRegister);
>>   %}
>>   ins_pipe( pipe_slow );
>> %}
>> 
>> -----Original Message-----
>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>> Sent: Wednesday, June 20, 2018 3:38 PM
>> To: Kandu, Rahul <rahul.kandu at intel.com>; panama-dev at openjdk.java.net
>> Subject: Re: Vector API Get Implementation patch
>> 
>> 
>>> I'll add the extract2f. Thanks for the suggestion. Let me send another webrev shortly with the extract2f.
>> 
>> What about extract1d & extract1l? Is there any reason why they aren't needed?
>> 
>>> I'm not quite sure why missing intrinsic(s) are not caught by tests- even so earlier when ExtractI rules were missing. I can find out.
>> 
>> Yes, please.
>> 
>> Best regards,
>> Vladimir Ivanov
>> 
>>> -----Original Message-----
>>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>>> Sent: Wednesday, June 20, 2018 2:54 PM
>>> To: Kandu, Rahul <rahul.kandu at intel.com>; panama-dev at openjdk.java.net
>>> Subject: Re: Vector API Get Implementation patch
>>> 
>>> Rahul,
>>> 
>>>> Please find the updated patch for the Get Implementation 
>>>> http://cr.openjdk.java.net/~vdeshpande/webrevGetImplF620/webrev/
>>>> 
>>>> in this patch, I made the following modifications.
>>>> 
>>>> 1. fixed AVX-512 rules with predicate "UseAVX >2"
>>>> 2. added ExtractI rules
>>>> 3. fixed tabulation issues in the instructs in the x86.ad file.
>>>> 4. Includes all test types- all tests passed, including Int128/256/512 etc.
>>>> 5. removed extract4b rule
>>> 
>>> Thanks. I noticed that some 64-bit variants are missing as well while JDK implementation still calls the intrinsic [1]:
>>> 
>>>     extract1d, extract2f, extract1l
>>> 
>>> Do you have an idea why the tests don't catch that? I believe intinsics don't kick in, but I don't see why (just by staring at the code).
>>> 
>>> Otherwise, looks good.
>>> 
>>>> This patch covers for AVX rules.
>>>> 
>>>> If its fine, I can add the Non-AVX rules (UseAVX==0 && UseSSE>3) in a patch for all the types and submit it separately.
>>> 
>>> Sounds good.
>>> 
>>> Best regards,
>>> Vladimir Ivanov
>>> 
>>> [1]
>>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Double64Vector.java:
>>> 
>>>       public double get(int i) {
>>>           if (i < 0 || i >= LENGTH) {
>>>               throw new IllegalArgumentException("Index " + i + " must be zero or positive, and less than " + LENGTH);
>>>           }
>>>           long bits = (long) VectorIntrinsics.extract(
>>>                                   Double64Vector.class, double.class, LENGTH,
>>>                                   this, i,
>>>                                   (vec, ix) -> {
>>>                                       double[] vecarr = vec.getElements();
>>>                                       return (long)Double.doubleToLongBits(vecarr[ix]);
>>>                                   });
>>>           return Double.longBitsToDouble(bits);
>>>       }
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On 
>>>> Behalf Of Kandu, Rahul
>>>> Sent: Monday, June 18, 2018 8:58 AM
>>>> To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; 
>>>> panama-dev at openjdk.java.net
>>>> Subject: RE: Vector API Get Implementation patch
>>>> 
>>>> Vladimir,
>>>> 
>>>> Thanks for your input. I'll upload the updated patch shortly.
>>>> 
>>>> Rahul
>>>> 
>>>> -----Original Message-----
>>>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>>>> Sent: Thursday, June 14, 2018 9:44 AM
>>>> To: Kandu, Rahul <rahul.kandu at intel.com>; 
>>>> panama-dev at openjdk.java.net
>>>> Subject: Re: Vector API Get Implementation patch
>>>> 
>>>> Rahul,
>>>> 
>>>> Some comments/questions/ suggestions:
>>>> 
>>>>      (1) I don't see any non-AVX rules. Do you plan to add them later?
>>>> 
>>>> 
>>>>      (2) No matcher rules for ExtractI as well. Same question: left for later?
>>>> 
>>>> 
>>>>      (3) AVX-512 rules should be guarded by "UseAVX > 2" predicate:
>>>> 
>>>> +instruct extract32s(rRegI dst, vecZ src, vecZ tmp, immI idx) %{
>>>> +  predicate(UseAVX > 0 && n->as_Vector()->length() == 32 &&
>>>> n->bottom_type()->basic_type() == T_SHORT);
>>>> 
>>>> 
>>>>      (4) Is extract4b ever used? I don't see any 32-bit vectors present.
>>>> 
>>>> +instruct extract4b(rRegI dst, vecX src, immI idx) %{
>>>> +  predicate(UseAVX > 0 && n->as_Vector()->length() == 4 &&
>>>> n->bottom_type()->basic_type() == T_BYTE);
>>>> 
>>>> 
>>>>       (5) I suggest to migrate to immU? (recently introduced) and 
>>>> get rid of index masking (see [1] for examples):
>>>>       +    int midx = 0x3F & $idx$$constant;
>>>> 
>>>> Applying similar refactorings to rvinsert64B, the following:
>>>> 
>>>> +instruct extract64b(rRegI dst, vecZ src, vecZ tmp, immI idx) %{
>>>> ...
>>>> +    int midx = 0x3F & $idx$$constant;
>>>> +    if (midx == 0) {
>>>> +      __ movdl($dst$$Register, $src$$XMMRegister);
>>>> +          __ movsbl($dst$$Register, $dst$$Register);
>>>> +    }
>>>> +    else if (midx >= 1 && midx <= 15) {
>>>> +      __ pextrb($dst$$Register, $src$$XMMRegister, midx);
>>>> +          __ movsbl($dst$$Register, $dst$$Register);
>>>> +    }
>>>> +    else {
>>>> +      int extr_idx1 = midx / 16;
>>>> +      int extr_idx2 = midx % 16;
>>>> +      __ vextracti32x4($tmp$$XMMRegister, $src$$XMMRegister, extr_idx1);
>>>> +      __ pextrb($dst$$Register, $tmp$$XMMRegister, extr_idx2);
>>>> +          __ movsbl($dst$$Register, $dst$$Register);
>>>> +    }
>>>> 
>>>> can be converted to into:
>>>> 
>>>> +instruct extract64b(rRegI dst, vecZ src, vecZ tmp, immU6 idx) %{
>>>> ...
>>>> +    uint x_idx = $idx$$constant & right_n_bits(4);
>>>> +    uint z_idx = ($idx$$constant >> 4) & 2;
>>>> +    if (z_idx == 0) {
>>>> +      if (x_idx == 0) {
>>>> +        __ movdl($dst$$Register, $src$$XMMRegister);
>>>> +      } else {
>>>> +        __ pextrb($dst$$Register, $src$$XMMRegister, x_idx);
>>>> +      }
>>>> +    } else { // z_idx > 0
>>>> +      __ vextracti32x4($tmp$$XMMRegister, $src$$XMMRegister, z_idx);
>>>> +      __ pextrb($dst$$Register, $tmp$$XMMRegister, x_idx);
>>>> +    }
>>>> +    __ movsbl($dst$$Register, $dst$$Register);
>>>> 
>>>> or even:
>>>> 
>>>> +    uint x_idx = $idx$$constant & right_n_bits(4);
>>>> +    uint z_idx = ($idx$$constant >> 4) & 2;
>>>> +    Register r_i32 = $src$$XMMRegister;
>>>> +    if (z_idx > 0) {
>>>> +      r_i32 = $tmp$$XMMRegister;
>>>> +      __ vextracti32x4(r_i32, $src$$XMMRegister, z_idx);
>>>> +    }
>>>> +    if (x_idx == 0) {
>>>> +      __ movdl($dst$$Register, r_i32);
>>>> +    } else {
>>>> +      __ pextrb($dst$$Register, r_i32, x_idx);
>>>> +    }
>>>> +    __ movsbl($dst$$Register, $dst$$Register);
>>>> 
>>>> 
>>>> Also, please, fix code formatting. Many lines have wrong indentation.
>>>> 
>>>> Best regards,
>>>> Vladimir Ivanov
>>>> 
>>>> [1] http://hg.openjdk.java.net/panama/dev/rev/e00750ed585a
>>>> 
>>>>> On 14/06/2018 03:10, Kandu, Rahul wrote:
>>>>> Hi All,
>>>>> 
>>>>> Please find the Get Implementation patch for the Vector API. The following patch also includes Tests for Get operations.
>>>>> 
>>>>> http://cr.openjdk.java.net/~rlupusoru/panama/webrevrGetImplF/
>>>>> 
>>>>> regards,
>>>>> Rahul
>>>>>