[15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes

Fri Dec 13 22:18:44 UTC 2019

On 12/13/19 2:27 AM, Vladimir Ivanov wrote:
> Thanks for the feedback, Vladimir.
> 
>> replicateB
>>
>> Can you fold it differently?
>>
>> ReplB_reg_leg
> 
> Are you talking about ReplB_reg?

Yes, I was talking about ReplB_reg. I thought we can combine all [8-64] length vectors but I missed that ReplB_reg_leg 
uses legVec and needs separate instructions :(

And I wanted to separate instruction which use avx 512 (evpbroadcastb) because it is difficult to see relation between 
predicate condition (length() == 64 && VM_Version::supports_avx512bw()) and check in code (vlen == 64 || 
VM_Version::supports_avx512vlbw()). First, || vs &&. Second, avx512bw vs avx512vlbw. May be better to have a separate 
instruction for this.

> 
>>    predicate(!VM_Version::supports_avx512vlbw());
> 
> 3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{
> 3152   predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw());
> 
> For ReplB_reg_leg the predicate can't be simplified: it is applicable only to 512bit vector when AVX512BW is absent. 
> Otherwise, legVec constraint will be unnecessarily applied to other configurations.

That is why you replaced !avx512vlbw with !avx512bw?
May be this section of code need comment which explains why one or an other is used.

> 
> 3119 instruct ReplB_reg(vec dst, rRegI src) %{
> 3120   predicate((n->as_Vector()->length() <= 32) ||
> 3121             (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw()));
> 
> For ReplB_reg there's a shorter version:
> 
> predicate(n->as_Vector()->length() <= 32 || VM_Version::supports_avx512bw());
> 
> 
> But do you find it easier to read? For example, when you are checking that all configurations are covered:
> 
> predicate(n->as_Vector()->length() <= 32 ||
>            VM_Version::supports_avx512bw());
> 
> instruct ReplB_reg_leg(legVec dst, rRegI src) %{
>    predicate(n->as_Vector()->length() == 64 &&
>             !VM_Version::supports_avx512bw());
> 
> vs
> 
> instruct ReplB_reg_leg(legVec dst, rRegI src) %{
>    predicate(n->as_Vector()->length() == 64 && !VM_Version::supports_avx512bw());
> 
> instruct ReplB_reg(vec dst, rRegI src) %{
>    predicate((n->as_Vector()->length() <= 32) ||
>              (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw()));

I think next conditions in predicates require comment about using avx512vlbw and avx512bw.

!   predicate((n->as_Vector()->length() <= 32 && VM_Version::supports_avx512vlbw()) ||
!             (n->as_Vector()->length() == 64 && VM_Version::supports_avx512bw()));

> 
> 
>>    ins_encode %{
>>      uint vlen = vector_length(this);
>>      __ movdl($dst$$XMMRegister, $src$$Register);
>>      __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister);
>>      __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00);
>>      if (vlen > 8) {
>>        __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister);
>>        if (vlen > 16) {
>>          __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister);
>>          if (vlen > 32) {
>>            assert(vlen == 64, "sanity");
>>            __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister, $dst$$XMMRegister, 0x1);
> 
> Yes, it should work as well. Do you find it easier to read though?

Code is smaller.

> 
>> Similar ReplB_imm_leg for which I don't see new implementation.
> 
> Good catch. Added it back.
> 
> (FTR completeness for reg2reg variants (_reg*) is mandatory. But for _mem and _imm it is optional: if they some 
> configuration isn't covered, _reg is used. But it was in original code, so I added it back.)

I don't see code which was in Repl4B_imm() and Repl8B_imm() (only movdl and movq without vpbroadcastb).

> 
> Updated version:
>    http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00

Should be webrev.01

> 
> Another thing I noticed is that for ReplI/.../ReplD cases avx512vl checks are not necessary:
> 
> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/
> 
> The code assumes that evpbroadcastd/evpbroadcastq, vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need AVX512VL 
> for 512bit case, but Intel manual says AVX512F is enough.
> 
> I plan to handle it as a separate change, but let me know if you want to incorporate it into 8235825.

Yes, lets do it separately.

> 
>> It should also simplify code for avx512 which one or 2 instructions.
> 
> Can you elaborate, please? Are you talking about the case when version for different vector sizes differ in 1-2 
> instructions (like ReplB_reg)?
No, I was talking about cases when evpbroadcastb and vpbroadcastb instructions are used. I was think to have them in 
separate instructions. In your latest version it would be only evpbroadcastb case from ReplB_reg().

Thanks,
Vladimir

> 
>> Other types changes can be done same way.
> 
> Best regards,
> Vladimir Ivanov
> 
>> On 12/12/19 3:19 AM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/
>>> https://bugs.openjdk.java.net/browse/JDK-8235825
>>>
>>> Merge AD instructions for the following vector nodes:
>>>    - ReplicateB, ..., ReplicateD
>>>
>>> Individual patches:
>>>
>>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual
>>>
>>> Testing: tier1-4, test run on different CPU flavors (KNL, CLX)
>>>
>>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>>> Reviewed-by: vlivanov, sviswanathan, ?
>>>
>>> Best regards,
>>> Vladimir Ivanov