[vectorIntrinsics] RFR: Different cleanups in backend C2 support
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Thu Apr 2 14:11:34 UTC 2020
> Agree on all your points. On item 6) below, 4 byte vector minmax path will only come from auto-vectorizer (when vector width is restricted to 4 bytes) and not from vector API.
Thanks for the clarification, Sandhya. I'll enable 4-byte case then.
Best regards,
Vladimir Ivanov
> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Thursday, April 02, 2020 2:30 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; panama-dev <panama-dev at openjdk.java.net>
> Subject: Re: [vectorIntrinsics] RFR: Different cleanups in backend C2 support
>
> Hi Sandhya,
>
> Thanks for the review.
>
>> 1. In x86.ad, the following set of rules are coming from mainline so changes to them could be done on mainline:
>> replicate, vector add, vector sub, mul, cmov, div, sqrt, scalar shift, and, or, xor, abs, neg, fma, muladds2i, popcount rules
>> Of vector mul, only two rules (mulL_reg, vmul4L_reg_avx) are added in vectorIntrinsics/vector-unstable branch.
>
> It makes perfect sense to upstream the cleanup, but I don't see a pressing need to split it into 2 parts right now. The changes can live in panama until part of them are integrated upstream. Do you see any major problems with that scenario?
>
>> 2. The following method is defined in two places, in x86.ad and c2_MacroAssembler_x86.cpp:
>> 1202 inline Assembler::AvxVectorLen vector_length_encoding(int bytes)
>> {
>> 36 inline Assembler::AvxVectorLen
>> C2_MacroAssembler::vector_length_encoding(int vlen_in_bytes) {
>
> Yes, I haven't spent much time yet looking for a proper place to put it (c2_MacroAssembler.inline.hpp maybe?), so ended up with 2 copies.
>
>> 3. x86.ad: match_rule_supported, why remove the following for ExtractB and ExtractL?
>> 1403 #ifndef _LP64
>> 1404 return false;
>> 1405 #endif
>> 4. x86.ad: match_rule_supported, why remove the following for VectorTest?
>> 1449 case Op_VectorTest:
>> 1450 return false;
>> 1451 break;
>
> The checks are redundant because relevant AD instructions are absent
> x86-32 and Matcher::has_match_rule() handles such cases.
>
>> 5. x86.ad, the reference to bottom_type() in the following statement could be replaced by elem_bt:
>> 3495 __ evgather(this->bottom_type()->is_vect()->element_basic_type(), $dst$$XMMRegister, ktmp, $tmp$$Register, $idx$$XMMRegister, vlen_enc);
>
> Yes, good point.
>
>> 6. The instructs minmax_reg and vminmax_reg can support vector_length_in_bytes == 4.
>
> Good.
>
> I left the comment because the original check excluded "== 4" case:
>
> -instruct minmax_reg(vec dst, vec src) %{
> - predicate(UseSSE > 3 && UseAVX == 0 &&
> - (n->as_Vector()->length_in_bytes() == 8 ||
> - n->as_Vector()->length_in_bytes() == 16) &&
> -
> is_integral_type(n->bottom_type()->is_vect()->element_basic_type()) &&
> - n->bottom_type()->is_vect()->element_basic_type() != T_LONG);
>
> But it didn't cause any problems before. It means that either the tests aren't exercise that case or such cases aren't intrinsified.
>
> I'd like to keep the assert until we find out what's going on there.
>
>> 7. vshiftL_arith_var instruct should have in the predicate UseAVX==2. If UseAVX > 2 then the vtmp is not required and we can directly use the evpsravq instruction (vshiftL_arith_var_evex rule) even for <=4 length.
>
>> 8. vcastStoX:
>> 6364 vector_length(n->in(1)) <= 16 && // src
>> should be
>> 6364 vector_length(n->in(1)) == 16 && // src
>
> Good catch. Will fix both.
>
> Best regards,
> Vladimir Ivanov
>
>> -----Original Message-----
>> From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of
>> Vladimir Ivanov
>> Sent: Wednesday, April 01, 2020 10:13 AM
>> To: panama-dev <panama-dev at openjdk.java.net>
>> Subject: [vectorIntrinsics] RFR: Different cleanups in backend C2
>> support
>>
>> http://cr.openjdk.java.net/~vlivanov/panama/vector/cleanup.02/webrev.0
>> 0/
>>
>> Some cleanups in backend C2 support for Vector API.
>>
>> Testing: jdk/incubator/vector tests in different configuations on a
>> Skylake machine (-XX:UseAVX=[0..3] + -XX:UseSSE=[2..4])
>>
>> Best regards,
>> Vladimir Ivanov
>>
More information about the panama-dev
mailing list