[14] RFR (XS): 8234392: C2: Extend Matcher::match_rule_supported_vector() with element type information

Tue Dec 10 20:03:03 UTC 2019

>> Similar checks for byte operations are redundant, because 
>> Matcher::vector_width_in_bytes() explicitly checks for AVX512BW:
>>
>>  if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR))
>>    size = (VM_Version::supports_avx512bw()) ? 64 : 32;
> 
> It’s true that 512bw and 512dq are different branches of 512, so it
> is permissible to detect them in different places.  But… it would
> seem easier to follow to check both in the one place:
> 
>     if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR))
>       size = (VM_Version::supports_avx512bw()) ? 64 : 32;
> +  if (UseAVX > 2 && (bt == T_FLOAT || bt == T_DOUBLE))
> +    size = (VM_Version::supports_avx512dq()) ? 64 : 32;
> 
> This is probably an oversimplification, so file it as BS (brain storming).
> If such a consolidation of sizing logic is possible, it can be done as
> a separate cleanup.

Unfortunately, single- and double-precision FP operations are scattered 
between AVX512F and AVX512DQ, so I doubt it makes sense to limit FP 
operations to 256-bit vectors when AVX512DQ is not available.

It would severely penalize Xeon Phis (which lack BW, DQ, and VL 
extensions), but maybe there'll be a moment when Skylake (F+CD+BW+DQ+VL) 
can be chosen as the baseline.

Anyway, I got your idea and it makes perfect sense to me to collect such 
ideas.

> (Side observation:  Since Abs and Neg are treated together in the vabsnegf
> macro-instruction, it seems odd that there would be special processing
> of NegVF and not AbsVF also.  More cleanups to do?)

Yes, it's a first round and there are other opportunities left. More 
will come after the patches are upstreamed.

>> I can add a comment that AVX512DQ is required, but it doesn't look 
>> like an improvement since the checks say exactly the same.
> 
> Maybe add a comment cross-referencing the two hunks of logic,
> since they are not parallel in the sources?  (For whatever reasons;
> it doesn’t matter…)  It’s important to be able to find all of the
> vector-sizing logic so it can be analyzed together, even if it must
> be distributed into different locations.
> 
>> tricky reason they must be called out explicitly, it would be
>>> good to explain.  The CMoveV rules probably stem from limited
>>> support for CMove, but again a comment would be good.
>>
>> Yes, there are only vcmov8F_reg and vcmov4D_reg present, so the checks 
>> are there to avoid matching failures. I'll put a comment that it's an 
>> implementation limitation.
>>
> 
> Yes, comments.  The matcher logic is tricky, and the structure
> of AVX512 is tricky, and comments would help navigate it all.

Yes, the complexity (very) quickly adds up... (Or "multiplies up"?..)

Best regards,
Vladimir Ivanov