RFR (M) 8222074: Enhance auto vectorization for x86

Tue Apr 9 19:53:26 UTC 2019

I agree that the AD file combinatorics need to be tamed
more aggressively in this way.

I have to wonder if this could be done during incubation,
as a cleanup of tech. debt, so we can get experience with
the API at the same time.

— John

On Apr 9, 2019, at 12:33 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> 4. Most important. The main reason we have a lot of AD instructions is to 'match' the different vector types for the corresponding vector lengths. I think we should revisit this approach.
> 
> Intel CPUs do not use parts of vector registers separately - C2 does not use the XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses the VecS type, it uses the whole zmm register on AVX-512 but narrows it by passing the length to the assembler instruction (or we use an instruction which uses only part of the 512-bit register).
> 
> Vladimir I. suggested having a VecMAX type that can be used to match the implementations for all the different vector lengths, so that there is only one AD instruction, and using the vector length to generate the corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except for the vector type, VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check the vector length when it sees VecMAX).
> 
> We can also apply this approach to existing instructions to reduce the size of the code generated from AD files.
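
A minimal C++ sketch of the idea quoted above, to make the shape of the change concrete. It is not HotSpot code: AvxLen, vector_len_encoding() and emit_vabs_bytes() are hypothetical names, and the byte sizes assumed for VecS/VecD/VecX (4/8/16) and for the 256-bit and 512-bit types (32/64) are assumptions of the sketch. The point is only that a single rule can select the encoding from the vector length instead of having one rule per vector type.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the assembler's vector-length encodings.
enum AvxLen { AVX_128bit = 0, AVX_256bit = 1, AVX_512bit = 2 };

// Map a vector size in bytes to the narrowest encoding that covers it.
// Assumption: 4-, 8- and 16-byte vectors (VecS, VecD, VecX) all live in
// an xmm register and therefore share the 128-bit encoding.
AvxLen vector_len_encoding(int size_in_bytes) {
  switch (size_in_bytes) {
    case 4:
    case 8:
    case 16: return AVX_128bit;
    case 32: return AVX_256bit;
    case 64: return AVX_512bit;
    default: throw std::invalid_argument("unsupported vector size");
  }
}

// One generic "rule" in place of vabs8B_reg(), vabs16B_reg(), and so on:
// the operand type no longer matters, only the length passed along.
// Returns a textual description rather than emitting machine code.
std::string emit_vabs_bytes(int size_in_bytes) {
  static const char* const width[] = {"128", "256", "512"};
  return std::string("vpabsb/") + width[vector_len_encoding(size_in_bytes)];
}
```

With this shape, the 8-byte and 16-byte cases go through the same path and yield the same encoding, which is the "no difference in code generation" property described for VecD vs VecX.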