RFR: 8262355: Support for AVX-512 opmask register allocation.

Wed Mar 3 12:22:47 UTC 2021

>> There's a significant amount of cruft coming from the fact that masks
>> don't have their own type. Reusing TypeLong::LONG for k-registers may
>> look appealing at first, but then all the places where RegVMask matters
>> have to handle the types specially. Why not introduce a dedicated type
>> for masks?
>>
>> Also, my understanding is AArch64/SVE allows predicate registers to be
>> larger than 64-bit, so TypeLong::LONG won't work there and a dedicated
>> representation will be needed.
> 
>   The current register allocation framework performs allocation at 32-bit granularity. Thus, to allocate a 64-bit register, we reserve 2 bits in the register mask of its corresponding live range. The spilling code (MachSpillCopyNode::implementation) is also sensitive to this, since a 32-bit def in 64-bit mode spills only a 32-bit value.
> 
>   The opmask register is special in that its usable portion may be 8, 16, 32 or 64 bits wide, depending on the lane type and vector size. Thus, in an optimal implementation, both the allocator and the spill code would allocate and spill only the usable portion of the opmask register. This may not be possible in the current allocation framework.
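To make the quoted 32-bit granularity point concrete, here is a minimal C++ sketch under simplified assumptions: a plain 64-bit integer stands in for a register mask, each bit is one 32-bit slot, and a 64-bit value must take an aligned pair of slots (the "2 bits" mentioned above). `pick_slot` is a hypothetical helper, not HotSpot's `RegMask` API.

```cpp
#include <cstdint>

// Hypothetical model: bit i of `avail` set means 32-bit slot i is still
// available to the live range. size_in_slots is 1 (32-bit value) or
// 2 (64-bit value, which needs an aligned pair of slots 2k and 2k+1).
constexpr int pick_slot(std::uint64_t avail, int size_in_slots) {
  for (int lo = 0; lo + size_in_slots <= 64; lo += size_in_slots) {
    // Mask covering size_in_slots consecutive bits starting at slot lo.
    std::uint64_t need = ((std::uint64_t{1} << size_in_slots) - 1) << lo;
    if ((avail & need) == need) {
      return lo;  // lowest slot of the chosen register
    }
  }
  return -1;  // no suitable register left in this mask
}
```

For example, with slots 1 and 2 free (`0b0110`), a 32-bit value can take slot 1, but a 64-bit value finds no aligned pair and gets `-1`.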

Yes, it may be attractive in the future, but I don't see it as something 
important enough for the first versions.

>   To keep this added complexity out of the implementation, the existing patch performs both allocation and spilling at 64-bit granularity. This is why a LONG type is sufficient to represent an opmask register on x86.
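A hypothetical sketch of the trade-off between the usable opmask portion and the fixed 64-bit spill described in the quote: one opmask bit per lane, rounded up to the 8/16/32/64-bit widths moved by KMOVB/KMOVW/KMOVD/KMOVQ. The function and constant names are illustrative only and do not come from the patch.

```cpp
// One opmask bit per vector lane.
constexpr int mask_lanes(int vector_bytes, int elem_bytes) {
  return vector_bytes / elem_bytes;
}

// Smallest of the 8/16/32/64-bit widths that holds all lane bits,
// i.e. what an "optimal" allocator/spiller would need to handle.
constexpr int usable_opmask_bits(int vector_bytes, int elem_bytes) {
  int lanes = mask_lanes(vector_bytes, elem_bytes);
  int width = 8;
  while (width < lanes) {
    width *= 2;
  }
  return width;
}

// Width handled when allocation and spilling always use 64 bits.
constexpr int kSpillBits = 64;
```

A 512-bit vector of ints needs only 16 mask bits, while a 512-bit vector of bytes needs all 64; spilling at `kSpillBits` is simply the conservative upper bound.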

That's fine with me.

>   I agree that ARM SVE may have to create a new mask Type, since performing the spill and allocation at the widest possible mask size would be costly. Thus Matcher::perdicate_reg_type() can be used to return the LONG type for x86 and a new mask type for ARM SVE. This will avoid any modification to the target-independent IR.
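The per-platform hook idea can be sketched as follows. This is a hypothetical model, not the actual signature of `Matcher::perdicate_reg_type()` in the patch: it returns a representation kind plus the number of register bits the allocator and spill code would have to handle. The SVE entry assumes one predicate bit per vector byte, as in the SVE architecture, which is why such predicates can exceed 64 bits. Note that callers still receive a different kind per platform, which is the concern raised in the reply.

```cpp
// Hypothetical stand-ins; not C2's Type hierarchy.
enum class MaskKind { Long, Vector, Predicate };
enum class Platform { X86_AVX512, X86_AVX2, AArch64_SVE };

struct MaskTypeChoice {
  MaskKind kind;
  int bits;  // register bits the allocator/spiller must handle
};

constexpr MaskTypeChoice predicate_type_for(Platform p, int vector_bytes) {
  if (p == Platform::X86_AVX512) {
    return {MaskKind::Long, 64};  // opmask handled as a 64-bit long
  }
  if (p == Platform::AArch64_SVE) {
    return {MaskKind::Predicate, vector_bytes};  // one bit per vector byte
  }
  // AVX/AVX2: the mask lives in a full vector register.
  return {MaskKind::Vector, vector_bytes * 8};
}
```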

Matcher::perdicate_reg_type() is not enough to hide the 
platform-specific choice. The choice of TypeLong introduces significant 
complexity in shared code (e.g., PhiNode and MachSpillCopyNode-related
changes). Moreover, SVE will have to choose an alternative and more 
generic representation. Hence, it makes the ad-hoc changes needed for 
RegVMask+TypeLong even less attractive.
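A simplified illustration of that shared-code complexity: when masks reuse TypeLong, mapping a type to an ideal register needs an extra out-of-band flag, whereas with a dedicated mask type the mapping depends on the type alone. The enums below are hypothetical stand-ins loosely modeled on the RegVMask/TypeLong names used in this thread, and all vector sizes are collapsed into a single `VecZ` entry.

```cpp
// Hypothetical, simplified model (not HotSpot's actual classes).
enum class IdealReg { RegL, RegVMask, VecZ };
enum class TypeKind { Long, Vect, VectMask };

// Masks typed as TypeLong: the type alone cannot tell a mask from an
// ordinary long, so every caller must supply extra knowledge (is_mask).
constexpr IdealReg ideal_reg_long_repr(TypeKind t, bool is_mask) {
  if (t == TypeKind::Long) {
    return is_mask ? IdealReg::RegVMask : IdealReg::RegL;
  }
  return IdealReg::VecZ;
}

// Dedicated mask type: the mapping is a function of the type only.
constexpr IdealReg ideal_reg_mask_type(TypeKind t) {
  switch (t) {
    case TypeKind::Long:     return IdealReg::RegL;
    case TypeKind::VectMask: return IdealReg::RegVMask;
    case TypeKind::Vect:     return IdealReg::VecZ;
  }
  return IdealReg::VecZ;  // unreachable for valid enum values
}
```

In the first variant, code such as a Phi or a spill copy that only sees the type has to recover `is_mask` from somewhere else; the second variant removes that special case.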

Considering there's active work going on for SVE support, I'm in favor of
collaborating on a unified representation across platforms and relying on it
in the first version.

>   Also, on x86 a mask-generating node may have a different Ideal type and register on non-AVX512 targets.

>> Second question is about x86 and different mask representations it has:
>> AVX-512 introduces predicate registers, but AVX/AVX2 keep masks in wide
>> vector registers. Are we fine with leaking this difference into Ideal IR
>> and specifying what shape a mask value has during Ideal construction?
>>
> 
> On AVX-512 targets, any mask-generating node will have LONG type; in the non-AVX512 case it will have a vector type.
> This type is passed down from Ideal IR to Machine IR, so the allocator makes its allocation decision based on the ideal register corresponding to the type.
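The per-target choice described in the quote might look like the following hypothetical sketch. It assumes opmask registers are used for 512-bit vectors on any AVX-512 CPU and for shorter vectors only when AVX512VL is present; that rule and all names are illustrative assumptions, not the patch's logic. It also shows how one compilation unit could end up needing both representations, the situation raised in point (2) of the reply.

```cpp
// Hypothetical names; not HotSpot's VM_Version or Matcher API.
enum class X86MaskRepr { Opmask, VectorReg };

struct CpuFeatures {
  bool avx512f;
  bool avx512vl;
};

// Assumed rule: masking via opmask needs EVEX encoding at this vector
// size; without AVX512VL only 512-bit vectors qualify.
constexpr X86MaskRepr mask_repr_for(CpuFeatures cpu, int vector_bytes) {
  if (cpu.avx512f && (vector_bytes == 64 || cpu.avx512vl)) {
    return X86MaskRepr::Opmask;
  }
  return X86MaskRepr::VectorReg;
}

// A def/use pair needs an explicit conversion when the representations differ.
constexpr bool needs_mask_conversion(CpuFeatures cpu, int def_bytes, int use_bytes) {
  return mask_repr_for(cpu, def_bytes) != mask_repr_for(cpu, use_bytes);
}
```

On a CPU with AVX512F but no AVX512VL, a 512-bit mask would live in an opmask register while a 256-bit one would stay in a vector register, so a mask flowing between the two would need a conversion node.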

What I'd like to avoid is the situation when different Ideal IR shapes 
are needed to work with different mask representations.

Customizing node types fits that goal well, but I don't fully understand 
all the implications yet, in particular:

   (1) having a node which can be of type TypeVect or TypeLong may be 
problematic for some existing code;

   (2) as of now, there's nothing which forbids pre-AVX512 and AVX512
code from being mixed in a single compilation unit. Is it feasible to require
only a single mask representation to be used at runtime?

Best regards,
Vladimir Ivanov