RFR: 8262355: Support for AVX-512 opmask register allocation.

Jatin Bhateja jbhateja at openjdk.java.net
Wed Mar 3 10:41:58 UTC 2021


On Wed, 3 Mar 2021 01:52:33 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Hi,
>> 
>> @XiaohongGong also posted Arm SVE predicate register allocation support in panama-vector together with other commits about vector masking support: https://github.com/openjdk/panama-vector/pull/40 last week before this PR. The predicate register allocation part has been tested for some time internally and could be separated from that PR (https://github.com/openjdk/panama-vector/pull/40/commits/e658f4d189c21dcd3668fcafee25d9b678cd3640). If it helps, we can also propose a patch here in openjdk/jdk.
>> 
>>> I'd like to focus high-level aspects first.
>>> 
>>> There's a significant amount of cruft coming from the fact that masks
>>> don't have their own type. Reusing TypeLong::LONG for k-registers may
>>> look appealing at first, but then all the places where RegVMask matters
>>> have to handle the types specially. Why not introduce a dedicated type
>>> for masks?
>>> 
>> 
>> I agree that a dedicated type sounds more reasonable, which is covered by @XiaohongGong's patch, see: https://github.com/openjdk/panama-vector/pull/40/commits/3f69d40f08868062e2cc144b3b757dcbaa2db2d1
>> 
>>> Also, my understanding is AArch64/SVE allows predicate registers to be
>>> larger than 64-bit, so TypeLong::LONG won't work there and a dedicated
>>> representation will be needed.
>>> 
>> 
>> Yes, AArch64/SVE predicate registers could be larger. I see that Jatin's patch has an arch-dependent type, Matcher::predicate_reg_type(), which looks hacky but workable. I would still prefer a dedicated type, which looks cleaner. Would a dedicated type also work for k-registers?
>> 
>> Thanks,
>> Ningsheng
>
>> Second question is about x86 and different mask representations it has:
>> AVX-512 introduces predicate registers, but AVX/AVX2 keep masks in wide
>> vector registers. Are we fine with leaking this difference into Ideal IR
>> and specifying what shape a mask value has during Ideal construction?
> 
> Yes, as @nsjian mentioned above, we added a new mask type mapped to a predicate register. Besides, to distinguish it from the old vector IRs that use vector registers for masks on other platforms, we also added a new abstract IR (`VectorMaskNode`) to represent the mask on SVE. All the mask generation IRs are extended from it. Please see the code: https://github.com/openjdk/panama-vector/pull/40/commits/3f69d40f08868062e2cc144b3b757dcbaa2db2d1 .

> _Mailing list message from [Vladimir Ivanov](mailto:vladimir.x.ivanov at oracle.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
> 
> Good work, Jatin!
> 
> I'd like to focus high-level aspects first.
> 
> There's a significant amount of cruft coming from the fact that masks
> don't have their own type. Reusing TypeLong::LONG for k-registers may
> look appealing at first, but then all the places where RegVMask matters
> have to handle the types specially. Why not introduce a dedicated type
> for masks?
> 
> Also, my understanding is AArch64/SVE allows predicate registers to be
> larger than 64-bit, so TypeLong::LONG won't work there and a dedicated
> representation will be needed.

 The current register allocation framework allocates at 32-bit granularity. Thus, in order to allocate a 64-bit register, we reserve 2 bits from the register mask of its corresponding live range. The spilling code (MachSpillCopyNode::implementation) is also sensitive to this, since a 32-bit def in 64-bit mode spills only a 32-bit value.
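
 As a rough illustration of the 32-bit slot accounting described above, here is a toy model. It is not HotSpot code: `kSlotBits`, `slots_for_bits` and `reserve_slots` are hypothetical names, and the register mask is reduced to a single 64-bit word.

```cpp
#include <cassert>
#include <cstdint>

// Toy model, not HotSpot code: one mask bit stands for one 32-bit slot.
const int kSlotBits = 32;

// Number of 32-bit slots needed to hold a value of the given width.
int slots_for_bits(int value_bits) {
  return (value_bits + kSlotBits - 1) / kSlotBits;
}

// Returns `mask` with the slots for a `value_bits`-wide value set,
// starting at `first_slot`. Assumes the value needs fewer than 64 slots
// and that the slots fit inside the 64-bit toy mask.
std::uint64_t reserve_slots(std::uint64_t mask, int first_slot, int value_bits) {
  int n = slots_for_bits(value_bits);
  std::uint64_t bits = ((std::uint64_t(1) << n) - 1) << first_slot;
  return mask | bits;
}

// Usage: a 64-bit value occupies two adjacent slots, a 32-bit value one.
void slot_example() {
  assert(slots_for_bits(64) == 2);
  assert(slots_for_bits(32) == 1);
  assert(reserve_slots(0, 4, 64) == 0x30);  // slots 4 and 5
}
```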

 The opmask register is special in that its usable portion can be 8, 16, 32 or 64 bits wide, depending on the lane type and vector size. Thus, in an optimal implementation, both the allocator and the spill code would allocate and spill only the usable portion of the opmask register. This may not be possible in the current allocation framework.
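
 To make the dependence on lane type and vector size concrete, here is a small sketch. It is not HotSpot code: the function names are hypothetical, and rounding the lane count up to the next of 8, 16, 32 or 64 is an assumption made for illustration, based on the widths listed above.

```cpp
#include <cassert>

// Toy model, not HotSpot code: one mask bit per vector lane.
int active_mask_bits(int vector_bits, int lane_bits) {
  return vector_bits / lane_bits;
}

// Assumption: the usable portion is the smallest of 8, 16, 32 or 64 bits
// that covers all active lanes. Inputs are expected to describe at most
// 64 lanes (for example a 512-bit vector of byte lanes).
int usable_opmask_width(int vector_bits, int lane_bits) {
  int lanes = active_mask_bits(vector_bits, lane_bits);
  int width = 8;
  while (width < lanes) {
    width *= 2;
  }
  return width;
}

// Usage: a 512-bit vector needs 64 mask bits for byte lanes but only
// 8 mask bits for 64-bit lanes.
void opmask_example() {
  assert(usable_opmask_width(512, 8) == 64);
  assert(usable_opmask_width(512, 64) == 8);
}
```

 Spilling at a fixed 64-bit granularity, as the patch does, corresponds to always taking the widest result of this function.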

 To keep this added complexity out of the implementation, the existing patch performs both allocation and spilling at 64-bit granularity. This is why a LONG type is sufficient to represent an opmask register on x86.

 I agree that Arm SVE may have to create a new mask type, since performing spill and allocation at the widest possible mask will be costly. Thus Matcher::predicate_reg_type() can be used to return the LONG type for x86 and the new mask type for Arm SVE. This avoids any modification to the target-independent IR.

 Also, for x86 a mask-generating node may have a different ideal type and register on non-AVX512 targets.
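
 The per-target choice of mask type described above could be pictured like this. This is only a sketch: the enums and `mask_type_for` are hypothetical stand-ins, and it does not reproduce the actual signature or return type of Matcher::predicate_reg_type().

```cpp
#include <cassert>

// Hypothetical stand-ins for the targets and mask representations
// discussed in this thread; none of these names exist in HotSpot.
enum class Target { X86_AVX512, X86_AVX2, AArch64_SVE };
enum class MaskType { Long, Vector, DedicatedMask };

// Target-dependent hook: shared IR code asks the backend which type a
// mask-generating node should carry instead of hard-coding one.
MaskType mask_type_for(Target t) {
  switch (t) {
    case Target::X86_AVX512:  return MaskType::Long;           // opmask register
    case Target::X86_AVX2:    return MaskType::Vector;         // mask held in a vector register
    case Target::AArch64_SVE: return MaskType::DedicatedMask;  // predicate register
  }
  return MaskType::Vector;  // not reached for valid enum values
}

// Usage: the same query yields a different mask type per target.
void mask_type_example() {
  assert(mask_type_for(Target::X86_AVX512) == MaskType::Long);
  assert(mask_type_for(Target::AArch64_SVE) == MaskType::DedicatedMask);
}
```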

 Please let me know if there is any disconnect in my understanding here.
> 
> Second question is about x86 and different mask representations it has:
> AVX-512 introduces predicate registers, but AVX/AVX2 keep masks in wide
> vector registers. Are we fine with leaking this difference into Ideal IR
> and specifying what shape a mask value has during Ideal construction?
> 

For AVX-512 targets, any mask-generating node will have the LONG type; for non-AVX512 targets it will have a vector type.
This type passes down from the Ideal IR to the machine IR, so the allocator makes its allocation decision based on the ideal register corresponding to the type.

Please elaborate on what adverse implications you see with this approach.

> Regarding the patch itself, RegisterSaver support and related changes
> can be integrated separately.
> 
> Best regards,
> Vladimir Ivanov
> 
> On 28.02.2021 21:40, Jatin Bhateja wrote:

-------------

PR: https://git.openjdk.java.net/jdk/pull/2768

