RFR: 8262355: Support for AVX-512 opmask register allocation.

Wed Mar 3 01:55:50 UTC 2021

On Wed, 3 Mar 2021 01:39:11 GMT, Ningsheng Jian <njian at openjdk.org> wrote:

>> AVX-512 added 8 new 64-bit opmask registers [1]. These registers allow conditional execution and efficient merging of destination operands. At present, cross-instruction mask propagation is done using either a GPR (e.g. vmask_gen patterns in x86.ad) or a vector register (for propagating the results of a vector comparison or vector load mask operations).
>> 
>> This base patch extends the register allocator to support allocation of opmask registers. This will facilitate mask propagation across instructions and thus enable emitting efficient instruction sequences on x86 targets that support AVX-512.
>> 
>> We intend to build a robust optimization framework [2] based on this patch to emit optimized instruction sequences for masked/predicated vector operations on x86 targets that support AVX-512.
>> 
>> Please review and share your feedback.
>> 
>> Summary of changes:
>> 
>> 1) AD side changes: New register definitions, register classes, allocation classes, operand definitions and spill code handling for opmask registers.
>> 
>> 2) Runtime: Save/restore of opmask registers in the 32-bit and 64-bit JVMs.
>>    a) For the 64-bit JVM, space was already reserved in the frame layout, but the registers were not previously saved and restored at the designated offset (1088). Hence there is no extra space overhead, only the save/restore cost.
>>    b) For the 32-bit JVM, an additional 64 bytes are allocated apart from the FXSTORE area, along the lines of the storage for ZMM(16-31) and the YMM-Hi bank. There are a few regressions due to the extra space allocation, which we are investigating.
>> 
>> 3) Replacing all hard-coded opmask references in macro-assembly routines: The opmask occurrences are pulled all the way up to the instruction patterns, and an unbounded opmask operand is added for them. This exposes these operands to the RA and the scheduler, which automatically facilitates spilling of live opmask registers across call sites.
>> 
>> 4) Register class initializations related to Op_RegVMask during matcher startup.
>> 
>> 5) Handling for mask generating nodes: Currently the VectorMaskGen node uses a GPR to propagate the mask from the mask generating DEF instruction to its USER instructions. There are other mask generating nodes, such as VectorCmpMask and VectorLoadMask, which are not handled as part of this patch. Conditionally overriding two routines, ideal_reg and bottom_type, for mask generating IDEAL nodes and modifying the instruction patterns to use the new opmask operands enables the instruction selector to associate the opmask register class with the USE/DEF operands of such MachNodes. This constrains the allocation set for these operands to opmask registers (K1-K7).
>> 
>> 6) Special handling for setting a flag in PhiNode during construction if any of its incoming nodes is a mask generating node. This flag is then checked to return the appropriate ideal_reg and bottom_type corresponding to an opmask register.
>> 
>> [1] : Section 15.1.3 :  https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-software-developers-manual-volume-1-basic-architecture.html
>> [2] : http://cr.openjdk.java.net/~jbhateja/avx512_masked_operation_optimization/AVX-512_RA_Opmask_Support_VectorMask_Optimizations.pdf
>
> Hi,
> 
> @XiaohongGong also posted Arm SVE predicate register allocation support in panama-vector together with other commits about vector masking support: https://github.com/openjdk/panama-vector/pull/40 last week before this PR. The predicate register allocation part has been tested for some time internally and could be separated from that PR (https://github.com/openjdk/panama-vector/pull/40/commits/e658f4d189c21dcd3668fcafee25d9b678cd3640). If it helps, we can also propose a patch here in openjdk/jdk.
> 
>> I'd like to focus high-level aspects first.
>> 
>> There's a significant amount of cruft coming from the fact that masks
>> don't have their own type. Reusing TypeLong::LONG for k-registers may
>> look appealing at first, but then all the places where RegVMask matters
>> have to handle the types specially. Why not introduce a dedicated type
>> for masks?
>> 
> 
> I agree that a dedicated type sounds more reasonable. This is covered by @XiaohongGong's patch, see: https://github.com/openjdk/panama-vector/pull/40/commits/3f69d40f08868062e2cc144b3b757dcbaa2db2d1
> 
>> Also, my understanding is AArch64/SVE allows predicate registers to be
>> larger than 64-bit, so TypeLong::LONG won't work there and a dedicated
>> representation will be needed.
>> 
> 
> Yes, AArch64/SVE predicate registers could be larger. I see that Jatin's patch has an arch-dependent type, Matcher::predicate_reg_type(), which looks hacky but workable. I would still prefer a dedicated type, which looks cleaner. Would a dedicated type also work for k-registers?
> 
> Thanks,
> Ningsheng

> Second question is about x86 and different mask representations it has:
> AVX-512 introduces predicate registers, but AVX/AVX2 keep masks in wide
> vector registers. Are we fine with leaking this difference into Ideal IR
> and specifying what shape a mask value has during Ideal construction?

Yes, as @nsjian mentioned above, we added a new mask type that maps to a predicate register. In addition, to distinguish it from the old vector IR nodes that use vector registers for masks on other platforms, we added a new abstract IR node (`VectorMaskNode`) to represent the mask on SVE. All the mask generation IR nodes extend it. Please see the code: https://github.com/openjdk/panama-vector/pull/40/commits/3f69d40f08868062e2cc144b3b757dcbaa2db2d1 .
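
To make the idea concrete, here is a small standalone sketch. It is a toy model under stated assumptions, not the code in the linked commit: every `Toy*` name and both enums are hypothetical stand-ins, and only the `RegVMask` label is borrowed from the `Op_RegVMask` class mentioned in the PR description. It shows how a dedicated mask type lets the register class be derived from the type in one place, so a phi that merges masks needs no extra flag.

```cpp
#include <cassert>

// Toy model only: hypothetical stand-ins, not HotSpot's real Type/Node classes.
enum class ToyIdealReg { RegL, VecX, RegVMask };
enum class ToyKind { Long, Vector, VectorMask };

struct ToyType {
  ToyKind kind;
  int lanes;  // number of lanes; 0 for scalar types
};

// Single mapping from a type to a register class. With a dedicated mask kind,
// no caller needs to special-case "a long that is really a mask".
inline ToyIdealReg ideal_reg_for(const ToyType& t) {
  switch (t.kind) {
    case ToyKind::Long:       return ToyIdealReg::RegL;
    case ToyKind::Vector:     return ToyIdealReg::VecX;
    case ToyKind::VectorMask: return ToyIdealReg::RegVMask;
  }
  return ToyIdealReg::RegL;  // not reached; keeps compilers quiet
}

struct ToyNode {
  virtual ~ToyNode() = default;
  virtual ToyType bottom_type() const = 0;
  ToyIdealReg ideal_reg() const { return ideal_reg_for(bottom_type()); }
};

// Common base for mask-producing nodes (hypothetical analogue of the
// abstract mask node described above). Only subclasses can construct it.
class ToyVectorMaskNode : public ToyNode {
 public:
  ToyType bottom_type() const override { return {ToyKind::VectorMask, _lanes}; }
 protected:
  explicit ToyVectorMaskNode(int lanes) : _lanes(lanes) { assert(lanes > 0); }
 private:
  int _lanes;
};

struct ToyMaskGenNode : ToyVectorMaskNode {
  explicit ToyMaskGenNode(int lanes) : ToyVectorMaskNode(lanes) {}
};

struct ToyAddLNode : ToyNode {
  ToyType bottom_type() const override { return {ToyKind::Long, 0}; }
};

// A phi simply carries the type of its input, so a phi over masks is
// allocated to a mask register without any mask-specific flag.
class ToyPhiNode : public ToyNode {
 public:
  explicit ToyPhiNode(const ToyNode& in) : _type(in.bottom_type()) {}
  ToyType bottom_type() const override { return _type; }
 private:
  ToyType _type;
};
```

In this model, a `ToyPhiNode` built from a `ToyMaskGenNode` reports `RegVMask` from `ideal_reg()`, while a `ToyAddLNode` still reports `RegL`. Whether the real implementation can be kept this simple depends on details that the sketch leaves out, such as how the lane count and element type are encoded in the mask type.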

-------------

PR: https://git.openjdk.java.net/jdk/pull/2768