Re: Question about RegMask::is_aligned_sets()

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Thu Mar 11 08:10:03 UTC 2021



On 11.03.2021 06:13, Corey Ashford wrote:
> Hello Vladimir,
> 
> Currently I'm looking at the register mask defined for the Power64-LE 
> machine.  There are 64 128-bit registers defined via reg_def, e.g.:
> 
> reg_def VSR25 ( SOC, SOC, Op_VecX, 25, NULL);
> 
> But only the 20 of those vector registers that are declared as part of 
> "reg_class vs_reg( ... )" end up with a mask in the generated 
> ad_ppc_expand.cpp source file, and further, each of those registers is 
> allocated just a single bit in the register mask:
> 
> const RegMask _VS_REG_mask( 0x0, 0x0, 0x0, 0x0, 0x0, 0xfffff00, 0x0, 
> 0x0, 0x0, 0x0 );
> 
> I would have expected that since Op_VecX is a 128-bit type, it would 
> have received four bits per register in the mask.
> 
> 
> On x86, each of the 512-bit vector registers is declared using 16 
> Op_RegF register declarations (which makes sense - 16 x 32 = 512), but 
> on aarch64, which can have up to a 1024-bit vector register, vector 
> registers are declared using just 8 x Op_RegF (8 x 32 = 256 bits). There 
> is an extensive comment in the aarch64.ad about this, but it seems to 
> imply that the 32-bits-per-slot rule is not rigid (just as on PPC64)

As you noted, there's no relation between the register mask (1 slot per 
register definition) and the ideal register (VecX et al.) chosen. You had to 
declare as many registers as there are slots in the value to get it 
working. But, as part of SVE support, Arm came up with a special VecA 
ideal register which represents a variable-length vector.

Considering what you wrote about Power64-LE, I'd avoid VecA and just 
provide additional register definitions.
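
To make the slot model concrete, here is a minimal sketch, assuming one 
mask bit per 32-bit slot and a single 32-bit mask word. The helper names 
is_aligned_sets_sketch and base_register_of_slot are hypothetical and are 
not HotSpot's RegMask API; they only illustrate the idea that every 
aligned group of slots must be either fully set or fully clear, and that 
the owning register is found by integer division, as in the quoted 
pd_location() code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot's RegMask: one bit per 32-bit slot,
// restricted to a single 32-bit mask word for brevity.
// Returns true when every aligned group of `set_size` bits is either
// fully set or fully clear.
bool is_aligned_sets_sketch(uint32_t mask, unsigned set_size) {
  assert(set_size > 0 && 32 % set_size == 0);
  // Bit pattern covering one group, e.g. 0xF for set_size == 4.
  const uint32_t group =
      (set_size == 32) ? 0xFFFFFFFFu : ((1u << set_size) - 1u);
  for (unsigned shift = 0; shift < 32; shift += set_size) {
    const uint32_t bits = (mask >> shift) & group;
    if (bits != 0 && bits != group) {
      return false;  // partially filled group: misaligned or incomplete set
    }
  }
  return true;
}

// Hypothetical helper: recover the index of the register that owns a slot,
// mirroring the integer division in the quoted pd_location() code.
int base_register_of_slot(int slot, int slots_per_register) {
  assert(slot >= 0 && slots_per_register > 0);
  return slot / slots_per_register;
}
```

In this sketch, with a set size of 4, the word 0xF0 is accepted (bits 4-7 
form one complete group), while 0x3C is rejected because its four bits 
straddle two groups.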

> As to your comment about not needing to use a vector register for the 
> boolean vectors, that's quite interesting.  So for all vector types 
> except for tiny integers, I should be able to use a 64-bit general 
> purpose register.  I'm so new to all of this that I'm not clear how 
> easy it will be to mix and match register types like this, but I will 
> start experimenting with the idea.

Can you elaborate on how masks are represented in the Power64-LE ISA?

If there are special predicate registers, then you can rely on what 
Intel and Arm folks are working on for AVX-512 and SVE support.

On the x86 AVX/AVX2 ISA, masks occupy a wide vector register. So, a mask of 4 
elements for a vector of ints (128-bit) occupies a 128-bit vector. And, 
currently, ideal vector nodes follow that representation: mask has the 
same ideal type as the value it is applied to (e.g., vectorx[4]{int} and 
not a vector of 4 booleans).
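
As an illustration of that layout, here is a small sketch, assuming the 
common convention that a true lane is all ones and a false lane is all 
zeros. The function name expand_mask_for_ints is hypothetical; it only 
shows how 4 booleans turn into a mask as wide as the 4 x 32-bit vector 
it applies to.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration: widen 4 booleans into a mask with the same
// shape as a 128-bit vector of ints (4 lanes of 32 bits each).
// Assumption: a true lane is all ones, a false lane is all zeros.
std::array<uint32_t, 4> expand_mask_for_ints(const std::array<bool, 4>& bits) {
  std::array<uint32_t, 4> lanes{};
  for (std::size_t i = 0; i < bits.size(); ++i) {
    lanes[i] = bits[i] ? 0xFFFFFFFFu : 0u;
  }
  return lanes;
}
```

The result takes 4 x 32 = 128 bits, the same width as the value vector, 
rather than the 4 bytes a packed boolean vector would need.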

Best regards,
Vladimir Ivanov


> On 3/5/21 3:19 PM, Vladimir Ivanov wrote:
>> Hi Corey,
>>
>>> I'd like to understand the concept of "aligned sets" in RegMask.  I 
>>> believe I understand the RegMask idea overall, but I don't understand 
>>> the idea of alignment of sets (actually the concept of sets in this 
>>> context is also fuzzy).  I've looked at the code that implements 
>>> is_aligned_sets, and I just can't yet seem to grok what requirement 
>>> it is trying to verify.  I read RegMask.hpp's comments on the method 
>>> prototype, and it didn't help me much, I'm afraid.  If someone could 
>>> give a paragraph or two of explanation, I'd really appreciate it.
>>
>> A register in RegMask is comprised of packed bits each representing a 
>> 32-bit slot. So, a VecX register occupies 4 bits (128 = 4 x 32) while 
>> VecZ needs 16 (512 = 16 x 32).
>>
>> Some code relies on the alignment when recovering base register from 
>> VMReg:
>>
>> https://github.com/openjdk/jdk/blob/e1cad97049642ab201d53ff608937f7e7ef3ff3e/src/hotspot/cpu/x86/registerMap_x86.cpp#L29
>>
>>
>> src/hotspot/cpu/x86/registerMap_x86.cpp
>>
>>      29 address RegisterMap::pd_location(VMReg reg) const {
>>      30   if (reg->is_XMMRegister()) {
>>      31     int reg_base = reg->value() - ConcreteRegisterImpl::max_fpr;
>>      32     int base_reg_enc = (reg_base / XMMRegisterImpl::max_slots_per_register);
>>      33     assert(base_reg_enc >= 0 && base_reg_enc < XMMRegisterImpl::number_of_registers, "invalid XMMRegister: %d", base_reg_enc);
>>      34     VMReg base_reg = as_XMMRegister(base_reg_enc)->as_VMReg();
>>
>>> We have started working on adding support to the PPC64-LE hotspot 
>>> code for the Vector API.  In order to support Vector Masks, it seems 
>>> we need to change our current support for fixed-length, 128-bit 
>>> vectors to something that can be as short as two booleans.  To do 
>>> that we have changed the function min_vector_size in 
>>> hotspot/cpu/ppc.ad to return 2 when the type is T_BOOLEAN, otherwise 
>>> it still returns 16.
>>>
>>> My first task was to add support for vector masks, and so I added a 
>>> new instruct to cpu/ppc/ppc.ad to match VectorLoadMask, which then 
>>> necessitated adding some instructs for LoadVector and StoreVector of 
>>> the appropriate lengths.
>>
>> I don't know much about PPC64-LE, but you don't have to use boolean 
>> vectors. FTR, masks have the same type as the vectors they are applied 
>> to. Until recently (when work on predicated registers started), it was 
>> the only mask representation in Ideal IR.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>> I have a test case that loads a vector mask for a vector of shorts:
>>>
>>> import jdk.incubator.vector.ShortVector;
>>> import jdk.incubator.vector.VectorSpecies;
>>> import jdk.incubator.vector.VectorMask;
>>> import java.util.Random;
>>>
>>>
>>> class TestVectorMaskShort {
>>>    private static final VectorSpecies<Short> SPECIES = 
>>> ShortVector.SPECIES_128;
>>>
>>>    public static VectorMask<Short> test(boolean[] bary) {
>>>        VectorMask<Short> vmask = VectorMask.fromArray(SPECIES, bary, 0);
>>>        return vmask;
>>>    }
>>>
>>>    public static void main(String args[]) {
>>>      Random ran = new Random(100);
>>>      int counter = 0;
>>>      boolean[] bary = new boolean[8];
>>>      for (int i = 0; i < 20_000; i++) {
>>>        for (int j = 0; j < bary.length; j++) {
>>>          bary[j] = ran.nextBoolean();
>>>        }
>>>        VectorMask<Short> vmask = test(bary);
>>>        if (vmask.allTrue()) {
>>>          counter++;
>>>        }
>>>      }
>>>      System.out.printf("counter = %d\n", counter);
>>>    }
>>> }
>>>
>>>
>>> When I run this test case, I get a runtime error:
>>>
>>> #  Internal Error 
>>> (/home/cjashfor/git-trees/jdk/src/hotspot/share/opto/chaitin.cpp:951), pid=1341588, 
>>> tid=1341601
>>> #  assert(lrgmask.is_aligned_sets(RegMask::SlotsPerVecX)) failed: 
>>> vector should be aligned
>>>
>>>
>>> - Corey
>>>
>>> Corey Ashford
>>> Software Engineer
>>> IBM Systems, LTC OpenJDK team
>>>
>>> IBM


More information about the hotspot-compiler-dev mailing list