Question about RegMask::is_aligned_sets()

Thu Mar 11 03:13:51 UTC 2021

Hello Vladimir,

Currently I'm looking at the register mask defined for the Power64-LE 
machine.  There are 64 128-bit registers defined via reg_def, e.g.:

reg_def VSR25 ( SOC, SOC, Op_VecX, 25, NULL);

But only the 20 vector registers that are declared as part of 
"reg_class vs_reg( ... )" end up with a mask in the generated 
ad_ppc_expand.cpp source file.  Further, each of those registers is 
allocated just a single bit in the register mask:

const RegMask _VS_REG_mask( 0x0, 0x0, 0x0, 0x0, 0x0, 0xfffff00, 0x0, 
0x0, 0x0, 0x0 );

I would have expected that since Op_VecX is a 128-bit type, it would 
have received four bits per register in the mask.
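
To make that arithmetic concrete, here is a minimal sketch that counts 
the bits in the one non-zero word of _VS_REG_mask and compares the count 
with what a 32-bits-per-slot rule would give.  The helper names 
(slots_for_bits, count_bits, kVsRegWord) are made up for illustration 
and are not HotSpot functions; it needs C++14 or later.

```cpp
#include <cstdint>

// Hypothetical helper: number of 32-bit slots a register of the given
// width would occupy if every slot got its own mask bit.
constexpr unsigned slots_for_bits(unsigned register_bits) {
    return register_bits / 32;
}

// Hypothetical helper: count the set bits in one 32-bit mask word.
constexpr unsigned count_bits(std::uint32_t word) {
    unsigned n = 0;
    while (word != 0) {
        word &= word - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// The only non-zero word of the generated _VS_REG_mask shown above.
constexpr std::uint32_t kVsRegWord = 0x0fffff00u;

// What the generated mask contains: one bit per vs_reg register.
static_assert(count_bits(kVsRegWord) == 20, "20 bits are set");

// What I expected: four bits per 128-bit register, 80 bits in total.
static_assert(slots_for_bits(128) == 4, "VecX would span 4 slots");
static_assert(20 * slots_for_bits(128) == 80, "80 bits expected");
```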

On x86, each of the 512-bit vector registers is declared using 16 
Op_RegF register declarations (which makes sense - 16 x 32 = 512), but 
on aarch64, which can have up to a 1024-bit vector register, vector 
registers are declared using just 8 x Op_RegF (8 x 32 = 256 bits). 
There is an extensive comment in aarch64.ad about this, but it seems 
to imply that the 32-bits-per-slot rule is not rigid (just as on PPC64).
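
For reference, this is a rough sketch of the property I understand an 
"aligned sets" check to verify, simplified to a single 32-bit mask word: 
every group of `size` bits that starts on a multiple of `size` must be 
either completely set or completely clear.  It also shows the 
divide-by-slots-per-register step used to get back from a slot index to 
a base register.  The function names are my own and this is not 
HotSpot's implementation; it assumes `size` is a power of two no larger 
than 32, and needs C++14 or later.

```cpp
#include <cstdint>

// Hypothetical, simplified check on one 32-bit word: returns true when
// each size-aligned group of `size` bits is all ones or all zeros.
constexpr bool is_aligned_sets_word(std::uint32_t word, unsigned size) {
    const std::uint32_t group =
        (size == 32) ? 0xffffffffu : ((1u << size) - 1u);
    for (unsigned shift = 0; shift < 32; shift += size) {
        const std::uint32_t bits = (word >> shift) & group;
        if (bits != 0 && bits != group) {
            return false;  // partially filled or misaligned set
        }
    }
    return true;
}

// Hypothetical helper: recover the base register index from a slot
// index when every register occupies the same number of slots.
constexpr unsigned base_register(unsigned slot, unsigned slots_per_register) {
    return slot / slots_per_register;
}

// One full, aligned 4-bit set (bits 4..7) passes.
static_assert(is_aligned_sets_word(0x000000f0u, 4), "aligned set");
// A lone bit cannot be a complete 4-bit set.
static_assert(!is_aligned_sets_word(0x00000010u, 4), "partial set");
// Four adjacent bits (6..9) that straddle a 4-bit boundary fail.
static_assert(!is_aligned_sets_word(0x000003c0u, 4), "misaligned set");
// Slot 5 belongs to register 1 when each register spans 4 slots.
static_assert(base_register(5, 4) == 1, "base register of slot 5");
```

The division only gives back the right base register if each register's 
slots start on a multiple of the slot count, which is where alignment 
comes in.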

---

As to your comment about not needing to use a vector register for the 
boolean vectors, that's quite interesting.  So for all vector types 
except for tiny integers, I should be able to use a 64-bit general 
purpose register.  I'm so new to all of this that I'm not clear how 
easy it will be to mix and match register types like this, but I will 
start experimenting with the idea.

If you have any further thoughts, I'd appreciate hearing them.

Kind Regards,

- Corey

On 3/5/21 3:19 PM, Vladimir Ivanov wrote:
> Hi Corey,
> 
>> I'd like to understand the concept of "aligned sets" in RegMask.  I 
>> believe I understand the RegMask idea overall, but I don't understand 
>> the idea of alignment of sets (actually the concept of sets in this 
>> context is also fuzzy).  I've looked at the code that implements 
>> is_aligned_sets, and I just can't yet seem to grok what requirement it 
>> is trying to verify.  I read RegMask.hpp's comments on the method 
>> prototype, and it didn't help me much, I'm afraid.  If someone could 
>> give a paragraph or two of explanation, I'd really appreciate it.
> 
> A register in RegMask is comprised of packed bits each representing a 
> 32-bit slot. So, a VecX register occupies 4 bits (128 = 4 x 32) while 
> VecZ needs 16 (512 = 16 x 32).
> 
> Some code relies on the alignment when recovering base register from VMReg:
> 
> https://github.com/openjdk/jdk/blob/e1cad97049642ab201d53ff608937f7e7ef3ff3e/src/hotspot/cpu/x86/registerMap_x86.cpp#L29 
> 
> 
> src/hotspot/cpu/x86/registerMap_x86.cpp
> 
>      29 address RegisterMap::pd_location(VMReg reg) const {
>      30   if (reg->is_XMMRegister()) {
>      31     int reg_base = reg->value() - ConcreteRegisterImpl::max_fpr;
>      32     int base_reg_enc = (reg_base / XMMRegisterImpl::max_slots_per_register);
>      33     assert(base_reg_enc >= 0 && base_reg_enc < XMMRegisterImpl::number_of_registers, "invalid XMMRegister: %d", base_reg_enc);
>      34     VMReg base_reg = as_XMMRegister(base_reg_enc)->as_VMReg();
> 
>> We have started working on adding support to the PPC64-LE hotspot code 
>> for the Vector API.  In order to support Vector Masks, it seems we 
>> need to change our current support for fixed-length, 128-bit vectors 
>> to something that can be as short as two booleans.  To do that we have 
>> changed the function min_vector_size in hotspot/cpu/ppc.ad to return 2 
>> when the type is T_BOOLEAN, otherwise it still returns 16.
>>
>> My first task was to add support for vector masks, and so I added a 
>> new instruct to cpu/ppc/ppc.ad to match VectorLoadMask, which then 
>> necessitated adding some instructs for LoadVector and StoreVector of 
>> the appropriate lengths.
> 
> I don't know much about PPC64-LE, but you don't have to use boolean 
> vectors. FTR masks have the same type as the vectors they are applied 
> to. Until recently (when work on predicated registers started), it was 
> the only mask representation in Ideal IR.
> 
> Best regards,
> Vladimir Ivanov
> 
>> I have a test case that loads a vector mask for a vector of shorts:
>>
>> import jdk.incubator.vector.ShortVector;
>> import jdk.incubator.vector.VectorSpecies;
>> import jdk.incubator.vector.VectorMask;
>> import java.util.Random;
>>
>>
>> class TestVectorMaskShort {
>>    private static final VectorSpecies<Short> SPECIES = 
>> ShortVector.SPECIES_128;
>>
>>    public static VectorMask<Short> test(boolean[] bary) {
>>        VectorMask<Short> vmask = VectorMask.fromArray(SPECIES, bary, 0);
>>        return vmask;
>>    }
>>
>>    public static void main(String args[]) {
>>      Random ran = new Random(100);
>>      int counter = 0;
>>      boolean[] bary = new boolean[8];
>>      for (int i = 0; i < 20_000; i++) {
>>        for (int j = 0; j < bary.length; j++) {
>>          bary[j] = ran.nextBoolean();
>>        }
>>        VectorMask<Short> vmask = test(bary);
>>        if (vmask.allTrue()) {
>>          counter++;
>>        }
>>      }
>>      System.out.printf("counter = %d\n", counter);
>>    }
>> }
>>
>>
>> When I run this test case, I get a runtime error:
>>
>> #  Internal Error 
>> (/home/cjashfor/git-trees/jdk/src/hotspot/share/opto/chaitin.cpp:951), 
>> pid=1341588, tid=1341601
>> #  assert(lrgmask.is_aligned_sets(RegMask::SlotsPerVecX)) failed: 
>> vector should be aligned
>>
>>
>> - Corey
>>
>> Corey Ashford
>> Software Engineer
>> IBM Systems, LTC OpenJDK team
>>
>> IBM