Re: Question about RegMask::is_aligned_sets()
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Mar 12 09:36:22 UTC 2021
On 12.03.2021 01:08, Corey Ashford wrote:
> Thanks for your reply, Vladimir. A few more questions below when you
> have the chance :)
>
> On 3/11/21 12:10 AM, Vladimir Ivanov wrote:
>>
>>
>> On 11.03.2021 06:13, Corey Ashford wrote:
>>> Hello Vladimir,
>>>
>>> Currently I'm looking at the register mask defined for the Power64-LE
>>> machine. There are 64 128-bit registers defined via reg_def, e.g.:
>>>
>>> reg_def VSR25 ( SOC, SOC, Op_VecX, 25, NULL);
>>>
>>> But only the 20 of those vector registers that are declared as part
>>> of "reg_class vs_reg( ... )" end up with a mask in the generated
>>> ad_ppc_expand.cpp source file, and further, each of those registers
>>> is allocated just a single bit in the register mask:
>>>
>>> const RegMask _VS_REG_mask( 0x0, 0x0, 0x0, 0x0, 0x0, 0xfffff00, 0x0,
>>> 0x0, 0x0, 0x0 );
>>>
>>> I would have expected that since Op_VecX is a 128-bit type, it would
>>> have received four bits per register in the mask.
>>>
>>>
>>> On x86, each of the 512-bit vector registers is declared using 16
>>> Op_RegF register declarations (which makes sense - 16 x 32 = 512),
>>> but on aarch64, which can have up to a 1024-bit vector register,
>>> vector registers are declared using just 8 x Op_RegF (8 x 32 = 256
>>> bits). There is an extensive comment in the aarch64.ad about this,
>>> but it seems to imply that the 32-bits-per-slot rule is not rigid
>>> (just as on PPC64)
>>
>> As you noted, there's no relation between the register mask (1 slot per
>> register definition) and the ideal register (VecX et al) chosen. You have
>> to declare as many registers as there are slots in the value to get it
>> working.
>
> So we should be declaring four register slots per vector register,
> instead of one, right? I'm a bit worried as to that screwing up the
> existing implementation for vector register allocation. I will
> experiment to see what happens.
Take a look at how x86 handles that:
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L62
>
>> But, as part of SVE support, Arm came up with special VecA ideal
>> register which represents a variable-length vector.
>> Considering what you wrote about Power64-LE, I'd avoid VecA and just
>> provide additional register definitions.
>>
>>> As to your comment about not needing to use a vector register for the
>>> boolean vectors, that's quite interesting. So for all vector types
>>> except for tiny integers, I should be able to use a 64-bit
>>> general purpose register. I'm so new to all of this that I'm not
>>> clear how easy it will be to mix and match register types like this,
>>> but I will start experimenting with the idea.
>>
>> Can you elaborate how masks are represented in Power64-LE ISA?
>>
>
> There are no special mask/predicate registers. Masking is done via the
> uppermost bit of each corresponding element in a vector register (the
> other bits are ignored). Conversion from an array of boolean bytes of
> 0|1 is simple: the boolean bytes of the mask are first shuffled (via the
> vperm instr. on ppc64) into each element, then arithmetically negated,
> producing 0|-1, which means either all 0's or all 1's in each element,
> which effectively sets or clears the upper bit as desired.
That sounds very similar to how masks are represented in AVX/AVX2 on x86.
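A scalar sketch of the conversion described above, for 8 short lanes, may help pin the representation down. It is an illustration only, not ppc.ad or HotSpot code: the names `load_mask_shorts` and `lane_selected` are made up, and the plain loop stands in for the vperm shuffle followed by the vector negate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not a HotSpot function: models turning 8 boolean
// bytes (each 0 or 1) into 8 short lanes holding 0 or -1.
std::array<int16_t, 8> load_mask_shorts(const std::array<uint8_t, 8>& bools) {
  std::array<int16_t, 8> lanes{};
  for (std::size_t i = 0; i < lanes.size(); i++) {
    // Widen the byte into its lane (vector code would shuffle instead),
    // then negate: 0 stays 0, 1 becomes -1 (all bits set).
    lanes[i] = static_cast<int16_t>(-static_cast<int>(bools[i]));
  }
  return lanes;
}

// True when the uppermost bit of the lane is set, i.e. the lane is selected.
bool lane_selected(int16_t lane) {
  return (static_cast<uint16_t>(lane) & 0x8000u) != 0;
}
```

Since only the uppermost bit of each lane is consulted, 0 and -1 are simply the most convenient values that clear or set it.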
>> If there are special predicate registers, then you can rely on what
>> Intel and Arm folks are working on for AVX-512 and SVE support.
>>
>> On x86 AVX/AVX2 ISA, masks occupy a wide vector register. So, a mask of
>> 4 elements for a vector of ints (128-bit) occupies a 128-bit vector. And,
>> currently, ideal vector nodes follow that representation: the mask has the
>> same ideal type as the value it is applied to (e.g., vectorx[4]{int}
>> and not a vector of 4 booleans).
>
> The mask is represented in Java as a byte array, so there has to be a
> conversion from the byte array in memory to a vector mask (in this case
> a VecX). So it seems that conversion can be done "locally" within node
> matching code in ppc.ad, and that the mask representation isn't ever
> seen as a different type for operand matching. If that's the case, the
> fog is lifting a little.
There are special ideal nodes inserted to convert vector masks between
in-memory and in-register representations: VectorLoadMask and
VectorStoreMask.
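
For the in-register to in-memory direction, a rough scalar model for a 4 x int mask could look like the following. This is a sketch under assumptions, not what any backend actually emits: `store_mask_ints` is a made-up name, and it assumes each lane holds either 0 or -1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model: each int lane holding 0 or -1 becomes one
// boolean byte holding 0 or 1.
std::array<uint8_t, 4> store_mask_ints(const std::array<int32_t, 4>& lanes) {
  std::array<uint8_t, 4> bools{};
  for (std::size_t i = 0; i < lanes.size(); i++) {
    // Only the uppermost bit of the lane decides the boolean value.
    bools[i] = static_cast<uint8_t>(static_cast<uint32_t>(lanes[i]) >> 31);
  }
  return bools;
}
```

The opposite direction is the widen-and-negate step discussed earlier in this thread.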
Best regards,
Vladimir Ivanov
>
> Thank you,
>
> - Corey
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>
>>> On 3/5/21 3:19 PM, Vladimir Ivanov wrote:
>>>> Hi Corey,
>>>>
>>>>> I'd like to understand the concept of "aligned sets" in RegMask. I
>>>>> believe I understand the RegMask idea overall, but I don't
>>>>> understand the idea of alignment of sets (actually the concept of
>>>>> sets in this context is also fuzzy). I've looked at the code that
>>>>> implements is_aligned_sets, and I just can't yet seem to grok what
>>>>> requirement it is trying to verify. I read RegMask.hpp's comments
>>>>> on the method prototype, and it didn't help me much, I'm afraid. If
>>>>> someone could give a paragraph or two of explanation, I'd really
>>>>> appreciate it.
>>>>
>>>> A register in RegMask is comprised of packed bits each representing
>>>> a 32-bit slot. So, a VecX register occupies 4 bits (128 = 4 x 32)
>>>> while VecZ needs 16 (512 = 16 x 32).
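
To make "aligned sets" concrete, here is a minimal single-word sketch of such a check. It is not the HotSpot implementation (the real RegMask spans several words); `is_aligned_sets_sketch` is a made-up name, and the sketch assumes `size` is a power of two no larger than 64.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, single-word model of an aligned-sets check.
// 'size' is the number of 32-bit slots per register (a power of two).
bool is_aligned_sets_sketch(uint64_t bits, unsigned size) {
  const uint64_t group =
      (size == 64) ? ~uint64_t(0) : ((uint64_t(1) << size) - 1);
  for (unsigned pos = 0; pos < 64; pos += size) {
    uint64_t set = (bits >> pos) & group;
    // Each aligned group must be either completely empty or completely full.
    if (set != 0 && set != group) {
      return false;
    }
  }
  return true;
}
```

With size 4 (one VecX), a mask such as 0xF00 passes, while a mask holding a single bit for a register (0x100) or a run of four bits that straddles a group boundary (0x3C0) does not.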
>>>>
>>>> Some code relies on the alignment when recovering base register from
>>>> VMReg:
>>>>
>>>> https://github.com/openjdk/jdk/blob/e1cad97049642ab201d53ff608937f7e7ef3ff3e/src/hotspot/cpu/x86/registerMap_x86.cpp#L29
>>>>
>>>>
>>>> src/hotspot/cpu/x86/registerMap_x86.cpp
>>>>
>>>> address RegisterMap::pd_location(VMReg reg) const {
>>>>   if (reg->is_XMMRegister()) {
>>>>     int reg_base = reg->value() - ConcreteRegisterImpl::max_fpr;
>>>>     int base_reg_enc = (reg_base / XMMRegisterImpl::max_slots_per_register);
>>>>     assert(base_reg_enc >= 0 && base_reg_enc < XMMRegisterImpl::number_of_registers,
>>>>            "invalid XMMRegister: %d", base_reg_enc);
>>>>     VMReg base_reg = as_XMMRegister(base_reg_enc)->as_VMReg();
>>>>
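
The arithmetic in that excerpt can be sketched in isolation. The constant below is an assumption chosen for illustration (16 slots, matching a 512-bit register), and both function names are made up; the real values come from XMMRegisterImpl.

```cpp
#include <cassert>

// Hypothetical constant for illustration; the real value comes from
// XMMRegisterImpl::max_slots_per_register.
constexpr int kSlotsPerRegister = 16;

// Index of the register that owns a given slot, counting slots from the
// first vector register (reg_base in the excerpt above).
int base_register_of(int slot_index) {
  return slot_index / kSlotsPerRegister;
}

// Offset of the slot inside its register, in 32-bit units.
int slot_within_register(int slot_index) {
  return slot_index % kSlotsPerRegister;
}
```

The division recovers the right register only when every register's slots are numbered contiguously and start at a multiple of the slot count, which is the alignment this code depends on.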
>>>>> We have started working on adding support to the PPC64-LE hotspot
>>>>> code for the Vector API. In order to support Vector Masks, it
>>>>> seems we need to change our current support for fixed-length,
>>>>> 128-bit vectors to something that can be as short as two booleans.
>>>>> To do that we have changed the function min_vector_size in
>>>>> hotspot/cpu/ppc/ppc.ad to return 2 when the type is T_BOOLEAN,
>>>>> otherwise it still returns 16.
>>>>>
>>>>> My first task was to add support for vector masks, and so I added a
>>>>> new instruct to cpu/ppc/ppc.ad to match VectorLoadMask, which then
>>>>> necessitated adding some instructs for LoadVector and StoreVector
>>>>> of the appropriate lengths.
>>>>
>>>> I don't know much about PPC64-LE, but you don't have to use boolean
>>>> vectors. FTR masks have the same type as the vectors they are
>>>> applied to. Until recently (when work on predicated registers
>>>> started), it was the only mask representation in Ideal IR.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>>> I have a test case that loads a vector mask for a vector of shorts:
>>>>>
>>>>> import jdk.incubator.vector.ShortVector;
>>>>> import jdk.incubator.vector.VectorSpecies;
>>>>> import jdk.incubator.vector.VectorMask;
>>>>> import java.util.Random;
>>>>>
>>>>>
>>>>> class TestVectorMaskShort {
>>>>>     private static final VectorSpecies<Short> SPECIES =
>>>>>         ShortVector.SPECIES_128;
>>>>>
>>>>>     public static VectorMask<Short> test(boolean[] bary) {
>>>>>         VectorMask<Short> vmask = VectorMask.fromArray(SPECIES, bary, 0);
>>>>>         return vmask;
>>>>>     }
>>>>>
>>>>>     public static void main(String args[]) {
>>>>>         Random ran = new Random(100);
>>>>>         int counter = 0;
>>>>>         boolean[] bary = new boolean[8];
>>>>>         for (int i = 0; i < 20_000; i++) {
>>>>>             for (int j = 0; j < bary.length; j++) {
>>>>>                 bary[j] = ran.nextBoolean();
>>>>>             }
>>>>>             VectorMask<Short> vmask = test(bary);
>>>>>             if (vmask.allTrue()) {
>>>>>                 counter++;
>>>>>             }
>>>>>         }
>>>>>         System.out.printf("counter = %d\n", counter);
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> When I run this test case, I get a runtime error:
>>>>>
>>>>> # Internal Error
>>>>> (/home/cjashfor/git-trees/jdk/src/hotspot/share/opto/chaitin.cpp:951),
>>>>> pid=1341588, tid=1341601
>>>>> # assert(lrgmask.is_aligned_sets(RegMask::SlotsPerVecX)) failed:
>>>>> vector should be aligned
>>>>>
>>>>>
>>>>> - Corey
>>>>>
>>>>> Corey Ashford
>>>>> Software Engineer
>>>>> IBM Systems, LTC OpenJDK team
>>>>>
>>>>> IBM