SuperWord optimization

Mon Jan 5 11:39:15 PST 2009

I have a workspace that combines Ross's initial work on supporting
arbitrary ALU operations for vectorization with some work I did to
add a new RegQ type for 128-bit vector operations.  It was limping
along when I stopped looking at it, but I could post a webrev of it if
there is interest.  It's not something I'm working on, but I wanted
to bring the parts forward so they didn't get completely lost.  All
the code generation pieces were working, and the main remaining piece
is fixing the loop alignment code to support aligning to 128-bit
boundaries.  The heap is only aligned to 64 bits, so the alignment code
that superword uses needs to switch to a pointer alignment calculation
instead of using index alignment.
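
To illustrate the difference, here is a rough sketch in Python (not
HotSpot code).  The function names, the 16-byte array header and the
example addresses are assumptions made up for the illustration.  Index
alignment derives the pre-loop trip count from the starting index
alone, which is only correct if element 0 already sits on a vector
boundary; pointer alignment derives it from the actual address.

```python
VECTOR_BYTES = 16  # 128-bit vector width


def pre_loop_iters_by_index(start_index, elem_size, vector_bytes=VECTOR_BYTES):
    """Index alignment: assumes element 0 is already vector-aligned."""
    elems_per_vector = vector_bytes // elem_size
    return (-start_index) % elems_per_vector


def pre_loop_iters_by_pointer(base_addr, header_bytes, start_index, elem_size,
                              vector_bytes=VECTOR_BYTES):
    """Pointer alignment: uses the real address of the first element touched."""
    addr = base_addr + header_bytes + start_index * elem_size
    misalignment = addr % vector_bytes
    # Elements are assumed to be naturally aligned to their own size.
    assert misalignment % elem_size == 0
    return ((vector_bytes - misalignment) % vector_bytes) // elem_size


# Hypothetical int[] whose object starts at an address that is 8-byte
# aligned but not 16-byte aligned, with an assumed 16-byte header.
by_index = pre_loop_iters_by_index(0, 4)
by_pointer = pre_loop_iters_by_pointer(0x1008, 16, 0, 4)
```

For this array the index calculation asks for no pre-loop iterations,
while the pointer calculation asks for two, because the first element
lives at 0x1018.  The two only agree when the object happens to land
on a 16-byte boundary.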

Once we add 128-bit vectors we'll need some policy work to choose
vector sizes based on the code we see.  Currently the code assumes
there is only one vector register size.

tom

On Dec 31, 2008, at 8:01 PM, John Rose wrote:

> On Dec 31, 2008, at 10:32 AM, James Walsh wrote:
>
>> Is there some documentation of hdl used in x86_32.ad?  How would I
>> describe a XMM register as a whole in that language?
>
> The AD file describes machine registers in terms of 32-bit chunks.   
> Each (named) chunk corresponds to a bit position in the register  
> allocator's bitmasks.  Stack frame slots are numbered also in the  
> same uniform scheme, so it doesn't make sense to have some chunks be  
> 128 bits, etc.  That is why the AD file talks about parts [abcd] of  
> the 128-bit XMM regs.
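
The chunk scheme described above can be modeled in a few lines.  This
is only an illustrative Python sketch, not the AD file syntax or the
allocator's data structures: define_register and mask_of are
hypothetical helpers, and the slot numbers are arbitrary.

```python
SLOT_BITS = 32  # every register chunk and stack slot is one 32-bit unit


def define_register(name, width_bits, first_slot):
    """Split one register into named 32-bit chunks: {chunk_name: slot}."""
    count = width_bits // SLOT_BITS
    suffixes = "abcdefgh"[:count]
    return {name + s: first_slot + i for i, s in enumerate(suffixes)}


def mask_of(chunks):
    """Bitmask with one bit set per chunk, indexed by slot number."""
    mask = 0
    for slot in chunks.values():
        mask |= 1 << slot
    return mask


xmm0 = define_register("XMM0", 128, 0)  # chunks XMM0a..XMM0d, slots 0..3
xmm1 = define_register("XMM1", 128, 4)  # chunks XMM1a..XMM1d, slots 4..7
```

Here mask_of(xmm0) is 0x0F and mask_of(xmm1) is 0xF0: a whole XMM
register is simply four adjacent bits in the same numbering that would
also cover stack slots.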
>
> The upside of this decision to standardize on a 32-bit unit is the  
> spill and memory allocation logic (always a tricky part of register  
> allocation) is simplified.  Stack slots do not need to be typed;  
> they are just 32-bit words.  Another upside is that multi-purpose
> machine registers (e.g., two-floats-in-a-double or a
> long-in-two-ints) can also be represented readily.
>
> The downside of this is that 64-bit entities must be represented  
> using (contiguous, aligned) register pairs.  The register allocation  
> logic is somewhat complicated by the need to work with such register  
> pairs.  Generalizing the code further to 128-bit entities is a  
> moderately tricky problem, but I think that's what has to be done.
>
> The register allocation has an "ideal reg" query on each node which  
> discloses what kind of register to allocate.  The upshot of this  
> query is (a) a mask of possible hardware register resources where  
> the result has to live, and (b) whether the ultimate allocation is a  
> single 32-bit unit or a (contiguous, aligned) pair.  To cope with  
> vectors, the system needs at least one new "ideal reg" type per  
> vector size, perhaps a RegV4 (for XMM) or RegV8 (for YMM).  The  
> RegL probably will continue to serve well as a RegV2.
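
As a sketch of what generalizing from pairs to larger groups might
involve, the Python below searches a register mask for a contiguous,
aligned run of 32-bit slots.  It is not the allocator's actual
algorithm.  find_aligned_run and the IDEAL_REG_SLOTS table are
hypothetical, and RegV4 and RegV8 are only the names floated above.

```python
# Width of each "ideal reg" kind in 32-bit slots (illustrative table).
IDEAL_REG_SLOTS = {"RegI": 1, "RegL": 2, "RegV4": 4, "RegV8": 8}


def find_aligned_run(mask, size):
    """Lowest slot s with s % size == 0 and slots s..s+size-1 all set in mask.

    Returns None if the mask contains no such run.
    """
    assert size > 0 and size & (size - 1) == 0  # size must be a power of two
    run = (1 << size) - 1
    slot = 0
    while (mask >> slot) != 0:
        if (mask >> slot) & run == run:
            return slot
        slot += size  # only aligned starting slots are candidates
    return None


# Slots 1, 2 and 4..7 are available in this example mask.
available = 0b11110110
pair_slot = find_aligned_run(available, IDEAL_REG_SLOTS["RegL"])
quad_slot = find_aligned_run(available, IDEAL_REG_SLOTS["RegV4"])
```

Slots 1 and 2 are adjacent but do not start on an even slot, so the
pair lands at slot 4, the same place the four-slot run starts.  An
eight-slot request against this mask finds nothing.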
>
>> Obviously
>> MOVAPS, ADDPS, etc and the like will need to reserve a whole XMM
>> register, not just XMM0a etc.  Looking at the description of the XMM
>> double register definitions I could probably guess but if there is
>> some official docs it would be nice to take a look at them.
>
> Happy New Year!
> -- John