[vector] ARM SVE

Andrew Haley aph at redhat.com
Mon May 14 17:51:21 UTC 2018


Sorry for the slow response.

On 03/01/2018 08:43 PM, John Rose wrote:

> One vector shape type I very much want to see prototyped soon is the
> "loose end" shape, which is derived from a system preferred shape,
> but has an odd smaller size.  Basically, it is a system-appropriate
> vector which is derived from a standard vector, but with a suitable
> mask or count that encodes the odd bit left over after all the full
> vectors have been processed.  The vector might be either a full
> vector plus count, or else a lgN sized collection of successively
> half-sized sub-vectors, plus a final scalar.  Depends on the
> platform, but the API is simple: It finishes your loops for you.  A
> similar type (or the same in some cases) will handle the warm-up of
> loops where alignment to a multi-lane block is desirable.  Clearly
> SVE has its own take on how to do this.
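
To make the comparison concrete, here is a minimal scalar sketch of how
I read the second variant above (successively half-sized sub-vectors
down to a final scalar).  Everything in it is an assumption made for
illustration: LANES is a made-up full-vector width, and copy_block()
merely stands in for one fixed-width vector copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical full-vector width in elements; must be a power of two. */
enum { LANES = 8 };

/* Stands in for one fixed-width vector copy of `width` elements. */
static void copy_block(uint32_t *d, const uint32_t *s, size_t width)
{
    for (size_t k = 0; k < width; k++)
        d[k] = s[k];
}

/*
 * Copies n elements using full-width blocks, then finishes the tail with
 * blocks of LANES/2, LANES/4, ..., 1 elements.  Returns the number of
 * block operations issued, so the shape of the decomposition is visible.
 */
static size_t copy_with_halving_tail(uint32_t *d, const uint32_t *s, size_t n)
{
    size_t i = 0, blocks = 0;

    /* Main loop: whole vectors only. */
    for (; n - i >= LANES; i += LANES, blocks++)
        copy_block(d + i, s + i, LANES);

    /* Tail: at most one block of each smaller power-of-two width. */
    for (size_t w = LANES / 2; w >= 1; w /= 2) {
        if (n - i >= w) {
            copy_block(d + i, s + i, w);
            i += w;
            blocks++;
        }
    }
    return blocks;
}
```

With LANES = 8 and n = 11 this issues one full block followed by blocks
of 2 and 1 elements.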

SVE says you don't have to do this at all: it can often automagically
handle the case where a vector is an odd size, and will (also
automagically) create the mask for the tail as required.

Please forgive me, but I find it really hard to talk about this in a
purely abstract way.  I think I need an example to explain my point.
A really common case such as

  for (size_t i = 0; i < n; i++)
    *d++ = *s++;

requires no handling of heads, tails or even subvectors:

        cbz     x2, .L1             ; Nothing to do if n == 0
        mov     x3, 0               ; size_t i = 0
        mov     x4, x2              ; size_t tmp = n
        whilelo p0.s, xzr, x2       ; Set lane k of p0 to TRUE while 0 + k < n
                                    ; (the predicate for the first iteration)

        uqdecw  x4                  ; tmp -= number of 32-bit elements per vector (saturating at 0)
        ptrue   p1.s, all           ; Set all predicate elements in p1 to TRUE

.L3:
        ld1w    z0.s, p0/z, [x0, x3, lsl 2]  ; Load the active lanes of z0 from s[i...]
        st1w    z0.s, p0, [x1, x3, lsl 2]    ; Store the active lanes of z0 into d[i...]
        whilelo p0.s, x3, x4                 ; Set lane k of p0 to TRUE while i + k < tmp
                                             ; (the predicate for the next iteration)

        incw    x3                  ; Increment i (x3) by the number of 32-bit elements per vector
        ptest   p1, p0.b            ; Test p0, setting flags
        bne     .L3                 ; Loop again if any lane of p0 is still active
.L1:
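
For anyone who doesn't read SVE assembly, the same idea can be sketched
in portable C.  This is only a scalar emulation, not SVE code: LANES is
a made-up vector length (real hardware picks its own), the whilelo()
helper is a hypothetical stand-in for the instruction of the same name,
and for simplicity the predicate is computed at the top of each
iteration rather than one iteration ahead as in the listing above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count; real SVE hardware chooses its own vector length. */
enum { LANES = 4 };

/*
 * Emulates WHILELO: lane k becomes active while base + k < limit.
 * Returns true if at least one lane is active.
 */
static bool whilelo(bool pred[LANES], size_t base, size_t limit)
{
    bool any = false;
    for (size_t k = 0; k < LANES; k++) {
        /* Written this way so that base + k cannot overflow. */
        pred[k] = base < limit && k < limit - base;
        any = any || pred[k];
    }
    return any;
}

/* Copies n 32-bit elements with a single predicated loop and no tail loop. */
static void copy_predicated(uint32_t *d, const uint32_t *s, size_t n)
{
    bool pred[LANES];

    for (size_t i = 0; whilelo(pred, i, n); i += LANES) {
        /* One "vector" load and store: inactive lanes touch no memory. */
        for (size_t k = 0; k < LANES; k++) {
            if (pred[k])
                d[i + k] = s[i + k];
        }
    }
}
```

The final, partial vector is handled by the same loop body as every
other iteration; only the predicate differs.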

> The Vector API seems to be tolerant of multiple levels of
> abstraction, so we can play games like that.

Mmm.  The ideal model from SVE's point of view is simply an IntVector
or somesuch.  And it seems to me that from the point of view of ease
of use, maintainability, and so on, that's an easier model for
programmers to think about.

> It may even be possible to build mega-Vectors, in the same API or a
> variant, which have the VCODE like property of large data dependent
> sizes (and masks).  (And permutations.  At full-problem sizes, a
> shuffle turns into a routing problem, with potential reductions at
> collision points.  A very rich parallel computing paradigm.)  Such a
> mega-vector has a close correspondence to today's streams,

Yes, exactly.  I'd love a flexible vector type, much like a stream, at the
Java level so that our programmers don't have to worry about heads and
tails but can just use operations on vectors.

> As an accident of history, our Intel friends have helped us to do
> much of our portability work just on x64, by providing several
> different vector architectures to port to, all in one convenient
> place.

I think that perhaps ARM needs to have some skin in this game.  They
have developed techniques to generate efficient code for SVE, and it
would be very useful to have some input from them.  I'm not absolutely
convinced that something like SVE is necessarily the way to go, but it
looks attractive to me, and coding vectors at a higher (stream-like)
level sounds like something I'd enjoy a lot more than messing with
heads, tails, and alignment.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

