Rough prototype of a Vector API

Paul Sandoz paul.sandoz at oracle.com
Mon Feb 29 16:22:34 UTC 2016


Hi,

I pushed a really rough prototype of a Vector API. It’s an experiment to play around with code snippets and the Long{2, 4, 8} primitives, and to get a sense of what is possible.

My laptop is a little old [1], so I am restricted in what I can experiment with.

At the moment there is just one fully specialised implementation, a byte[] view over Long2. Essentially, a Vector<Byte, Shapes.S128Bit> is a box around an instance of Long2, where some of the Vector methods defer to code snippets.

A next step for me would be to add an int[] view over Long2 specialization. I have not done any performance evaluation (I expect the extra boxes will get in the way).

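Here are the implementations of shuffle, which uses the “pshufb” instruction to shuffle bytes in a 128-bit register, and of toMask, which uses “pmovmskb”. Each asserts its result against a scalar reference implementation: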

@Override
public Vector<Byte, Shapes.S128Bit> shuffle(Shuffle<Shapes.S128Bit> perm) {
    // Fast path: defer to the pshufb code snippet on the underlying Long2.
    VectorByteLong2 r = new VectorByteLong2(LowLevelVectorOps.pshufb(v, perm.toLong2()));
    // Cross-check against the scalar reference (only runs when assertions are enabled).
    assert _shuffle(perm).equals(r) : r + " " + _shuffle(perm);
    return r;
}
// Scalar reference implementation: element i of the result is the element at indexes[i].
private Vector<Byte, Shapes.S128Bit> _shuffle(Shuffle<Shapes.S128Bit> perm) {
    byte[] r = new byte[Long2.BYTES];
    int[] indexes = perm.toArray();
    for (int i = 0; i < length(); i++) {
        r[i] = this.getElement(indexes[i]);
    }
    return new VectorByteLong2(r);
}
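
For readers without the prototype at hand, here is a minimal, self-contained sketch of what a 128-bit byte shuffle computes, written as a scalar model. The class and method names (PshufbSketch, shuffleBytes) are made up for illustration and are not part of the prototype. The model follows the pshufb rule that an index byte with its high bit set produces zero and that otherwise only the low four bits select the source byte; whether Shuffle can ever carry such an index is not something the code above tells us.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical scalar model of a 128-bit byte shuffle; not the prototype's code.
final class PshufbSketch {
    static final int LANES = 16; // 128 bits / 8 bits per byte

    static byte[] shuffleBytes(byte[] src, byte[] idx) {
        if (src.length != LANES || idx.length != LANES) {
            throw new IllegalArgumentException("expected 16-byte arrays");
        }
        byte[] r = new byte[LANES];
        for (int i = 0; i < LANES; i++) {
            if ((idx[i] & 0x80) != 0) {
                r[i] = 0;                  // high bit set: lane is zeroed
            } else {
                r[i] = src[idx[i] & 0x0F]; // low four bits pick the source lane
            }
        }
        return r;
    }

    public static void main(String[] args) {
        byte[] src = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        byte[] reverse = new byte[LANES];
        for (int i = 0; i < LANES; i++) {
            reverse[i] = (byte) (LANES - 1 - i);
        }
        // prints "ponmlkjihgfedcba"
        System.out.println(new String(shuffleBytes(src, reverse), StandardCharsets.US_ASCII));
    }
}
```

A model like this is only useful as a cross-check, in the same spirit as the assert against _shuffle above.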


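@Override
public Mask<Shapes.S128Bit> toMask() {
    // Fast path: pmovmskb gathers the top bit of each byte into an integer mask.
    VectorBytes.Mask64bit<Shapes.S128Bit> r = new VectorBytes.Mask64bit<>(
            SPECIES, LowLevelVectorOps.pmovmskb(v));
    // Cross-check against the scalar reference (only runs when assertions are enabled).
    assert r.equals(_toMask()) : r + " " + _toMask();
    return r;
}
// Scalar reference implementation: bit i of the mask is the top bit of byte i.
private Mask<Shapes.S128Bit> _toMask() {
    long mask = 0;
    for (int i = 0; i < Long2.BYTES; i++) {
        if ((this.getElement(i) & 0x80) != 0) {
            mask |= 1L << i;
        }
    }

    return new VectorBytes.Mask64bit<>(SPECIES, mask);
}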
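
As a standalone illustration of the mask extraction, here is a small sketch that does not depend on the prototype's types. PmovmskbSketch and moveMask are hypothetical names. It assumes lane i maps to bit i, as in _toMask above, and returns an int because 16 lanes need only 16 bits.

```java
// Hypothetical scalar model of pmovmskb over 16 bytes; not the prototype's code.
final class PmovmskbSketch {
    static int moveMask(byte[] v) {
        if (v.length != 16) {
            throw new IllegalArgumentException("expected a 16-byte array");
        }
        int mask = 0;
        for (int i = 0; i < 16; i++) {
            // A Java byte is signed, so "top bit set" is the same as "negative".
            if (v[i] < 0) {
                mask |= 1 << i;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        byte[] v = new byte[16];
        v[0] = (byte) 0x80;
        v[3] = (byte) 0xFF;
        v[15] = -1;
        // Lanes 0, 3 and 15 are set, so this prints "8009".
        System.out.println(Integer.toHexString(moveMask(v)));
    }
}
```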


Notes:

- The Shuffle abstraction may be unnecessary, at least on x86 where the index array is commonly held in a register of the same width as the register holding the contents to be shuffled or permuted.

- Depending on which instructions are available or leveraged, the mask might be represented as at most 64 bits or held in a register of the same width as the data, e.g. the AVX-512 pattern vs., say, pblendvb.

- For AVX-512, the combination of mask/maskz with many operations might be awkward to model unless a mask can be attached to a vector; otherwise there is potentially an explosion of methods.

- Instructions requiring immediate values have to be hard-coded in a code snippet. For insertion or extraction of a byte value to/from a Vector<Byte, Shapes.S128Bit> I defer to the “pinsrb” and “pextrb” instructions respectively. I create 16 MHs (method handles) for each kind :-) where the index is encoded as an immediate. This obviously does not scale, nor is it visible to the JIT.
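
To make the last note concrete, here is a rough sketch of the "one method handle per immediate" pattern. It is not the prototype's code: a plain Java method (extractByte, a hypothetical stand-in for a pextrb code snippet) is used in place of machine code, the index is bound with MethodHandles.insertArguments rather than baked into an instruction, and the numbering of lanes from the low byte of the low long upwards is an assumption.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical illustration of a per-index table of method handles.
final class ImmediateHandlesSketch {
    // Stand-in for a pextrb snippet: returns byte `index` of the 128-bit value (lo, hi).
    static byte extractByte(int index, long lo, long hi) {
        long word = index < 8 ? lo : hi;
        return (byte) (word >>> ((index & 7) * 8));
    }

    // One handle per possible index, each with its index bound up front.
    static final MethodHandle[] EXTRACT = new MethodHandle[16];

    static {
        try {
            MethodHandle generic = MethodHandles.lookup().findStatic(
                    ImmediateHandlesSketch.class, "extractByte",
                    MethodType.methodType(byte.class, int.class, long.class, long.class));
            for (int i = 0; i < EXTRACT.length; i++) {
                EXTRACT[i] = MethodHandles.insertArguments(generic, 0, i);
            }
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Dispatches through the table; the index picks the handle, not an argument.
    static byte getElement(long lo, long hi, int index) {
        try {
            return (byte) EXTRACT[index].invokeExact(lo, hi);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        long lo = 0x0706050403020100L;
        long hi = 0x0F0E0D0C0B0A0908L;
        // Byte 10 of the value is 0x0A, so this prints "10".
        System.out.println(getElement(lo, hi, 10));
    }
}
```

The table lookup on a non-constant index is where the scaling problem shows up: each kind of operation needs its own 16-entry table, and wider vectors or narrower ranges of immediates change the table size.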

Paul.

[1]
$ sysctl -a machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C




More information about the panama-dev mailing list