Vector API - Tail vs. Dummy vector entries

Paul Sandoz paul.sandoz at oracle.com
Tue Aug 16 19:00:04 UTC 2022


Hi,

This mailing list is the best place to ask Vector API questions.

Splitting the loop is currently the recommended approach, since we have not yet fully optimized masked operations in loops where the mask has all lanes set until the last iteration. That requires a special loop optimization to peel off the last iteration and then, depending on the architecture, use a mask register or blends (and sometimes it’s possible to use smaller vector sizes). It’s a non-trivial optimization.
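
As a minimal sketch of the split-loop approach, assuming the preferred species for floats: the main loop runs unmasked up to SPECIES.loopBound(a.length), and a scalar loop finishes the remaining elements. The class name SplitLoopExample is hypothetical and used only for illustration.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical illustration class; compile and run with
// --add-modules jdk.incubator.vector
final class SplitLoopExample {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static void vectorComputation(float[] a, float[] b, float[] c) {
        int i = 0;
        // Largest multiple of the vector length that is <= a.length
        int upperBound = SPECIES.loopBound(a.length);
        // Main loop: full vectors only, no masks required
        for (; i < upperBound; i += SPECIES.length()) {
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            var vc = va.mul(va)
                       .add(vb.mul(vb))
                       .neg();
            vc.intoArray(c, i);
        }
        // Scalar tail: fewer than SPECIES.length() elements remain
        for (; i < a.length; i++) {
            c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
        }
    }
}
```

The tail handles at most SPECIES.length() - 1 elements, so its cost is small compared with the main loop for arrays of any significant size.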

Here’s an example of the masked form from the JEP [1]:

void vectorComputation(float[] a, float[] b, float[] c) {
    for (int i = 0; i < a.length; i += SPECIES.length()) {
        // VectorMask<Float>  m;
        var m = SPECIES.indexInRange(i, a.length);
        // FloatVector va, vb, vc;
        var va = FloatVector.fromArray(SPECIES, a, i, m);
        var vb = FloatVector.fromArray(SPECIES, b, i, m);
        var vc = va.mul(va)
                   .add(vb.mul(vb))
                   .neg();
        vc.intoArray(c, i, m);
    }
}

This works well for lanewise operations. For non-trivial data-parallel algorithms that perform cross-lane operations, more care is likely required.
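
To illustrate the kind of care a cross-lane operation can need, here is a hedged sketch of a masked minimum reduction. A masked fromArray fills the unset lanes with zero, so an unmasked reduction over the last vector would wrongly report 0 for an array of positive values; passing the same mask to reduceLanes keeps those lanes out of the result. The class name MaskedMinExample and the method min are hypothetical.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical illustration class; compile and run with
// --add-modules jdk.incubator.vector
final class MaskedMinExample {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float min(float[] a) {
        float result = Float.POSITIVE_INFINITY;
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            // All lanes set until the last iteration, where only in-range lanes are set
            var m = SPECIES.indexInRange(i, a.length);
            // Lanes outside the mask are loaded as 0.0f
            var va = FloatVector.fromArray(SPECIES, a, i, m);
            // Reduce only the lanes selected by the mask, so the zero fill is ignored
            result = Math.min(result, va.reduceLanes(VectorOperators.MIN, m));
        }
        return result;
    }
}
```

Reducing every iteration keeps the sketch short; an assumption-free faster variant is not claimed here, and accumulating lanewise and reducing once after the loop would need the same attention to the masked lanes.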
 
Paul.

[1] https://openjdk.org/jeps/426

> On Aug 14, 2022, at 9:34 AM, Joachim.Schwarte at ps.rolls-royce.com wrote:
> 
> Dear Panama Team,
> 
> I found no „silent“ way to get in touch with just one of you to find out:
> 
> Why do you need a (scalar) tail to finish off a vector algorithm, when the last elements of the vector don‘t fill the whateverer sized register ?
> 
> If the assumption holds, that SIMD vector operations are „almost“ as fast as their scalar counterparts, then wouldn‘t it (statistically) be faster to just fill the gap with some dummy values, and mask away that gap before the result is provided to the receiving result array ?
> 
> On top of that, wouldn‘t it be much more elegant for those who use Vector API ?
> 
> Or do I just have to be a bit patient, because it is an interim solution ?
> 
> Best regards,
> 
> Achim
> 
> PS: Is there a more appropriate way to pose such questions ?
> 


