[vectorIntrinsics] Feedback on Vector for Image Processing

Paul Sandoz paul.sandoz at oracle.com
Tue Mar 23 15:59:37 UTC 2021


Hi Peter,

Thanks, I appreciate the code and feedback; it really helps focus the discussion.

I will look in more detail and respond later, but I can quickly respond on point 2).

Try this:

for(; i < SPECIES.loopBound(input.width); i += SPECIES.length() ) {
    var vinput = ByteVector.fromArray(SPECIES, input.data, indexIn+i);
    VectorMask<Byte> compare = vinput.compare(VectorOperators.LE, threshold);
    // NOTE: This will yield incorrect results because the JDK doesn't support unsigned comparisons
    ByteVector.zero(SPECIES).blend(1, compare).intoArray(output.data, i);
}
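
On the unsigned caveat in the NOTE above: one common workaround is to flip the
sign bit of both operands (XOR with 0x80) before a signed compare, which maps
0..255 onto -128..127 while preserving order. Below is a minimal scalar sketch
of that idea, not the vectorized form. The class and method names are
hypothetical and chosen only for illustration; they are not part of the JDK or
of BoofCV.

```java
// Scalar illustration only: UnsignedThreshold and its methods are hypothetical names.
final class UnsignedThreshold {

    // Plain signed comparison: pixel values 128..255 are seen as negative,
    // so bright pixels wrongly pass a "<= threshold" test.
    static byte signedLE(byte pixel, byte threshold) {
        return (byte) (pixel <= threshold ? 1 : 0);
    }

    // Unsigned comparison built from a signed one: XOR with 0x80 flips the
    // sign bit, mapping unsigned 0..255 onto signed -128..127 in the same order.
    static byte unsignedLE(byte pixel, byte threshold) {
        return (byte) ((byte) (pixel ^ 0x80) <= (byte) (threshold ^ 0x80) ? 1 : 0);
    }

    // Writes 1 where in[i] <= threshold (both treated as unsigned), else 0.
    // The output is a byte[] binary image rather than a boolean[].
    static void threshold(byte[] in, byte[] out, int threshold) {
        byte t = (byte) threshold;
        for (int i = 0; i < in.length; i++) {
            out[i] = unsignedLE(in[i], t);
        }
    }
}
```

With a threshold of 100, a pixel value of 200 is rejected by unsignedLE but
accepted by signedLE, which is the incorrect result the NOTE warns about. The
same sign-bit flip should carry over to vector lanes, but I have not measured
what it costs there.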

Paul.

> On Mar 22, 2021, at 12:39 PM, Peter A <peter.abeles at gmail.com> wrote:
> 
> I ported a few already optimized functions related to matrix multiplication
> and image processing to the Vector API and posted the results here:
> 
> https://github.com/lessthanoptimal/VectorPerformance
> 
> Results look fairly good! In most cases performance was sped up by about
> 1.7x; in a few cases it got worse. I'll just discuss image processing
> here since I don't think this use case has come up yet.
> 
> 1) For Comparison operators, support the unsigned byte and unsigned
> short types. Based on comments in the JDK, it looks like this is planned. This
> is a critical requirement for image processing.
> 
> 2) Add support for output to the same primitive type as the input array for
> Comparison operators. Right now there's only support for boolean[]. booleans
> are not ideal for image processing, which is why BoofCV uses byte[] for its
> binary images.
> 
> 3) Add a new lower level API which enables (nearly) allocation free usage.
> Forcing memory allocations inside the innermost loop kills
> performance, even if the code looks more elegant and is the Java way. This
> is especially true for code which is optimized for small arrays. You can
> see this in Linear Algebra libraries where all the highly performant ones
> are basically written like C libraries in their lowest level functions.
> Might be best to create a new thread for this comment. Could be an "easy"
> 30% performance boost.
> 
> Would also like to point out how much faster the manually unrolled image
> convolution code was than even the Vectorized version.
> 
> Cheers,
> - Peter
> 
> -- 
> "Now, now my good man, this is no time for making enemies."    — Voltaire
> (1694-1778), on his deathbed in response to a priest asking that he
> renounce Satan.


