[ a newby question about vectorized shuffling / permutation ]

Denis Gabaydulin gabaden at gmail.com
Sun Jun 20 21:26:39 UTC 2021


If somebody is interested in the result, here's my solution (which is
twice faster on SPECIES_128).

https://gist.github.com/sherman/6d8a4103ba5bb3d2911684b8c842647f

Benchmark with the intersection in the middle of arrays.

Benchmark                                    Mode Cnt Score  Error  Units
IntersectionBenchmark.hasIntersectionScalar thrpt  10 3.785 ± 0.095 ops/us
IntersectionBenchmark.hasIntersectionVector thrpt  10 7.008 ± 0.795 ops/us

What I do:
Changed the shuffling op by the broadcast op.

Here's the most significant part in the asm:

https://gist.github.com/sherman/7788e37b667940f31531ce74cb53b4e5

Would be very appreciated any suggestions and comments on how it can
be improved in terms of throughput?

On Wed, Jun 2, 2021 at 2:04 PM Denis Gabaydulin <gabaden at gmail.com> wrote:
>
> Hi, folks!
>
> Have a newbie question about the API.
> I'm trying to explore a new and shiny vector API with simple but useful examples.
>
> One of that is to build a SIMD version to find any intersection of two sorted lists.
>
> E.g.
> hasIntersection([1, 3, 4, 10],  [2, 6, 6, 11]) => false
> hasIntersection([1, 3, 4, 10],  [2, 3, 6, 11]) => true
>
> A mental model of this algorithm which I have:
>
> 1). to get a part of an array one
> 2). compare it n-times which corresponding part of an array two.
> 3). each time we need to compare two parts of arrays we need to shuffle (permutate) a left part by a mask. E.g. 1, 3, 4, 10 -> 10, 1, 3, 4
> 4). if one of a number is the same, return true, otherwise, move.
> In SIMD instructions looks like I need two major things.
>
> 1). vectorized comparison instruction
> 2). vectorized shuffle instruction.
>
> The first one, I can see in the printed asm.
> compare in API -> vpcmpeqd
>
> The question, which function of the vector API should I use to getting something like _mm_shuffle_epi32 ?
>
>
>
>

On Wed, Jun 2, 2021 at 2:04 PM Denis Gabaydulin <gabaden at gmail.com> wrote:
>
> Hi, folks!
>
> Have a newbie question about the API.
> I'm trying to explore a new and shiny vector API with simple but useful examples.
>
> One of that is to build a SIMD version to find any intersection of two sorted lists.
>
> E.g.
> hasIntersection([1, 3, 4, 10],  [2, 6, 6, 11]) => false
> hasIntersection([1, 3, 4, 10],  [2, 3, 6, 11]) => true
>
> A mental model of this algorithm which I have:
>
> 1). to get a part of an array one
> 2). compare it n-times which corresponding part of an array two.
> 3). each time we need to compare two parts of arrays we need to shuffle (permutate) a left part by a mask. E.g. 1, 3, 4, 10 -> 10, 1, 3, 4
> 4). if one of a number is the same, return true, otherwise, move.
> In SIMD instructions looks like I need two major things.
>
> 1). vectorized comparison instruction
> 2). vectorized shuffle instruction.
>
> The first one, I can see in the printed asm.
> compare in API -> vpcmpeqd
>
> The question, which function of the vector API should I use to getting something like _mm_shuffle_epi32 ?
>
>
>
>


More information about the panama-dev mailing list