[ a newby question about vectorized shuffling / permutation ]

Paul Sandoz paul.sandoz at oracle.com
Thu Jul 1 23:16:56 UTC 2021


Hi Denis,

Thank you for sharing.

I ran the benchmark on the tip of jdk/jdk, adding the vector shuffling implementation in all cases:

# VM options: -XX:-TieredCompilation --add-modules=jdk.incubator.vector --enable-preview
Benchmark                                                            Mode  Cnt   Score   Error   Units
IntersectionOnTheBeginningBenchmark.hasIntersectionScalar           thrpt    5  59.724 ± 1.021  ops/us
IntersectionOnTheBeginningBenchmark.hasIntersectionVector           thrpt    5  76.048 ± 2.386  ops/us
IntersectionOnTheBeginningBenchmark.hasIntersectionVectorShuffling  thrpt    5  86.275 ± 2.395  ops/us
IntersectionOnTheMiddleBenchmark.hasIntersectionScalar              thrpt    5   3.476 ± 0.080  ops/us
IntersectionOnTheMiddleBenchmark.hasIntersectionVector              thrpt    5   5.762 ± 0.076  ops/us
IntersectionOnTheMiddleBenchmark.hasIntersectionVectorShuffling     thrpt    5   6.942 ± 0.089  ops/us
NoIntersectionBenchmark.hasNoIntersectionScalar                     thrpt    5   1.425 ± 0.079  ops/us
NoIntersectionBenchmark.hasNoIntersectionVector                     thrpt    5   2.435 ± 0.100  ops/us
NoIntersectionBenchmark.hasNoIntersectionVectorShuffling            thrpt    5   3.326 ± 0.367  ops/us

Surprisingly, to me at least, your VectorShuffling implementation is better than the Vector implementation. We have made improvements since 16, but to understand more I would need to look at the generated code.

Paul.

> On Jun 26, 2021, at 6:46 AM, Denis Gabaydulin <gabaden at gmail.com> wrote:
> 
> Hi, Paul!
> 
> I made a repository with the code and benchmarks.
> Also, I attached full information about my hardware/os.
> 
> https://urldefense.com/v3/__https://github.com/sherman/simd__;!!ACWV5N9M2RV99hQ!YgFtNLO--ZKWO_I96Dpw6WLYZxJuRnk1TxZcB8MNXFWA7BHA1SgXEmwmp4uq6KVhXA$ 
> 
> On Wed, Jun 23, 2021 at 1:47 AM Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> 
>> Hi Denis,
>> 
>> Thanks for sharing, perhaps you can share the complete JMH benchmark?
>> 
>> In your current approach there may be no benefit to using a vector for the right side, since you unpack the lane elements, just use the array elements directly as input to the broadcast eq.
>> 
>> I would be interested in seeing the shuffle version, if you have it. Some Shuffle/Mask stuff is still work in progress, but it could still be a good test case.
>> 
>> Paul.
>> 
>> 
>>> On Jun 20, 2021, at 2:26 PM, Denis Gabaydulin <gabaden at gmail.com> wrote:
>>> 
>>> If somebody is interested in the result, here's my solution (which is
>>> twice faster on SPECIES_128).
>>> 
>>> https://urldefense.com/v3/__https://gist.github.com/sherman/6d8a4103ba5bb3d2911684b8c842647f__;!!ACWV5N9M2RV99hQ!YgFtNLO--ZKWO_I96Dpw6WLYZxJuRnk1TxZcB8MNXFWA7BHA1SgXEmwmp4uxFt6FDw$ 
>>> 
>>> Benchmark with the intersection in the middle of arrays.
>>> 
>>> Benchmark                                    Mode Cnt Score  Error  Units
>>> IntersectionBenchmark.hasIntersectionScalar thrpt  10 3.785 ± 0.095 ops/us
>>> IntersectionBenchmark.hasIntersectionVector thrpt  10 7.008 ± 0.795 ops/us
>>> 
>>> What I do:
>>> Changed the shuffling op by the broadcast op.
>>> 
>>> Here's the most significant part in the asm:
>>> 
>>> https://urldefense.com/v3/__https://gist.github.com/sherman/7788e37b667940f31531ce74cb53b4e5__;!!ACWV5N9M2RV99hQ!YgFtNLO--ZKWO_I96Dpw6WLYZxJuRnk1TxZcB8MNXFWA7BHA1SgXEmwmp4vxdttWFA$ 
>>> 
>>> Would be very appreciated any suggestions and comments on how it can
>>> be improved in terms of throughput?
>>> 
>>> On Wed, Jun 2, 2021 at 2:04 PM Denis Gabaydulin <gabaden at gmail.com> wrote:
>>>> 
>>>> Hi, folks!
>>>> 
>>>> Have a newbie question about the API.
>>>> I'm trying to explore a new and shiny vector API with simple but useful examples.
>>>> 
>>>> One of that is to build a SIMD version to find any intersection of two sorted lists.
>>>> 
>>>> E.g.
>>>> hasIntersection([1, 3, 4, 10],  [2, 6, 6, 11]) => false
>>>> hasIntersection([1, 3, 4, 10],  [2, 3, 6, 11]) => true
>>>> 
>>>> A mental model of this algorithm which I have:
>>>> 
>>>> 1). to get a part of an array one
>>>> 2). compare it n-times which corresponding part of an array two.
>>>> 3). each time we need to compare two parts of arrays we need to shuffle (permutate) a left part by a mask. E.g. 1, 3, 4, 10 -> 10, 1, 3, 4
>>>> 4). if one of a number is the same, return true, otherwise, move.
>>>> In SIMD instructions looks like I need two major things.
>>>> 
>>>> 1). vectorized comparison instruction
>>>> 2). vectorized shuffle instruction.
>>>> 
>>>> The first one, I can see in the printed asm.
>>>> compare in API -> vpcmpeqd
>>>> 
>>>> The question, which function of the vector API should I use to getting something like _mm_shuffle_epi32 ?
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> On Wed, Jun 2, 2021 at 2:04 PM Denis Gabaydulin <gabaden at gmail.com> wrote:
>>>> 
>>>> Hi, folks!
>>>> 
>>>> Have a newbie question about the API.
>>>> I'm trying to explore a new and shiny vector API with simple but useful examples.
>>>> 
>>>> One of that is to build a SIMD version to find any intersection of two sorted lists.
>>>> 
>>>> E.g.
>>>> hasIntersection([1, 3, 4, 10],  [2, 6, 6, 11]) => false
>>>> hasIntersection([1, 3, 4, 10],  [2, 3, 6, 11]) => true
>>>> 
>>>> A mental model of this algorithm which I have:
>>>> 
>>>> 1). to get a part of an array one
>>>> 2). compare it n-times which corresponding part of an array two.
>>>> 3). each time we need to compare two parts of arrays we need to shuffle (permutate) a left part by a mask. E.g. 1, 3, 4, 10 -> 10, 1, 3, 4
>>>> 4). if one of a number is the same, return true, otherwise, move.
>>>> In SIMD instructions looks like I need two major things.
>>>> 
>>>> 1). vectorized comparison instruction
>>>> 2). vectorized shuffle instruction.
>>>> 
>>>> The first one, I can see in the printed asm.
>>>> compare in API -> vpcmpeqd
>>>> 
>>>> The question, which function of the vector API should I use to getting something like _mm_shuffle_epi32 ?



More information about the panama-dev mailing list