Arrays.mismatch intrinsic and Vector API comparison
Kartik Ohri
kartikohri13 at gmail.com
Sat Sep 4 12:53:46 UTC 2021
Hi Paul,
Thanks for the feedback! I'll add more cases as you suggested and submit a
PR.
Regards,
Kartik
On Fri, Sep 3, 2021 at 3:41 AM Paul Sandoz <paul.sandoz at oracle.com> wrote:
> Hi Kartik,
>
> Thank you. It is useful. I am glad we are reaching the point where the
> Vector API is getting competitive with the mismatch stub.
>
> If you would like to contribute the benchmark I would be happy to review a
> PR.
> We could also measure int, float and long, in addition to small sizes,
> just above or below the vector length (not dissimilar to a mismatch on the
> first or lower index of an array).
> When we wrote Arrays.mismatch we were very careful to measure the impact
> on small array sizes, since that method is also used to support
> Arrays.equals and we did not want to introduce a performance regression.
>
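
A minimal sketch, not taken from the thread, of the two points above: equality
can be expressed through mismatch, and sizes just below, at, and just above the
vector length exercise the tail handling. The class name, the helper names, and
the 32-byte vector length (256-bit AVX2) are assumptions made for illustration.
The real Arrays.equals also handles null arrays, which this sketch does not.

```java
import java.util.Arrays;

public class MismatchSmallSizes {
    // Assumed vector length in bytes (256-bit AVX2); hypothetical, for illustration only.
    static final int ASSUMED_VECTOR_BYTES = 32;

    // Equality expressed via mismatch: -1 means no mismatching index was found.
    // Assumes both arrays are non-null.
    static boolean equalsViaMismatch(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b) < 0;
    }

    // Returns a copy of 'a' that differs from it at 'index' only.
    static byte[] copyWithMismatchAt(byte[] a, int index) {
        byte[] b = a.clone();
        b[index] ^= 1; // flip the lowest bit so the element is guaranteed to differ
        return b;
    }

    public static void main(String[] args) {
        int[] sizes = {
            ASSUMED_VECTOR_BYTES - 1, ASSUMED_VECTOR_BYTES, ASSUMED_VECTOR_BYTES + 1
        };
        for (int size : sizes) {
            byte[] a = new byte[size];
            Arrays.fill(a, (byte) 7);
            byte[] first = copyWithMismatchAt(a, 0);
            byte[] last = copyWithMismatchAt(a, size - 1);
            System.out.println("size=" + size
                + " mismatchAtFirst=" + Arrays.mismatch(a, first)
                + " mismatchAtLast=" + Arrays.mismatch(a, last)
                + " equalCopy=" + equalsViaMismatch(a, a.clone()));
        }
    }
}
```

A benchmark over these sizes would likely want one run per size and per
mismatch position, so that a regression on short arrays is not hidden by the
large-array results.
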
>
> Some of the difference might be explained by alignment of the arrays, some
> perhaps due to loop unrolling.
>
> I think there might be an issue with loop unrolling. It seems too
> aggressive, resulting in larger than necessary nmethod sizes. We should
> look into that.
>
>
> Unsure if it's possible to reduce [*]:
>
> vpcmpeqb %ymm1,%ymm0,%ymm0
> vpxor -0x7ad507d(%rip),%ymm0,%ymm0
>
> to:
>
> vpxor %ymm1,%ymm0,%ymm0
>
> Since the latter will not produce a valid mask representation, which could
> affect later use of the mask value (firstTrue).
>
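
A scalar sketch of the concern above, emulating the byte lanes in plain Java.
It is not HotSpot code: the class and method names are hypothetical, and it
assumes the mask is later consumed by looking at the top bit of each lane (as
the x86 vpmovmskb instruction does). Under that assumption, a plain XOR can
leave a differing lane with its top bit clear, so a firstTrue-style scan would
miss it.

```java
import java.util.Arrays;

public class MaskEmulation {
    // Emulates vpcmpeqb per byte lane: 0xFF where the lanes are equal, 0x00 otherwise.
    static byte[] cmpEq(byte[] a, byte[] b) {
        byte[] r = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = (byte) (a[i] == b[i] ? 0xFF : 0x00);
        }
        return r;
    }

    // Emulates vpxor per byte lane.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] r = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = (byte) (a[i] ^ b[i]);
        }
        return r;
    }

    // Stand-in for the all-ones constant loaded by the second instruction.
    static byte[] allOnes(int n) {
        byte[] r = new byte[n];
        Arrays.fill(r, (byte) 0xFF);
        return r;
    }

    // A valid mask has every lane either all zeros or all ones.
    static boolean isValidMask(byte[] lanes) {
        for (byte lane : lanes) {
            if (lane != 0 && lane != (byte) 0xFF) return false;
        }
        return true;
    }

    // Assumed consumer: index of the first lane whose top bit is set,
    // or the lane count if there is none.
    static int firstTrueByTopBit(byte[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i] < 0) return i; // a negative byte has its top bit set
        }
        return lanes.length;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4};
        byte[] b = {1, 2, 2, 4};
        byte[] notEqual = xor(cmpEq(a, b), allOnes(a.length)); // compare, then invert
        byte[] plainXor = xor(a, b);                           // the proposed shortcut
        System.out.println(isValidMask(notEqual) + " " + firstTrueByTopBit(notEqual));
        System.out.println(isValidMask(plainXor) + " " + firstTrueByTopBit(plainXor));
    }
}
```
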
> Paul.
>
> [*]
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L3211
>
> > On Aug 29, 2021, at 11:36 AM, Kartik Ohri <kartikohri13 at gmail.com>
> wrote:
> >
> > Hi!
> >
> > I have started experimenting with the Vector API recently. Currently, I
> am
> > playing with the API and trying to gauge its potential by comparing its
> > performance with vectorized intrinsics implemented inside the JDK. I know
> > that Arrays.mismatch has a vectorized intrinsic so I started with it. I
> was
> > able to come up with a simple implementation for it using the Vector API.
> > The results of the JMH benchmark look quite promising: the Vector API is
> > able to come quite close to the intrinsic. This is awesome!!
> >
> > The benchmark code is available here
> > <
> https://github.com/amCap1712/curly-computing-machine/blob/main/src/main/java/dev/lucifer/benchmarks/ArrayMismatchBenchmark.java
> >
> > and
> > the complete benchmark logs are here
> > <
> https://github.com/amCap1712/curly-computing-machine/blob/main/results/array-mismatch.csv
> >.
> > I also did another run to check the assembly generated which is also
> > available here
> > <
> https://github.com/amCap1712/curly-computing-machine/blob/main/results/benchmarks.asm.log
> >.
> > It would be nice to get a sanity check on the benchmark before I proceed
> > further.
> >
> > Both versions perform more or less the same (the difference is less than
> > 5%, and in some cases the Vector API even outperforms the intrinsic),
> > except for one outlier: when the prefix is 1, i.e. both input arrays are
> > equal, and the size is 10000, the Vector API is almost 35% slower than
> > the intrinsic. I see that the assembly emitted by the Vector API
> > <
> https://github.com/amCap1712/curly-computing-machine/blob/main/results/benchmarks.asm.log#L7436
> >
> > contains a *vpcmpeqb* but the JDK intrinsic
> > <
> https://github.com/amCap1712/curly-computing-machine/blob/main/results/benchmarks.asm.log#L2064
> >
> > does not. Looking at the implementation
> > <
> https://github.com/openjdk/panama-vector/blob/2fd7943ec191559bfb2778305daf82bcc4422028/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L6088-L6125
> >
> > of the intrinsic in the JDK, the AVX 512 version uses *vpcmpeqb* but the
> > AVX2 version does not (my machine does not have AVX 512, so it makes sense
> > that it was not emitted in the case of the intrinsic). Secondly, from the
> > assembly
> > it seems that the Vector API version was unrolled but the intrinsic was
> > not. If I am right, in general loop unrolling is better so the API seems
> to
> > be doing the right thing. Hence, I am not sure why this particular case
> is
> > an outlier.
> >
> > Further, is analysing/comparing intrinsics within the JDK to the Vector
> > API useful?
> >
> > Also, any other suggestions regarding contributing to the Project are
> > welcome.
> >
> > Regards,
> > Kartik
>
>