Vectorized latin1 equalsIgnoreCase
Eirik Bjørsnøs
eirbjo at gmail.com
Wed Mar 1 19:20:49 UTC 2023
As a follow-up to this, I captured the generated code in case someone wants
to investigate the performance gap discovered in this PR, namely that using
LT, GT, EQ seems to be significantly faster than using the semantically
equivalent LE, GE, NE operations.
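To make "semantically equivalent" concrete, here is a minimal scalar sketch in plain Java. It is not the code from the PR: the class and method names are hypothetical, and the Latin-1 upper-case ranges (A to Z, plus 0xC0 to 0xDE without the multiplication sign 0xD7) are assumed for illustration. For integer bounds, an inclusive comparison can be rewritten as an exclusive one by shifting the bound by one, and a not-equal test can be written as a negated equality test.

```java
public class RangeCheckEquivalence {

    // Inclusive form: written with >=, <= and != (the GE, LE, NE flavor).
    static boolean isUpperInclusive(int c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 0xC0 && c <= 0xDE && c != 0xD7);
    }

    // Exclusive form: written with >, < and == (the GT, LT, EQ flavor).
    // Each inclusive bound is moved by one so the accepted set is unchanged.
    static boolean isUpperExclusive(int c) {
        return (c > 'A' - 1 && c < 'Z' + 1)
            || (c > 0xC0 - 1 && c < 0xDE + 1 && !(c == 0xD7));
    }

    // Counts Latin-1 code points (0..255) on which the two forms disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int c = 0; c <= 0xFF; c++) {
            if (isUpperInclusive(c) != isUpperExclusive(c)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Both predicates accept exactly the same code points, so any performance difference between the two flavors has to come from how the comparisons are compiled, not from what they compute.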
Full generated code:
LE code (slower):
https://gist.github.com/eirbjo/c164a857a32abc0f140668e39634b248
LT code (faster):
https://gist.github.com/eirbjo/f558947e8200a283eda601a5aa4905e9
A side-by-side diff does reveal some differences. In particular, the slower
LE code has this chunk:
    vpcmpgtb %ymm11, %ymm12, %ymm0
    vpcmpeqd %ymm1, %ymm1, %ymm1
    vpxor %ymm0, %ymm1, %ymm0
    vpcmpgtb %ymm7, %ymm11, %ymm1
    vpcmpeqd %ymm2, %ymm2, %ymm2
    vpxor %ymm1, %ymm2, %ymm1
    vpand %ymm1, %ymm0, %ymm14
    vpcmpgtb %ymm6, %ymm11, %ymm0
    vpcmpeqd %ymm1, %ymm1, %ymm1
    vpxor %ymm0, %ymm1, %ymm0
    vpcmpgtb %ymm11, %ymm5, %ymm1
    vpcmpeqd %ymm2, %ymm2, %ymm2
    vpxor %ymm1, %ymm2, %ymm1
    vpand %ymm1, %ymm0, %ymm0
    vpcmpeqb %ymm4, %ymm11, %ymm1
    vpcmpeqd %ymm13, %ymm13, %ymm13
    vpxor %ymm1, %ymm13, %ymm1
    vpand %ymm1, %ymm0, %ymm0
    vpor %ymm0, %ymm14, %ymm0
    vpand %ymm3, %ymm0, %ymm0
    vpor %ymm0, %ymm8, %ymm0

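One plausible reading of this chunk, offered as an assumption rather than a confirmed diagnosis: AVX2 has byte compares only for "equal" (vpcmpeqb) and signed "greater than" (vpcmpgtb), so an LE, GE or NE mask has to be built by negating a GT or EQ mask. Comparing a register with itself via vpcmpeqd yields all ones, and vpxor with all ones is a bitwise NOT. The chunk contains five such compare/all-ones/xor triples, which is ten instructions on top of the five compares. The sketch below emulates a single byte lane in plain Java to show the idea; the class and method names are hypothetical.

```java
public class LaneMaskSketch {

    // One byte lane of a signed greater-than compare (the vpcmpgtb role):
    // all ones (0xFF) when a > b, all zeros otherwise.
    static byte cmpGt(byte a, byte b) {
        return (byte) (a > b ? 0xFF : 0x00);
    }

    // LE synthesized as NOT(GT): take the GT mask, then XOR it with an
    // all-ones value (the roles of vpcmpeqd reg,reg,reg and vpxor).
    static byte cmpLe(byte a, byte b) {
        byte gt = cmpGt(a, b);
        byte allOnes = (byte) 0xFF;
        return (byte) (gt ^ allOnes);
    }

    // Checks the synthesized LE mask against Java's <= for every byte pair.
    static boolean agreesWithScalar() {
        for (int a = Byte.MIN_VALUE; a <= Byte.MAX_VALUE; a++) {
            for (int b = Byte.MIN_VALUE; b <= Byte.MAX_VALUE; b++) {
                byte expected = (byte) (a <= b ? 0xFF : 0x00);
                if (cmpLe((byte) a, (byte) b) != expected) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("LE == NOT(GT) for all byte pairs: " + agreesWithScalar());
    }
}
```

Under this reading, each LE, GE or NE comparison costs three instructions where LT, GT or EQ costs one, which would be consistent with the longer chunk in the slower version.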
In the corresponding area, the faster LT code has just this:
    vpcmpgtb %ymm11, %ymm9, %ymm2
    vpand %ymm1, %ymm2, %ymm1
    vpor %ymm0, %ymm1, %ymm0
    vpand %ymm8, %ymm0, %ymm0
    vpor %ymm0, %ymm3, %ymm0
    nopl (%rax)
My machine code reading skills stop here; I just wanted to capture this
in case someone sees an obvious avenue for improvement.
Thanks,
Eirik.
On Tue, Feb 28, 2023 at 8:26 PM Viswanathan, Sandhya <
sandhya.viswanathan at intel.com> wrote:
> Hi Eirik,
>
>
>
> I have created a JBS entry based on your PR description:
>
> https://bugs.openjdk.org/browse/JDK-8303401
>
>
>
> Thanks,
>
> Sandhya
>
>
>
>
>
> *From:* Eirik Bjørsnøs <eirbjo at gmail.com>
> *Sent:* Tuesday, February 28, 2023 8:05 AM
> *To:* Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> *Cc:* panama-dev at openjdk.org
> *Subject:* Re: Vectorized latin1 equalsIgnoreCase
>
>
>
> On Fri, Feb 24, 2023 at 2:17 AM Viswanathan, Sandhya <
> sandhya.viswanathan at intel.com> wrote:
>
> Yes, it will be wonderful to add this benchmark. Please go ahead and
> create a PR.
>
>
>
> If there are objections to adding it to mainline JDK, we could fall back
> to the panama-vector vectorIntrinsics branch.
>
>
>
> Hi Sandhya!
>
>
>
> I've created a PR here:
>
>
>
> https://github.com/openjdk/jdk/pull/12790
>
>
>
> Since I don't have access to create JBS issues, I would need help with
> that before this can proceed to review.
>
>
>
> Thanks,
>
> Eirik.
>