RFR: 8283232: x86: Improve vector broadcast operations [v8]
Quan Anh Mai
duke at openjdk.org
Tue Jul 26 12:52:16 UTC 2022
On Tue, 26 Jul 2022 05:53:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:
>>
>> - rename
>> - consolidate sse checks
>> - benchmark
>> - fix
>> - Merge branch 'master' into improveReplicate
>> - remove duplicate
>> - unsignness
>> - rematerializing input count
>> - fix comparison
>> - fix rematerialize, constant deduplication
>> - ... and 8 more: https://git.openjdk.org/jdk/compare/0599a05f...6c10f9ad
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4388:
>
>> 4386:
>> 4387: void MacroAssembler::vallones(XMMRegister dst, int vector_len) {
>> 4388: // vpcmpeqd has special dependency treatment so it should be preferred to vpternlogd
>
> Comment is not clear, adding relevant reference will add more value.
I have re-measured this claim. It seems that only the non-VEX-encoded version receives the special dependency treatment, so I reverted this change and added a comment for clarification.
The optimisation is described in [The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers](https://www.agner.org/optimize/) for several architectures, for example in section 9.8 (Register allocation and renaming in Sandy Bridge and Ivy Bridge pipeline).
I performed measurements on uica.uops.info. This sequence gives 1.37 cycles/iteration on Skylake and Ice Lake:
    pcmpeqd xmm0, xmm0
    paddd xmm0, xmm1
    paddd xmm0, xmm1
    paddd xmm0, xmm1
This version has a throughput of 4 cycles/iteration:
    vpcmpeqd xmm0, xmm0, xmm0
    vpaddd xmm0, xmm1, xmm0
    vpaddd xmm0, xmm1, xmm0
    vpaddd xmm0, xmm1, xmm0
This indicates that `vpcmpeqd` fails to break the dependency on `xmm0`, unlike the `pcmpeqd` instruction.
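The reasoning behind the 4 cycles/iteration figure can be sketched with a toy latency-only model. This is an illustration, not how uiCA or HotSpot works: it assumes every instruction has a 1-cycle latency and that execution ports are unlimited. The names `Insn`, `critical_path` and `sequence` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Toy model (assumption): 1-cycle latency per instruction, unlimited ports.
struct Insn {
    int dst;                // register written (0 = xmm0, 1 = xmm1)
    std::vector<int> srcs;  // registers the instruction has to wait for
};

// Cycle at which xmm0 is ready after `iterations` back-to-back copies of `body`.
long critical_path(const std::vector<Insn>& body, int iterations) {
    std::array<long, 2> ready{0, 0};  // ready cycle of xmm0 and xmm1
    for (int i = 0; i < iterations; ++i) {
        for (const Insn& insn : body) {
            long start = 0;
            for (int src : insn.srcs) start = std::max(start, ready[src]);
            ready[insn.dst] = start + 1;
        }
    }
    return ready[0];
}

// One compare followed by three adds. `breaks_dependency` models whether the
// compare is treated as independent of the old value of xmm0.
std::vector<Insn> sequence(bool breaks_dependency) {
    std::vector<Insn> body;
    body.push_back(breaks_dependency ? Insn{0, {}} : Insn{0, {0}});
    for (int i = 0; i < 3; ++i) body.push_back(Insn{0, {0, 1}});
    return body;
}
```

Under these assumptions, `critical_path(sequence(true), 100)` stays at 4 cycles, so successive iterations can overlap and only execution resources limit throughput. `critical_path(sequence(false), 100)` grows to 400 cycles, that is 4 cycles per iteration, which is in line with the second measurement.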
Thanks.
-------------
PR: https://git.openjdk.org/jdk/pull/7832