RFR: 8283232: x86: Improve vector broadcast operations [v8]

Quan Anh Mai duke at openjdk.org
Tue Jul 26 12:52:16 UTC 2022


On Tue, 26 Jul 2022 05:53:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:
>> 
>>  - rename
>>  - consolidate sse checks
>>  - benchmark
>>  - fix
>>  - Merge branch 'master' into improveReplicate
>>  - remove duplicate
>>  - unsignness
>>  - rematerializing input count
>>  - fix comparison
>>  - fix rematerialize, constant deduplication
>>  - ... and 8 more: https://git.openjdk.org/jdk/compare/0599a05f...6c10f9ad
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4388:
> 
>> 4386: 
>> 4387: void MacroAssembler::vallones(XMMRegister dst, int vector_len) {
>> 4388:   // vpcmpeqd has special dependency treatment so it should be preferred to vpternlogd
> 
> Comment is not clear, adding relevant reference will add more value.

I have remeasured this, and it seems that only the non-VEX encoded version receives the special dependency treatment, so I have reverted this change and added a comment for clarification.

The optimisation is noted in [The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers](https://www.agner.org/optimize/) for several architectures, for example in section 9.8 (Register allocation and renaming in Sandy Bridge and Ivy Bridge pipeline).

I have performed measurements on uica.uops.info. This sequence gives 1.37 cycles/iteration on Skylake and Ice Lake:

    pcmpeqd xmm0, xmm0
    paddd xmm0, xmm1
    paddd xmm0, xmm1
    paddd xmm0, xmm1

This version has a throughput of 4 cycles/iteration:

    vpcmpeqd xmm0, xmm0, xmm0
    vpaddd xmm0, xmm1, xmm0
    vpaddd xmm0, xmm1, xmm0
    vpaddd xmm0, xmm1, xmm0

This indicates that `vpcmpeqd` fails to break the dependency on `xmm0`, whereas `pcmpeqd` does.
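
As a rough way to read these two numbers, here is a back-of-the-envelope model. It is only a sketch under assumptions that do not come from the measurements above: each of the four instructions is taken to be a single uop with a latency of 1 cycle, and 3 vector ALU ports are assumed to be available. The function name and parameters are hypothetical. The model takes the larger of a port-pressure bound and a loop-carried latency bound.

```python
from fractions import Fraction


def cycles_per_iteration(uop_latencies, ports, loop_carried):
    """Simplified lower bound on cycles per iteration for a chain of uops.

    Assumptions (illustrative, not from the measurements): one uop per
    instruction, and every uop can issue on any of `ports` execution ports.
    """
    # Port-pressure bound: each uop needs one execution slot.
    port_bound = Fraction(len(uop_latencies), ports)
    # Latency bound: applies only if an iteration must wait for the
    # result of the previous one (the dependency is not broken).
    latency_bound = sum(uop_latencies) if loop_carried else 0
    return max(port_bound, latency_bound)


# Assumed figures: 1-cycle latency for each of the 4 uops, 3 vector ALU ports.
LATENCIES = [1, 1, 1, 1]
VECTOR_ALU_PORTS = 3

# Dependency broken at the compare: iterations overlap, ports are the limit.
broken = cycles_per_iteration(LATENCIES, VECTOR_ALU_PORTS, loop_carried=False)
# Dependency not broken: each iteration waits on the previous xmm0.
chained = cycles_per_iteration(LATENCIES, VECTOR_ALU_PORTS, loop_carried=True)

print(float(broken), float(chained))
```

Under these assumptions the model gives about 1.33 cycles/iteration when the dependency is broken and 4 cycles/iteration when it is not, which is in line with the 1.37 and 4 reported above.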

Thanks.

-------------

PR: https://git.openjdk.org/jdk/pull/7832


More information about the hotspot-compiler-dev mailing list