[vectorIntrinsics] RFR: Improve mask reduction operations on AVX [v2]

Jatin Bhateja jbhateja at openjdk.java.net
Tue Nov 9 12:18:56 UTC 2021


On Wed, 3 Nov 2021 07:55:57 GMT, Mai Đặng Quân Anh <duke at openjdk.java.net> wrote:

>> Hi,
>> This patch improves the logic of vector mask reduction operations on AVX, especially int, float, long, double, by using vmovmskpd and vmovmskps instructions. I also do a little refactoring to reduce duplication in toLong. The patch temporarily disables these operations on Neon, though.
>> Thank you very much.
>
> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - fix last true
>  - further improvement

Overall, the changes look OK to me.
- The savings come mainly from avoiding the additional vector store mask operation.
- Tier-3/4 testing at the various AVX levels (0/1/2/3/KNL) came back clean.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4134:

> 4132:       lzcntl(dst, dst);
> 4133:       negl(dst);
> 4134:       addl(dst, 31);

I think that, from a latency perspective, the earlier sequence was better, given that a constant move to a register is not scheduled to an execution port.
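
For reference, a minimal Java sketch of the arithmetic behind the two sequences. It assumes the earlier sequence loaded the constant 31 into a register and subtracted the lzcnt result from it; the quoted lines do not show that code, so treat the second method as a guess at its shape. The class and method names are hypothetical and only model the integer math, not the generated instructions or their scheduling.

```java
// Hypothetical model of the two lastTrue computations on a 32-bit mask.
final class LastTrueSketch {
    // Models the quoted sequence: lzcntl(dst, dst); negl(dst); addl(dst, 31).
    static int lastTrueNegAdd(int bits) {
        int r = Integer.numberOfLeadingZeros(bits); // 32 when bits == 0, like lzcnt
        r = -r;                                     // negl
        r = r + 31;                                 // addl 31
        return r;
    }

    // Assumed shape of the earlier sequence: constant 31 in a register, then subtract.
    static int lastTrueMovSub(int bits) {
        int tmp = 31;                               // constant move to a register
        return tmp - Integer.numberOfLeadingZeros(bits);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 0b1000, 0x8000_0000, 0x0000_FFFF};
        for (int bits : samples) {
            System.out.println(Integer.toBinaryString(bits)
                    + " -> " + lastTrueNegAdd(bits)
                    + " / " + lastTrueMovSub(bits));
        }
    }
}
```

Both methods return the index of the highest set bit and -1 for an empty mask, so the choice between them is about dependency chain length rather than results.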

src/hotspot/cpu/x86/x86.ad line 8682:

> 8680:     __ vpmovmskb($dst$$Register, $xtmp$$XMMRegister, vlen_enc);
> 8681:     // Mask generated out of partial vector comparisons/replicate/mask manipulation
> 8682:     // operations needs to be clipped.

Please keep this comment intact.
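
To illustrate why the comment matters, here is a small Java sketch of clipping. It assumes clipping means a bitwise AND that keeps only the low laneCount bits of the extracted bitmask; the quoted lines do not show the actual instruction, and the class and method names are hypothetical.

```java
// Hypothetical model of clipping a movmsk-style bitmask to the live lanes.
final class MaskClipSketch {
    // Keeps only the low laneCount bits; higher bits may be stale for partial vectors.
    static int clip(int rawBits, int laneCount) {
        if (laneCount < 1 || laneCount > 32) {
            throw new IllegalArgumentException("laneCount must be in 1..32");
        }
        int keep = (laneCount == 32) ? -1 : (1 << laneCount) - 1;
        return rawBits & keep;
    }

    public static void main(String[] args) {
        int raw = 0b1111_0110;   // upper four bits stand in for leftover lanes
        int clipped = clip(raw, 4);
        System.out.println("trueCount without clipping: " + Integer.bitCount(raw));
        System.out.println("trueCount with clipping:    " + Integer.bitCount(clipped));
    }
}
```

Without the AND, a reduction such as trueCount would also count bits that belong to lanes outside the mask length.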

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/158

