[vectorIntrinsics] RFR: Improve mask reduction operations on AVX [v2]

Tue Nov 9 16:28:52 UTC 2021

On Tue, 9 Nov 2021 12:03:03 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - fix last true
>>  - further improvement
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4134:
> 
>> 4132:       lzcntl(dst, dst);
>> 4133:       negl(dst);
>> 4134:       addl(dst, 31);
> 
> I think that, from a latency perspective, the earlier sequence was better, given that a constant move to a register is not scheduled to an execution port.

I have reverted that change, with one minor adjustment: the sequence now uses 32-bit operations instead of 64-bit operations.
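
For readers following along, here is a rough scalar sketch of what the two instruction shapes compute. It is not the HotSpot code. The first function mirrors the quoted lines 4132-4134 (lzcntl, negl, addl 31). The second is only my assumption of what the "earlier sequence" looked like (load the constant 31, then subtract the count), since that sequence is not quoted in this message. The names lzcnt32, last_true_neg_add, last_true_const_sub and sequences_agree are hypothetical, and lzcnt32 is a portable stand-in for the 32-bit LZCNT instruction.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for 32-bit LZCNT: counts leading zero bits and
// returns 32 when the input is 0 (matching the instruction's behaviour).
static int lzcnt32(std::uint32_t x) {
    int n = 0;
    for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
        ++n;
    }
    return n;
}

// Shape of the quoted lines: lzcntl(dst, dst); negl(dst); addl(dst, 31).
static int last_true_neg_add(std::uint32_t mask) {
    int dst = lzcnt32(mask);  // lzcntl
    dst = -dst;               // negl
    dst += 31;                // addl 31
    return dst;
}

// ASSUMED shape of the alternative: constant move, then subtract.
static int last_true_const_sub(std::uint32_t mask) {
    int tmp = lzcnt32(mask);  // lzcntl
    int dst = 31;             // constant move to a register
    dst -= tmp;               // subl
    return dst;
}

// Sanity helper: both shapes must yield the same index for a given mask.
static bool sequences_agree(std::uint32_t mask) {
    return last_true_neg_add(mask) == last_true_const_sub(mask);
}
```

Both functions return the index of the highest set bit of a 32-bit mask, or -1 when no bit is set, so they differ only in how the result is scheduled, not in its value. The sketch says nothing about latency; that part of the discussion depends on the target microarchitecture.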

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/158