[vectorIntrinsics] RFR: Improve mask reduction operations on AVX [v2]
Mai Đặng Quân Anh
duke at openjdk.java.net
Tue Nov 9 16:28:52 UTC 2021
On Tue, 9 Nov 2021 12:03:03 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - fix last true
>> - further improvement
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4134:
>
>> 4132: lzcntl(dst, dst);
>> 4133: negl(dst);
>> 4134: addl(dst, 31);
>
> I think that, from a latency perspective, the earlier sequence was better, given that a constant move to a register is not scheduled to an execution port.
I have reverted that change, with one minor modification: the operations are now 32-bit instead of 64-bit.
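
For readers following the thread, here is a minimal Java sketch of the arithmetic the two sequences compute, namely `31 - lzcnt(mask)` on a 32-bit mask. The class and method names are hypothetical, and the "mov/sub" variant is only an assumption about what the earlier sequence looked like, since this message does not show it. `Integer.numberOfLeadingZeros` stands in for `lzcntl` because both return 32 for a zero input.

```java
public class LastTrueSketch {
    // Mirrors the quoted sequence: lzcntl, negl, addl 31.
    // Each step depends on the previous one, so the chain is three operations long.
    static int lastTrueNegAdd(int mask) {
        int dst = Integer.numberOfLeadingZeros(mask); // lzcntl(dst, dst): 32 when mask == 0
        dst = -dst;                                   // negl(dst)
        dst += 31;                                    // addl(dst, 31)
        return dst;
    }

    // Assumed shape of the earlier sequence: load the constant into a second
    // register, then subtract. The constant load does not depend on the lzcnt
    // result, so only the count and the subtract sit on the dependency chain.
    static int lastTrueMovSub(int mask) {
        int tmp = Integer.numberOfLeadingZeros(mask); // lzcntl(tmp, mask)
        int dst = 31;                                 // movl(dst, 31)
        dst -= tmp;                                   // subl(dst, tmp)
        return dst;
    }

    public static void main(String[] args) {
        int[] masks = {0, 1, 0b1000, 0x00FF_0000, 0x8000_0000};
        for (int m : masks) {
            System.out.printf("mask=0x%08X negAdd=%d movSub=%d%n",
                    m, lastTrueNegAdd(m), lastTrueMovSub(m));
        }
    }
}
```

Both variants return the index of the highest set bit, and -1 for an empty mask, so the choice between them is about instruction latency rather than the result.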
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/158