[vectorIntrinsics] RFR: Improve mask reduction operations on AVX [v3]
Mai Đặng Quân Anh
duke at openjdk.java.net
Tue Nov 16 07:59:59 UTC 2021
On Mon, 15 Nov 2021 19:15:34 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - support for non-bmi, some refinement
>> - restore VectorStoreMaskNode, move logic to backend
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4079:
>
>> 4077: movl(dst, -1);
>> 4078: bsrq(tmp, tmp);
>> 4079: cmov32(Assembler::notZero, dst, tmp);
>
> We could use LZCNT here on platforms that support it.
Thank you very much for your reviews and suggestions. I have measured both approaches. On Intel the two perform similarly, with `lzcnt` offering better throughput in exchange for slightly worse latency. The situation is quite different on AMD because of the inefficiency of `bsf` in comparison with `lzcnt`. So I think using `lzcnt` where possible is preferable.
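
For illustration only, here is a rough Java model of why the two sequences compute the same result. It is a sketch, not the code in the PR: the class and method names are hypothetical, the `bsr` step is modelled with a plain loop, and `Long.numberOfLeadingZeros` stands in for `lzcnt`. Since a leading-zero count of zero is 64, `63 - lzcnt(mask)` already yields -1 for an empty mask, so no conditional move is needed.

```java
public class LastTrueSketch {
    // Models the quoted sequence: movl(dst, -1); bsrq(tmp, tmp); cmov32(notZero, dst, tmp).
    // dst stays -1 unless the mask has at least one set bit.
    static int viaBsrAndCmov(long mask) {
        int dst = -1;
        if (mask != 0) {
            // Scan from the top bit down, as a stand-in for bsr (index of highest set bit).
            for (int i = 63; i >= 0; i--) {
                if (((mask >>> i) & 1L) != 0) {
                    dst = i;
                    break;
                }
            }
        }
        return dst;
    }

    // Models the lzcnt-based alternative: numberOfLeadingZeros(0) is 64, so the
    // empty mask maps to -1 without a separate branch or conditional move.
    static int viaLzcnt(long mask) {
        return 63 - Long.numberOfLeadingZeros(mask);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 0b1010L, 1L << 63, -1L};
        for (long m : samples) {
            System.out.println(Long.toBinaryString(m) + " -> " + viaLzcnt(m)
                    + " (bsr model: " + viaBsrAndCmov(m) + ")");
        }
    }
}
```

Both methods agree on every input, including the all-zero mask, which is the case the `cmov` exists to handle in the `bsr` version.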
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/158