[vectorIntrinsics] RFR: Improve mask reduction operations on AVX [v3]

Mai Đặng Quân Anh duke at openjdk.java.net
Tue Nov 16 07:59:59 UTC 2021


On Mon, 15 Nov 2021 19:15:34 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - support for non-bmi, some refinement
>>  - restore VectorStoreMaskNode, move logic to backend
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4079:
> 
>> 4077:       movl(dst, -1);
>> 4078:       bsrq(tmp, tmp);
>> 4079:       cmov32(Assembler::notZero, dst, tmp);
> 
> We could use LZCNT here on platforms that support it.

Thank you very much for your reviews and suggestions. I have measured both approaches. On Intel CPUs they perform similarly, with `lzcnt` offering better throughput in exchange for slightly worse latency. The situation is quite different on AMD CPUs, where `bsf` is inefficient compared with `lzcnt`. So I think using `lzcnt` where possible would be preferable.
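
For reference, here is a rough sketch of why the `lzcnt` form can also drop the conditional move. It is a plain C++ model of the two sequences, not the macro-assembler code from the patch, and the helper names (`lzcnt64`, `last_true_bsr`, `last_true_lzcnt`) are made up for illustration. It assumes a 64-bit mask and that the result for an all-zero mask should be -1, as in the quoted `movl(dst, -1)` sequence.

```cpp
#include <cstdint>

// Models LZCNT on a 64-bit operand: the number of leading zero bits.
// Unlike BSR, the result is defined for a zero input (it is 64).
static int lzcnt64(std::uint64_t x) {
    int n = 0;
    for (std::uint64_t bit = 1ULL << 63; bit != 0 && (x & bit) == 0; bit >>= 1) {
        ++n;
    }
    return n;
}

// Models the quoted movl / bsrq / cmov32 sequence: BSR leaves its
// destination undefined for a zero input, so -1 is preloaded and only
// overwritten when the mask is non-zero.
static int last_true_bsr(std::uint64_t mask) {
    int dst = -1;
    if (mask != 0) {
        int idx = 0;
        while (mask >>= 1) {
            ++idx;  // ends at the index of the highest set bit
        }
        dst = idx;
    }
    return dst;
}

// Hypothetical LZCNT-based variant: 63 - lzcnt(mask) is the index of the
// highest set bit, and a zero mask gives 63 - 64 = -1 without any branch
// or conditional move.
static int last_true_lzcnt(std::uint64_t mask) {
    return 63 - lzcnt64(mask);
}
```

Both helpers should agree for every input, including the all-zero mask. Whether the real sequence ends up shorter depends on how the backend materializes the constant and the subtraction.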

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/158
