[vectorIntrinsics] RFR: Improve mask reduction operations on AVX [v3]
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Mon Nov 15 23:35:44 UTC 2021
On Tue, 9 Nov 2021 16:14:26 GMT, Mai Đặng Quân Anh <duke at openjdk.java.net> wrote:
>> Hi,
>> This patch improves the logic of vector mask reduction operations on AVX, especially for int, float, long and double masks, by using the vmovmskpd and vmovmskps instructions. I also did a little refactoring to reduce duplication in toLong. The patch temporarily disables these operations on Neon, though.
>> Thank you very much.
>
> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
>
> - support for non-bmi, some refinement
> - restore VectorStoreMaskNode, move logic to backend
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4079:
> 4077: movl(dst, -1);
> 4078: bsrq(tmp, tmp);
> 4079: cmov32(Assembler::notZero, dst, tmp);
We could use LZCNT here on platforms that support it.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4102:
> 4100: assert(vec_enc == AVX_128bit && VM_Version::supports_avx() ||
> 4101: vec_enc == AVX_256bit && (VM_Version::supports_avx2() || type2aelembytes(bt) >= 4), "");
> 4102:
Add an assert here that dst is the same as tmp for Op_VectorMaskToLong.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4151:
> 4149: movl(dst, -1);
> 4150: bsrl(tmp, tmp);
> 4151: cmov32(Assembler::notZero, dst, tmp);
We could use LZCNT here on platforms that support it.
src/hotspot/cpu/x86/x86.ad line 8688:
> 8686: int vlen_enc = vector_length_encoding(this, $mask);
> 8687: __ vector_mask_operation(opcode, noreg, $mask$$XMMRegister, $xtmp$$XMMRegister,
> 8688: $dst$$Register, mask_len, mbt, vlen_enc);
For all the toLong instructs, it is better to also pass $dst$$Register as the 2nd argument of vector_mask_operation instead of noreg.
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/158