[vectorIntrinsics] RFR: 8283709: Add x86 back-end implementation for bit BIT_COUNT operation [v7]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Thu Apr 7 23:55:20 UTC 2022


On Wed, 6 Apr 2022 18:52:46 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> 
>> - Patch re-uses existing C2 IR nodes and re-organizes LUT based JIT code sequence of VectorOperations.BIT_COUNT operation 
>>   for sub-word type (BYTE, SHORT) vectors over X86 targets supporting AVA2 and AVX512 features.
>> - Efficient single instruction POPCOUNT instruction is emitted for applicable targets. 
>> 
>> Kindly review and share you feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8283709: Integer.bitCount((int)(byte_val) & 0xFF) is folded as LoadUB, this is auto-vectorized as load from Boolean vector. Adding missing types in macroassembler to cover these cases.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4436:

> 4434: }
> 4435: 
> 4436: void C2_MacroAssembler::vbroadcastd(XMMRegister dst, Register rtmp, int imm32, int vec_enc) {

Good to have rtmp after imm32. Basically temp registers after required inputs.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4478:

> 4476: 
> 4477: void C2_MacroAssembler::vector_popcount_int(XMMRegister dst, XMMRegister src, XMMRegister xtmp1,
> 4478:                                             XMMRegister xtmp2, Register rtmp, int vec_enc) {

Comments need updating. Majority of comments above this method need to move to vector_popcount_byte. And then it is better to give comments at each step below in the method for easy review and maintenance. Please add comments to short, byte, long as well on similar lines.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4520:

> 4518:     // above instruction sequence on 256 bit vector which
> 4519:     // can operate over maximum 4 long elements.
> 4520:     ShouldNotReachHere();

>From Vector API point of view it would be good to have the intrinsic for 256-bit/128-bit Long vectors as well.

src/hotspot/cpu/x86/x86.ad line 2063:

> 2061:       if ((bt == T_LONG) && !VM_Version::supports_avx512_vpopcntdq()) {
> 2062:         return false;
> 2063:       }

bt is T_LONG for this case, we don't need to check type.

src/hotspot/cpu/x86/x86.ad line 8637:

> 8635:              VM_Version::supports_avx512_vpopcntdq()) ||
> 8636:             (is_subword_type(Matcher::vector_element_basic_type(n->in(1))) &&
> 8637:              VM_Version::supports_avx512_bitalg()));

This sort of check is happening multiple times. Should we have a function which takes bt and tells us if vpopcnt instruction is supported for the type.

src/hotspot/cpu/x86/x86.ad line 8643:

> 8641:   format %{ "vector_popcount_integral_evex $dst, $src" %}
> 8642:   ins_encode %{
> 8643:     assert(UsePopCountInstruction, "not enabled");

UsePopCountInstruction is ambiguous now? subword vs int/long depend on different platform feature so cannot be combined?

src/hotspot/cpu/x86/x86.ad line 8651:

> 8649:     // should be succeeded by its corresponding vector IR and following
> 8650:     // special handling should be removed.
> 8651:     if (opcode == Op_PopCountVL && Matcher::vector_element_basic_type(this) == T_INT) {

This needs clarification in comments that the behavior is different based on vector api vs auto vectorizer. Have we tested this to work appropriately in both cases?

src/hotspot/cpu/x86/x86.ad line 8673:

> 8671:     BasicType bt = Matcher::vector_element_basic_type(this, $src);
> 8672:     __ evmovdquq($dst$$XMMRegister, $src$$XMMRegister, vlen_enc);
> 8673:     __ vector_popcount_integral_evex(bt, $dst$$XMMRegister, $src$$XMMRegister, $mask$$KRegister, true, vlen_enc);

No auto vectorizer path here where the result in int vector for long vector.

src/hotspot/cpu/x86/x86.ad line 8685:

> 8683:   format %{ "vector_popcount_int  $dst, $src\t! using $xtmp1, $xtmp2 and $rtmp as TEMP" %}
> 8684:   ins_encode %{
> 8685:     assert(UsePopCountInstruction, "not enabled");

The assert needs to be removed on this path.

src/hotspot/cpu/x86/x86.ad line 8700:

> 8698:   format %{ "vector_popcount_long  $dst, $src\t! using $xtmp1, $xtmp2, $xtmp3, and $rtmp as TEMP" %}
> 8699:   ins_encode %{
> 8700:     assert(UsePopCountInstruction, "not enabled");

The assert needs to be removed on this path. Do you know why the testing didn't catch this?

src/hotspot/cpu/x86/x86.ad line 8709:

> 8707:     // special handling should be removed.
> 8708:     if (bt == T_INT) {
> 8709:       __ evpmovqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);

why are we using evex instruction for avx path?

src/hotspot/share/opto/vectornode.cpp line 156:

> 154:     case T_BYTE:
> 155:     case T_SHORT:
> 156:     case T_INT:

This looks misplaced. There is no switch here.

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/185


More information about the panama-dev mailing list