[vectorIntrinsics] RFR: 8283709: Add x86 back-end implementation for bit BIT_COUNT operation [v7]

Fri Apr 8 19:18:19 UTC 2022

On Thu, 7 Apr 2022 23:20:29 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   8283709: Integer.bitCount((int)(byte_val) & 0xFF) is folded as LoadUB, this is auto-vectorized as load from Boolean vector. Adding missing types in macroassembler to cover these cases.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4478:
> 
>> 4476: 
>> 4477: void C2_MacroAssembler::vector_popcount_int(XMMRegister dst, XMMRegister src, XMMRegister xtmp1,
>> 4478:                                             XMMRegister xtmp2, Register rtmp, int vec_enc) {
> 
> Comments need updating. Majority of comments above this method need to move to vector_popcount_byte. And then it is better to give comments at each step below in the method for easy review and maintenance. Please add comments to short, byte, long as well on similar lines.

New code has been modularized byte vector popcount is the leaf level operation and all the other primitive type operation build on top of it.

> src/hotspot/cpu/x86/x86.ad line 8637:
> 
>> 8635:              VM_Version::supports_avx512_vpopcntdq()) ||
>> 8636:             (is_subword_type(Matcher::vector_element_basic_type(n->in(1))) &&
>> 8637:              VM_Version::supports_avx512_bitalg()));
> 
> This sort of check is happening multiple times. Should we have a function which takes bt and tells us if vpopcnt instruction is supported for the type.

This is only in couple of predicates, so it think it will not hurt much.

> src/hotspot/cpu/x86/x86.ad line 8643:
> 
>> 8641:   format %{ "vector_popcount_integral_evex $dst, $src" %}
>> 8642:   ins_encode %{
>> 8643:     assert(UsePopCountInstruction, "not enabled");
> 
> UsePopCountInstruction is ambiguous now? subword vs int/long depend on different platform feature so cannot be combined?

It a JVM flag, fall back code is still  based on Integer.bitCount and Long.bitCount APIs,  if the flag is turned off we do not intrinsify these scalar APIs.  match_rule_supported which is called through vector inline expander is also sensitive to this flag.

> src/hotspot/cpu/x86/x86.ad line 8651:
> 
>> 8649:     // should be succeeded by its corresponding vector IR and following
>> 8650:     // special handling should be removed.
>> 8651:     if (opcode == Op_PopCountVL && Matcher::vector_element_basic_type(this) == T_INT) {
> 
> This needs clarification in comments that the behavior is different based on vector api vs auto vectorizer. Have we tested this to work appropriately in both cases?

Yes, we do have explicit bit count tests in for auto-vectorizer flow to test this flow.

> src/hotspot/cpu/x86/x86.ad line 8673:
> 
>> 8671:     BasicType bt = Matcher::vector_element_basic_type(this, $src);
>> 8672:     __ evmovdquq($dst$$XMMRegister, $src$$XMMRegister, vlen_enc);
>> 8673:     __ vector_popcount_integral_evex(bt, $dst$$XMMRegister, $src$$XMMRegister, $mask$$KRegister, true, vlen_enc);
> 
> No auto vectorizer path here where the result in int vector for long vector.

Yes, because this is predicated operation pattern and auto-vectorizer does not support predicate operation inferencing.

> src/hotspot/cpu/x86/x86.ad line 8685:
> 
>> 8683:   format %{ "vector_popcount_int  $dst, $src\t! using $xtmp1, $xtmp2 and $rtmp as TEMP" %}
>> 8684:   ins_encode %{
>> 8685:     assert(UsePopCountInstruction, "not enabled");
> 
> The assert needs to be removed on this path.

Same a above, it should be OK to keep this as is since its already used in match_rule_supported method.

> src/hotspot/cpu/x86/x86.ad line 8700:
> 
>> 8698:   format %{ "vector_popcount_long  $dst, $src\t! using $xtmp1, $xtmp2, $xtmp3, and $rtmp as TEMP" %}
>> 8699:   ins_encode %{
>> 8700:     assert(UsePopCountInstruction, "not enabled");
> 
> The assert needs to be removed on this path. Do you know why the testing didn't catch this?

Same a above, it should be OK to keep this as is since its already used in match_rule_supported method.

> src/hotspot/cpu/x86/x86.ad line 8709:
> 
>> 8707:     // special handling should be removed.
>> 8708:     if (bt == T_INT) {
>> 8709:       __ evpmovqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
> 
> why are we using evex instruction for avx path?

AVX512 targets not supporting AVX512_POPCOUNTD feature can still have 64 byte vector operation, since long pop-count was only enabled for 512 bit vectors uptill now hence a down-casting EVEX instruction worked fine.

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/185