RFR: 8281375: Accelerate bitCount operation for AVX2 and AVX512 target. [v2]

Thu Feb 10 02:54:03 UTC 2022

On Wed, 9 Feb 2022 18:38:15 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> src/hotspot/cpu/x86/matcher_x86.hpp line 194:
>> 
>>> 192:     }
>>> 193:   }
>>> 194: 
>> 
>> Could you explain more about the meaning of "cost" here? Should it be the actual instruction count or the latency?
>
> Currently it's a rough approximation of the generated instruction size on x86, since it's mainly used to influence the SLP unrolling factor.

Thanks! My concern is that the name "vector_op_cost" looks very general, but at the moment it only accounts for size overhead. In other places we may need to consider time overhead instead. I would prefer to give the function a more specific name.

>> src/hotspot/share/opto/loopTransform.cpp line 976:
>> 
>>> 974:       case Op_PopCountVL: {
>>> 975:         const TypeVect * vt = n->bottom_type()->is_vect();
>>> 976:         body_size += Matcher::vector_op_cost(n->Opcode(), vt->element_basic_type(), vt->length());
>> 
>> For other platforms, I think it would be better for `vector_op_cost` to return 0 for now.
>
> If a target supports PopCountVI/VL, the minimum cost should be 1; zero would mean the instruction has no cost at all.

I thought each node's base cost was already counted by `uint body_size = _body.size()`. If so, on other platforms the total is now one higher for every PopCountVI/VL node in the loop body.
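
To make the arithmetic concrete, here is a rough standalone sketch of the counting as I understand it. It is not the HotSpot code: the `Opcode` enum, `vector_op_cost_model` and `estimate_body_size` are hypothetical stand-ins, and the `fallback_cost` parameter just models what a platform without a specific estimate would return.

```cpp
#include <vector>

// Simplified stand-ins for illustration only; not the real HotSpot types.
enum Opcode { Op_AddVI, Op_PopCountVI, Op_PopCountVL };

// Hypothetical model of a platform hook that has no specific estimate
// and simply returns a fixed fallback value.
int vector_op_cost_model(Opcode /*op*/, int fallback_cost) {
  return fallback_cost;
}

// Models the pattern under discussion: every node is first counted once
// via the body's size, then popcount nodes add the platform's extra cost.
unsigned estimate_body_size(const std::vector<Opcode>& body, int fallback_cost) {
  unsigned body_size = static_cast<unsigned>(body.size());
  for (Opcode op : body) {
    switch (op) {
      case Op_PopCountVI:
      case Op_PopCountVL:
        body_size += vector_op_cost_model(op, fallback_cost);
        break;
      default:
        break;
    }
  }
  return body_size;
}
```

With a two-node body containing one `Op_PopCountVL`, a fallback of 0 leaves the estimate at 2, while a fallback of 1 raises it to 3, which is the extra count I am asking about.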

-------------

PR: https://git.openjdk.java.net/jdk/pull/7373