[aarch64-port-dev ] population count intrinsic performance

Edward Nevill edward.nevill at gmail.com
Wed Jun 10 20:34:48 UTC 2015


On Wed, 2015-06-10 at 18:41 +0000, Alexeev, Alexander wrote:
> Ed, I removed those 'vscratch1' & 'vscratch2' as redundant.
> Patch is below.

Ah, I see. vscratch1 & vscratch2 are just two v registers you carved out
as scratch registers from the vector register set.

But you need to let the register allocator know!

>  
> +//---------- Population Count Instructions -------------------------------------
> +//
> +
> +instruct popCountI(iRegINoSp dst,  iRegIorL2I src) %{
> +  match(Set dst (PopCountI src));
> +  ins_cost(INSN_COST * 13);
> +
> +  format %{ "TODO popCountI\n\t" %}
> +  ins_encode %{
> +    __ mov(v0, __ T1D, 0, as_Register($src$$reg));
> +    __ cnt(v1, __ T8B, v0);
> +    __ addv(v0, __ T8B, v1);
> +    __ mov(as_Register($dst$$reg), v0, __ T1D, 0);
> +  %}

So here, registers v0 and v1 might already be allocated so you cannot
just use them. Also, I don't understand why you need v0 and v1.

I think what you want is something like

instruct popCountI(iRegINoSp dst,  iRegIorL2I src, vRegD tmp) %{
  match(Set dst (PopCountI src));
  ins_cost(INSN_COST * 13);
  effect(TEMP tmp);
  format ...
  ins_encode %{
    __ mov(tmp, __ T1D, 0, as_Register($src$$reg));
    __ cnt(tmp, __ T8B, tmp);
    __ addv(tmp, __ T8B, tmp);
    __ mov(as_Register($dst$$reg), tmp, __ T1D, 0);
%}

(I haven't tried this, just typed it into the email, so there may be
typos).

> +
> +  ins_pipe(ialu_reg);
> +%}

I think this should be

  ins_pipe(pipe_class_default)

for consistency with all the other SIMD instructions for which we
haven't implemented pipeline scheduling.


> +
>  // ============================================================================
>  // MemBar Instruction
>  
> diff -r 11af3990d56c src/cpu/aarch64/vm/assembler_aarch64.hpp
> --- a/src/cpu/aarch64/vm/assembler_aarch64.hpp	Thu Jun 04 18:50:05 2015 -0700
> +++ b/src/cpu/aarch64/vm/assembler_aarch64.hpp	Wed Jun 10 18:12:27 2015 +0000
> @@ -2050,6 +2050,9 @@
>    INSN(negr,  1, 0b100000101110);
>    INSN(notr,  1, 0b100000010110);
>    INSN(addv,  0, 0b110001101110);
> +  INSN(cls,   0, 0b100000010010);
> +  INSN(clz,   1, 0b100000010010);
> +  INSN(cnt,   0, 0b100000010110);
>  
>  #undef INSN
>  
> diff -r 11af3990d56c src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
> --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp	Thu Jun 04 18:50:05 2015 -0700
> +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp	Wed Jun 10 18:12:27 2015 +0000
> @@ -36,6 +36,7 @@
>  class MacroAssembler: public Assembler {
>    friend class LIR_Assembler;
>  
> + public:
>    using Assembler::mov;
>    using Assembler::movi;

Looks fine, I think these should be public.


All the best,
Ed.



More information about the aarch64-port-dev mailing list