[aarch64-port-dev ] population count intrinsic performance
Edward Nevill
edward.nevill at gmail.com
Wed Jun 10 20:34:48 UTC 2015
On Wed, 2015-06-10 at 18:41 +0000, Alexeev, Alexander wrote:
> Ed, I removed those 'vscratch1' & 'vscratch2' as redundant.
> Patch is below.
Ah, I see. vscratch1 & vscratch2 are just two v registers you carved out
as scratch registers from the vector register set.
But you need to let the register allocator know!
>
> +//---------- Population Count Instructions -------------------------------------
> +//
> +
> +instruct popCountI(iRegINoSp dst, iRegIorL2I src) %{
> + match(Set dst (PopCountI src));
> + ins_cost(INSN_COST * 13);
> +
> + format %{ "TODO popCountI\n\t" %}
> + ins_encode %{
> + __ mov(v0, __ T1D, 0, as_Register($src$$reg));
> + __ cnt(v1, __ T8B, v0);
> + __ addv(v0, __ T8B, v1);
> + __ mov(as_Register($dst$$reg), v0, __ T1D, 0);
> + %}
So here, registers v0 and v1 might already be allocated so you cannot
just use them. Also, I don't understand why you need v0 and v1.
I think what you want is something like
instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegD tmp) %{
match(Set dst (PopCountI src));
ins_cost(INSN_COST * 13);
effect(TEMP tmp);
format ...
ins_encode %{
__ mov(tmp, __ T1D, 0, as_Register($src$$reg));
__ cnt(tmp, __ T8B, tmp);
__ addv(tmp, __ T8B, tmp);
__ mov(as_Register($dst$$reg), tmp, __ T1D, 0);
%}
(I haven't tried this, just typed it into the email, so there may be
typos).
> +
> + ins_pipe(ialu_reg);
> +%}
I think this should be
ins_pipe(pipe_class_default)
for consistency with all the other SIMD instructions for which we
haven't implemented pipeline scheduling.
> +
> // ============================================================================
> // MemBar Instruction
>
> diff -r 11af3990d56c src/cpu/aarch64/vm/assembler_aarch64.hpp
> --- a/src/cpu/aarch64/vm/assembler_aarch64.hpp Thu Jun 04 18:50:05 2015 -0700
> +++ b/src/cpu/aarch64/vm/assembler_aarch64.hpp Wed Jun 10 18:12:27 2015 +0000
> @@ -2050,6 +2050,9 @@
> INSN(negr, 1, 0b100000101110);
> INSN(notr, 1, 0b100000010110);
> INSN(addv, 0, 0b110001101110);
> + INSN(cls, 0, 0b100000010010);
> + INSN(clz, 1, 0b100000010010);
> + INSN(cnt, 0, 0b100000010110);
>
> #undef INSN
>
> diff -r 11af3990d56c src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
> --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Thu Jun 04 18:50:05 2015 -0700
> +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Wed Jun 10 18:12:27 2015 +0000
> @@ -36,6 +36,7 @@
> class MacroAssembler: public Assembler {
> friend class LIR_Assembler;
>
> + public:
> using Assembler::mov;
> using Assembler::movi;
Looks fine, I think these should be public.
All the best,
Ed.
More information about the aarch64-port-dev
mailing list