[aarch64-port-dev ] population count intrinsic performance
Edward Nevill
edward.nevill at gmail.com
Thu Jun 11 16:20:24 UTC 2015
On Thu, 2015-06-11 at 08:10 +0000, Alexeev, Alexander wrote:
> +
> +instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegD tmp) %{
> + match(Set dst (PopCountI src));
> + effect(TEMP tmp);
> + ins_cost(INSN_COST * 13);
> +
> + format %{ "TODO popCountI\n\t" %}
> + ins_encode %{
> + __ mov($tmp$$FloatRegister, __ T1D, 0, as_Register($src$$reg));
> + __ cnt($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister);
> + __ addv($tmp$$FloatRegister, __ T8B, $tmp$$FloatRegister);
> + __ mov(as_Register($dst$$reg), $tmp$$FloatRegister, __ T1D, 0);
> + %}
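I think there may be a problem with the way 'src' is used here. The code
assumes that the top 32 bits of src are 0. However, this may not be the
case if, for example, src is the result of an elided L2I conversion. The
64-bit mov copies the whole register into tmp, so cnt/addv over T8B
would then count any bits set in the upper half as well.
See the following comment in aarch64.ad: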
// iRegIorL2I is used for src inputs in rules for 32 bit int (I)
// operations. it allows the src to be either an iRegI or a (ConvL2I
// iRegL). in the latter case the l2i normally planted for a ConvL2I
// can be elided because the 32-bit instruction will just employ the
// lower 32 bits anyway.
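
To make the concern concrete, here is a minimal Java sketch of the
arithmetic only. It is not the generated code. It assumes a hypothetical
register value whose upper 32 bits are non-zero, and the class and
method names are made up for illustration. Long.bitCount stands in for
cnt/addv over all 8 bytes, and the mask stands in for zeroing the upper
half before the count.

```java
class PopCountUpperBits {
    // Models counting bits across the full 64-bit register,
    // which is what a count over all 8 bytes of tmp would do.
    static int countWholeRegister(long reg) {
        return Long.bitCount(reg);
    }

    // Models clearing the upper 32 bits first, then counting.
    // This matches the 32-bit semantics of Integer.bitCount.
    static int countLowWord(long reg) {
        return Long.bitCount(reg & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // Hypothetical register contents: upper half all ones, low word 1.
        long reg = 0xFFFFFFFF00000001L;

        // What Integer.bitCount((int) reg) must return at the Java level.
        int expected = Integer.bitCount((int) reg);

        System.out.println("expected        = " + expected);
        System.out.println("whole register  = " + countWholeRegister(reg));
        System.out.println("low word only   = " + countLowWord(reg));
    }
}
```

With this value the whole-register count and the low-word count differ,
which is the mismatch that would show up if the upper bits are not
cleared before the move into tmp.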
What I am not clear on is whether using just iRegI here, rather than
iRegIorL2I, guarantees that the top 32 bits are 0.
All the best,
Ed.