[9] RFR(S): 8075324: Costs of memory operands in aarch64.ad are inconsistent

Tue Mar 17 15:15:03 UTC 2015

On 17/03/15 12:42, Tobias Hartmann wrote:
> Hi,
> 
> please review the following patch.
> 
> https://bugs.openjdk.java.net/browse/JDK-8075324
> http://cr.openjdk.java.net/~thartmann/8075324/webrev.00/

I agree one of these costs is wrong (indOffI) but otherwise I think the
costs are correct. That also means I think your addition of the two new
operands is a mistake. I suspect you may have been confused by the
rather unfortunate choice of names for the operands -- which are rather
misleading. Also, the way the instruction generation is hidden behind
some parameterised functions (ref below) makes it hard to see what is
happening here. In particular, it makes it look like your operands are
doing something that they cannot actually do.

Many of the operand definitions are supposed to have cost 0 because the
matched operands can be mapped directly to register arguments for a
single ldr/str insn (of the appropriate flavour). These are the cases where

  i) there is no indirect register and an offset in a relevant range

or where

  ii) where there is an indirect register and zero offset

If the matcher can match an operand from these cases then the cost of
the instruction combined with the complexity of the reduced operand
(added in automatically by the matching algorithm) should be all that is
needed to drive the selection process.

However, in the operand definitions where

  iii) there is an indirect register and non-zero offset

encoding of the load/store requires planting an lea (which manifests as
an add) to add the offset to the base register followed by a ldr/str
which does the scaled indirect load. That's because AArch64 does not
provide a scaled indirect with offset address mode.

For these cases the cost of the operand should be INSN_COST in order to
account precisely for that extra add and the zero cost of passing the
resulting offsetted address into the following ldr/str which does the
scaled indirect load. There is no need to prefer this sequence. If two
separate instructions can do this cheaper or at the same cost then that
is fine.

n.b. your newly provided operand format definition which prints

  ldrsbw  R0, R10, R1, #16 I2L	# byte

exemplifies the problem here. There is no instruction which does a
scaled indirect load with offset in the AArch64 instruction. If you take
look at the static methods named loadStore defined in the source block
you can see how this instruction will actually be generated.
PrintAssembly will reveal that it gets translated as follows

    add     R8, R10, #16
    ldrsbw  R0, R1 sxtw

Let's look at some cases:

You are right that indOffI and indOffL should have the same cost but
that cost should be 0 because they can both be encoded with a single
ldr/str using only a base register and offset with no indirect register.

  ldr(dst, base, disp)

By constrast, indIndexScaledI2L is correct to have cost 0. It has a
scaled index register but offset 0. So, it can be encoded using a single
ldr/str taking a scaled indirect register as argument.

  ldr(dst, base, index, SXTW)

Similarly, indIndexScaledOffsetL should have cost INSN_COST because it
will cause planting of an add followed by some flavour of ldr/str.

  add(rscratch1, base, disp)
  ldr(dst, index, LSL(0))

So, the costs should be as follows

Cost 0:

  indirect
  indIndexScaledI2L
  indIndexScaled
  indIndex
  indOffI
  indOffL
  indirectN
  indIndexScaledI2LN
  indIndexScaledN
  indIndexN
  indOffIN
  indOffLN

Cost INSN_COST:

  indIndexScaledOffsetI
  indIndexScaledOffsetL
  indIndexScaledOffsetI2L
  indIndexScaledOffsetI2LN

Your two extra operands are not needed.

regards,

Andrew Dinn
-----------