[9] RFR(S): 8075324: Costs of memory operands in aarch64.ad are inconsistent

Wed Mar 18 10:46:55 UTC 2015

Hi Andrew,

thanks a lot for the detailed explanation! That makes sense to me.

I was confused by the fact that the generation of the additional add in the case of a scaled-indirect-with-offset, is done 'implicitly' in loadStore and therefore we have to account for that by setting a higher cost for the corresponding memory operand.

Do you think we should still fix the cost of 'indOffI'? If so, here is the corresponding webrev:

http://cr.openjdk.java.net/~thartmann/8075324/webrev.01/

Thanks,
Tobias

On 17.03.2015 16:15, Andrew Dinn wrote:
> On 17/03/15 12:42, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8075324
>> http://cr.openjdk.java.net/~thartmann/8075324/webrev.00/
> 
> I agree one of these costs is wrong (indOffI) but otherwise I think the
> costs are correct. That also means I think your addition of the two new
> operands is a mistake. I suspect you may have been confused by the
> rather unfortunate choice of names for the operands -- which are rather
> misleading. Also, the way the instruction generation is hidden behind
> some parameterised functions (ref below) makes it hard to see what is
> happening here. In particular, it makes it look like your operands are
> doing something that they cannot actually do.
> 
> Many of the operand definitions are supposed to have cost 0 because the
> matched operands can be mapped directly to register arguments for a
> single ldr/str insn (of the appropriate flavour). These are the cases where
> 
>   i) there is no indirect register and an offset in a relevant range
> 
> or where
> 
>   ii) where there is an indirect register and zero offset
> 
> If the matcher can match an operand from these cases then the cost of
> the instruction combined with the complexity of the reduced operand
> (added in automatically by the matching algorithm) should be all that is
> needed to drive the selection process.
> 
> However, in the operand definitions where
> 
>   iii) there is an indirect register and non-zero offset
> 
> encoding of the load/store requires planting an lea (which manifests as
> an add) to add the offset to the base register followed by a ldr/str
> which does the scaled indirect load. That's because AArch64 does not
> provide a scaled indirect with offset address mode.
> 
> For these cases the cost of the operand should be INSN_COST in order to
> account precisely for that extra add and the zero cost of passing the
> resulting offsetted address into the following ldr/str which does the
> scaled indirect load. There is no need to prefer this sequence. If two
> separate instructions can do this cheaper or at the same cost then that
> is fine.
> 
> n.b. your newly provided operand format definition which prints
> 
>   ldrsbw  R0, R10, R1, #16 I2L	# byte
> 
> exemplifies the problem here. There is no instruction which does a
> scaled indirect load with offset in the AArch64 instruction. If you take
> look at the static methods named loadStore defined in the source block
> you can see how this instruction will actually be generated.
> PrintAssembly will reveal that it gets translated as follows
> 
>     add     R8, R10, #16
>     ldrsbw  R0, R1 sxtw
> 
> Let's look at some cases:
> 
> You are right that indOffI and indOffL should have the same cost but
> that cost should be 0 because they can both be encoded with a single
> ldr/str using only a base register and offset with no indirect register.
> 
>   ldr(dst, base, disp)
> 
> By constrast, indIndexScaledI2L is correct to have cost 0. It has a
> scaled index register but offset 0. So, it can be encoded using a single
> ldr/str taking a scaled indirect register as argument.
> 
>   ldr(dst, base, index, SXTW)
> 
> Similarly, indIndexScaledOffsetL should have cost INSN_COST because it
> will cause planting of an add followed by some flavour of ldr/str.
> 
>   add(rscratch1, base, disp)
>   ldr(dst, index, LSL(0))
> 
> So, the costs should be as follows
> 
> 
> Cost 0:
> 
>   indirect
>   indIndexScaledI2L
>   indIndexScaled
>   indIndex
>   indOffI
>   indOffL
>   indirectN
>   indIndexScaledI2LN
>   indIndexScaledN
>   indIndexN
>   indOffIN
>   indOffLN
> 
> Cost INSN_COST:
> 
>   indIndexScaledOffsetI
>   indIndexScaledOffsetL
>   indIndexScaledOffsetI2L
>   indIndexScaledOffsetI2LN
> 
> Your two extra operands are not needed.
> 
> regards,
> 
> 
> Andrew Dinn
> -----------
>