[9] RFR(S): 8075324: Costs of memory operands in aarch64.ad are inconsistent
Tobias Hartmann
tobias.hartmann at oracle.com
Wed Mar 18 10:46:55 UTC 2015
Hi Andrew,
thanks a lot for the detailed explanation! That makes sense to me.
I was confused by the fact that the generation of the additional add in the case of a scaled-indirect-with-offset, is done 'implicitly' in loadStore and therefore we have to account for that by setting a higher cost for the corresponding memory operand.
Do you think we should still fix the cost of 'indOffI'? If so, here is the corresponding webrev:
http://cr.openjdk.java.net/~thartmann/8075324/webrev.01/
Thanks,
Tobias
On 17.03.2015 16:15, Andrew Dinn wrote:
> On 17/03/15 12:42, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8075324
>> http://cr.openjdk.java.net/~thartmann/8075324/webrev.00/
>
> I agree one of these costs is wrong (indOffI) but otherwise I think the
> costs are correct. That also means I think your addition of the two new
> operands is a mistake. I suspect you may have been confused by the
> rather unfortunate choice of names for the operands -- which are rather
> misleading. Also, the way the instruction generation is hidden behind
> some parameterised functions (ref below) makes it hard to see what is
> happening here. In particular, it makes it look like your operands are
> doing something that they cannot actually do.
>
> Many of the operand definitions are supposed to have cost 0 because the
> matched operands can be mapped directly to register arguments for a
> single ldr/str insn (of the appropriate flavour). These are the cases where
>
> i) there is no indirect register and an offset in a relevant range
>
> or where
>
> ii) where there is an indirect register and zero offset
>
> If the matcher can match an operand from these cases then the cost of
> the instruction combined with the complexity of the reduced operand
> (added in automatically by the matching algorithm) should be all that is
> needed to drive the selection process.
>
> However, in the operand definitions where
>
> iii) there is an indirect register and non-zero offset
>
> encoding of the load/store requires planting an lea (which manifests as
> an add) to add the offset to the base register followed by a ldr/str
> which does the scaled indirect load. That's because AArch64 does not
> provide a scaled indirect with offset address mode.
>
> For these cases the cost of the operand should be INSN_COST in order to
> account precisely for that extra add and the zero cost of passing the
> resulting offsetted address into the following ldr/str which does the
> scaled indirect load. There is no need to prefer this sequence. If two
> separate instructions can do this cheaper or at the same cost then that
> is fine.
>
> n.b. your newly provided operand format definition which prints
>
> ldrsbw R0, R10, R1, #16 I2L # byte
>
> exemplifies the problem here. There is no instruction which does a
> scaled indirect load with offset in the AArch64 instruction. If you take
> look at the static methods named loadStore defined in the source block
> you can see how this instruction will actually be generated.
> PrintAssembly will reveal that it gets translated as follows
>
> add R8, R10, #16
> ldrsbw R0, R1 sxtw
>
> Let's look at some cases:
>
> You are right that indOffI and indOffL should have the same cost but
> that cost should be 0 because they can both be encoded with a single
> ldr/str using only a base register and offset with no indirect register.
>
> ldr(dst, base, disp)
>
> By constrast, indIndexScaledI2L is correct to have cost 0. It has a
> scaled index register but offset 0. So, it can be encoded using a single
> ldr/str taking a scaled indirect register as argument.
>
> ldr(dst, base, index, SXTW)
>
> Similarly, indIndexScaledOffsetL should have cost INSN_COST because it
> will cause planting of an add followed by some flavour of ldr/str.
>
> add(rscratch1, base, disp)
> ldr(dst, index, LSL(0))
>
> So, the costs should be as follows
>
>
> Cost 0:
>
> indirect
> indIndexScaledI2L
> indIndexScaled
> indIndex
> indOffI
> indOffL
> indirectN
> indIndexScaledI2LN
> indIndexScaledN
> indIndexN
> indOffIN
> indOffLN
>
> Cost INSN_COST:
>
> indIndexScaledOffsetI
> indIndexScaledOffsetL
> indIndexScaledOffsetI2L
> indIndexScaledOffsetI2LN
>
> Your two extra operands are not needed.
>
> regards,
>
>
> Andrew Dinn
> -----------
>
More information about the hotspot-compiler-dev
mailing list