[aarch64-port-dev ] RFR: aarch64: elide DecodeN when followed by CmpP 0

Thu Sep 24 08:30:32 UTC 2015

Hi Vladimir,

On 24/09/15 03:12, Vladimir Kozlov wrote:
> . . .
> What about Andrew Dinn response few days ago? He said next change in
> aarch64.ad fixed the problem:
> 
> bool Matcher::narrow_oop_use_complex_address() {
>   assert(UseCompressedOops, "only for compressed oops code");
>   return (LogMinObjAlignmentInBytes <= 3);
> }
> bool Matcher::narrow_klass_use_complex_address() {
>   assert(UseCompressedOops, "only for compressed oops code");
>   return (LogKlassAlignmentInBytes <= 3);
> }
> 
> From what I see in aarch64.ad it support scaled memory which is
> different from sparc:
> 
> operand indIndexScaledOffsetI(iRegP reg, iRegL lreg, immIScale scale, immIU12 off)
> %{
>   constraint(ALLOC_IN_RC(ptr_reg));
>   match(AddP (AddP reg (LShiftL lreg scale)) off);
> 
> Or I am missing something?

Yes, I am afraid so. We do have that operand, but it only exists at this
abstract level. AArch64 has no addressing mode that combines a scaled
index register with an immediate offset, so we encode operations using
this type of memory operand as an insn pair: an lea (i.e. an add)
followed by an ldrx with shift -- see the helper function loadStore
defined in aarch64.ad and used by all the aarch64_enc_ldrx encodings for
details.
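To make the addressing-mode constraint concrete, here is a toy C++ model of lowering an abstract base + (index << scale) + offset operand. It is a sketch only: `AbstractAddress`, `lower_load`, `ea_direct` and `ea_split` are hypothetical names, and the emitted strings merely mimic the add + scaled-load pair described above; this is not the loadStore helper from aarch64.ad.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a memory operand of the form
// base + (index << scale) + offset (the shape indIndexScaledOffsetI matches).
struct AbstractAddress {
  bool has_index;   // is an index register present?
  unsigned scale;   // left shift applied to the index register
  int64_t offset;   // immediate displacement
};

// AArch64 loads accept either [base, index, lsl #scale] or [base, #offset],
// but no single form takes both an index register and a nonzero immediate.
// (On real hardware the register-offset shift must also be 0 or log2 of the
// access size, which this model does not check.)
std::vector<std::string> lower_load(const AbstractAddress& a) {
  std::vector<std::string> insns;
  if (a.has_index && a.offset != 0) {
    // Fold the immediate into a temporary first, then do the scaled load.
    insns.push_back("add tmp, base, #" + std::to_string(a.offset));
    insns.push_back("ldr dst, [tmp, index, lsl #" + std::to_string(a.scale) + "]");
  } else if (a.has_index) {
    insns.push_back("ldr dst, [base, index, lsl #" + std::to_string(a.scale) + "]");
  } else {
    insns.push_back("ldr dst, [base, #" + std::to_string(a.offset) + "]");
  }
  return insns;
}

// The one-step formula and the two-step (add, then scaled load) sequence
// compute the same effective address.
uint64_t ea_direct(uint64_t base, uint64_t index, unsigned scale, int64_t off) {
  return base + (index << scale) + static_cast<uint64_t>(off);
}

uint64_t ea_split(uint64_t base, uint64_t index, unsigned scale, int64_t off) {
  uint64_t tmp = base + static_cast<uint64_t>(off);  // the "lea"/add
  return tmp + (index << scale);                     // the shifted-index load
}
```

In this model only the combined index-plus-offset case costs two instructions, which is exactly the case that makes the complex-address form expensive on AArch64.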

As Ed pointed out to me (privately), that means that my redefinition of
narrow_oop_use_complex_address makes some important test cases suffer
the worst of all available outcomes. In those cases we lose the implicit
null check (paying the price of an explicit compare and branch) while
still having to plant an add+ldrx pair to do the load. This turns out to
be too much of a performance hit for the redefinition to be acceptable.

Meanwhile, if we keep the old definition, then when we want to do an oop
null test (as in the example Ed provided), we end up doing an
unnecessary shift to decode the narrow oop before comparing it with null.
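The shift is unnecessary because decoding preserves null-ness: a narrow oop is null exactly when its decoded pointer is null. The following is a simplified C++ sketch of that argument; `decode_narrow_oop` and the two null-test helpers are hypothetical names, not HotSpot's implementation, and the model assumes a non-null narrow value never decodes to address 0.

```cpp
#include <cassert>
#include <cstdint>

// Simplified compressed-oop decode (hypothetical helper): narrow 0 means
// null; any other value is shifted and added to the heap base.
uint64_t decode_narrow_oop(uint32_t narrow, uint64_t heap_base, unsigned shift) {
  if (narrow == 0) return 0;  // null stays null
  return heap_base + (static_cast<uint64_t>(narrow) << shift);
}

// Null test against the decoded pointer: pays for the shift (and add).
bool is_null_after_decode(uint32_t narrow, uint64_t heap_base, unsigned shift) {
  return decode_narrow_oop(narrow, heap_base, shift) == 0;
}

// Equivalent null test on the narrow value itself: no decode needed.
bool is_null_narrow(uint32_t narrow) {
  return narrow == 0;
}
```

Because the two predicates agree for every input in this model, a compare of the decoded oop against zero can be replaced by a compare of the narrow value against zero.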

So, I think the only way to steer between these two pessimal outcomes is
to follow his suggested patch, i.e. retain the old definition /and/
include a rule which catches subgraphs of the form (If cmp (CmpP
(DecodeN oop) zero)). I suppose we might also achieve this with a
peephole optimization, but Ed's solution seems obvious.
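As a rough illustration of what such a rule buys, here is a toy C++ tree rewrite that recognises (CmpP (DecodeN x) null) and replaces it with a compare on the narrow value. This is only a model of the transformation: `Node`, `mk` and `elide_decode_in_null_cmp` are hypothetical and are not C2's node classes, and the real rule would be expressed as a matcher rule in aarch64.ad rather than as C++ code like this.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy IR node (hypothetical): an opcode name plus up to two inputs.
struct Node {
  std::string op;  // e.g. "CmpP", "CmpN", "DecodeN", "LoadN", "ConP0", "ConN0"
  std::shared_ptr<Node> in1, in2;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr mk(const std::string& op, NodePtr a = nullptr, NodePtr b = nullptr) {
  return std::make_shared<Node>(Node{op, a, b});
}

// If n is (CmpP (DecodeN x) null-pointer-constant), return
// (CmpN x narrow-null-constant) so the decode is never needed;
// otherwise return n unchanged.
NodePtr elide_decode_in_null_cmp(const NodePtr& n) {
  if (n && n->op == "CmpP" &&
      n->in1 && n->in1->op == "DecodeN" &&
      n->in2 && n->in2->op == "ConP0") {
    return mk("CmpN", n->in1->in1, mk("ConN0"));
  }
  return n;
}
```

Any other compare shape is left untouched, so the old narrow_oop_use_complex_address definition still governs ordinary loads.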

Also, in the interest of full disclosure, I'll confess that it was not me
who inserted the old definition. My original version (return false) was
revised by Andrew Haley to return Universe::narrow_oop_shift() == 0. I
guess he had already worked through all this and knew what he was doing :-)

regards,

Andrew Dinn
-----------