C2: Unrolling and hoisting trivial expressions out of loops

Mon Apr 30 15:58:02 UTC 2018

On 04/27/2018 08:54 PM, John Rose wrote:
> I think this behavior is controlled by Matcher::clone_address_expressions,
> which is defined in <arch>.ad.

Thanks.

Well, this is interesting.  We quite correctly return true from
clone_address_expressions for the offsets in question.  However,
unless those cloned expressions are recognized by a pattern in arch.ad
they are hoisted out of the loop after instruction selection,
presumably by global code motion (GCM).  I guess that makes sense,
given that GCM simply finds the earliest block in which an instruction
can be placed.  (There is some register-pressure-sensitive logic in
GCM but it's only enabled for x86.)
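
As a rough illustration of why an "earliest block" rule hoists these
expressions, here is a toy model in Python.  It is not HotSpot code:
the block names, node names and the straight-line dominator chain are
all assumptions made for the sketch, and real GCM also weighs a latest
legal block and block frequencies.

```python
# Toy model, not HotSpot code: three blocks on a straight dominator
# chain, identified by their depth in the dominator tree.
DOM_DEPTH = {"entry": 0, "loop": 1, "exit": 2}

# Blocks of nodes whose position is fixed (parameters, loop phi).
PINNED = {"base": "entry", "offset": "entry", "index": "loop"}

# Inputs of each floating node (hypothetical names).
INPUTS = {
    "shift": ["index"],                       # (LShiftL index con)
    "addp_inner": ["base", "offset"],         # (AddP base offset)
    "addp_outer": ["base", "addp_inner", "shift"],
}


def earliest_block(node):
    """Earliest block in which `node` can go: the deepest block,
    in dominator order, among the earliest blocks of its inputs."""
    if node in PINNED:
        return PINNED[node]
    blocks = [earliest_block(i) for i in INPUTS[node]]
    return max(blocks, key=lambda b: DOM_DEPTH[b])


for n in INPUTS:
    print(n, "->", earliest_block(n))
```

In this model the inner AddP depends only on loop-invariant inputs, so
its earliest block is the entry block, outside the loop; the shift and
the outer AddP depend on the loop-carried index and stay in the loop.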

An important difference between AArch64 and SPARC is that we reshape
expressions of the form

  (AddP base (AddP base address (LShiftL index con)) offset)

into

  (AddP base (AddP base offset) (LShiftL index con))

because adds with shifts have zero overhead when used in address
generation, but they have an additional cycle of latency when used in
an add instruction.  (I know, that's crazy.  It makes no sense to me
either.)  The generated instructions for

  (AddP base (AddP base address (LShiftL index con)) offset)

are, therefore,

  (Set reg (AddP base offset))
  (AddP base reg (LShiftL index con))

which would be very cool if all of those AddPs weren't hoisted and
then spilled.
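
The reshaping itself can be sketched as a small tree rewrite.  This is
a minimal Python illustration on nested tuples, not the C2
implementation: the tuple encoding and the function name reshape are
assumptions made for the example, and the inner node keeps its address
operand explicitly where the notation above abbreviates it.

```python
def reshape(addp):
    """Swap the outer offset with the inner shifted index when the
    tree looks like (AddP base (AddP ... (LShiftL i c)) offset)."""
    # Anything that is not a 4-element node is returned unchanged.
    if not (isinstance(addp, tuple) and len(addp) == 4):
        return addp
    op, base, addr, offset = addp
    if (op == "AddP"
            and isinstance(addr, tuple) and addr[0] == "AddP"
            and isinstance(addr[-1], tuple) and addr[-1][0] == "LShiftL"):
        shifted = addr[-1]
        # Inner node now adds the plain offset (loop-invariant part).
        inner = addr[:-1] + (offset,)
        # Outer node adds the shifted index, which can fold into an
        # addressing mode.
        return ("AddP", base, inner, shifted)
    return addp


before = ("AddP", "base",
          ("AddP", "base", "address", ("LShiftL", "index", 3)),
          16)
after = reshape(before)
print(after)
```

After the rewrite the inner node no longer depends on the index, which
is exactly what makes it loop-invariant and therefore a candidate for
hoisting.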

If I disable this reshaping, the code generation problem in this case
goes away.  So, I can either add a few patterns to aarch64.ad which
recognize the full base+offset+shifted_index pattern, or remove the
reshaping.

Hmm.  I guess I need to do some benchmarking.  This particular
benchmark slows down by some 50%, but I don't know how common the
problem is.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671