RFR: 8164652: c1 port, code patching

Wed Aug 31 16:58:38 UTC 2016

It all looks pretty good.

I now understand how the C1 patching code works, and it's not as bad
as I'd feared.  :-)

I'm think that it'll work on existing ARM processors, given that none
of the copying of code buffers which x86 does is used at all.

For the benefit of onlookers: when we have to patch, say, an
instruction of the form

   mov r2, #nn

but nn is unknown at compilation time, we lay down a stub of the form

stub0:
   movw r2, #0
   movt r2, #0
   nop
   b back

and at the "call site" we do

  b <fixup>
  nop
  nop
back:

Patching code first fixes up the stub:

stub0:
   movw r2, #088
   movt r2, #0
   nop
   b back

and then the call site to jump to the stub:

  b stub0
  nop
  nop
back:

With all of the icache flushing done in the correct order I think this
is OK.

Having said that, I want to sound a note of caution.

There is a compatibility problem in that the architecture
specification says

   ...once the modified instructions are observable, each PE that is
   executing the modified instructions *must* (my emphasis) issue the
   following instruction to ensure execution of the modified
   instructions:

         ISB;         Synchronize fetched instruction stream

But we are not doing that on all PEs.  For strict correctness we
should execute an ISB on every PE after a stub has been written and
before it is executed.  In practice I don't think that it matters
because the instruction cache on the other PEs probably won't already
contain a stale copy of the stub.  But it is possible: with a long
instruction cache it may do if the line has already been fetched for
an adjacent stub.  Without an ISB the PE may not re-fetch its icache
line and may then execute the stale stub code.

The patching doesn't need to be done in this way.  It could have been
done as

   b <fixup>
   ldr r2, <constant pool + offset>

and the fixup could have initialized the constant pool entry and then
overwritten the b <fixup> to a NOP .

[ As an aside: I don't think that the trampolines on AArch64 are
vulnerable in the same way because the stubs are of the form

   ldr r2, data0
   b r2
data0:
   xword 0

the patching code does not modify an instruction except for the branch
at the call site, so it is legal.  Having said that, perhaps we need
an acquiring load in this AArch64 stub instead of a simple LDR. ]

Andrew.