RFR: 8164652: c1 port, code patching

Thu Sep 29 10:51:45 UTC 2016

There has still not been a reply to this that I can find.

It is a serious issue that should be addressed.

Andrew.

On 31/08/16 17:58, Andrew Haley wrote:
> It all looks pretty good.
> 
> I now understand how the C1 patching code works, and it's not as bad
> as I'd feared.  :-)
> 
> I'm think that it'll work on existing ARM processors, given that none
> of the copying of code buffers which x86 does is used at all.
> 
> For the benefit of onlookers: when we have to patch, say, an
> instruction of the form
> 
>    mov r2, #nn
> 
> but nn is unknown at compilation time, we lay down a stub of the form
> 
> stub0:
>    movw r2, #0
>    movt r2, #0
>    nop
>    b back
> 
> and at the "call site" we do
> 
>   b <fixup>
>   nop
>   nop
> back:
> 
> Patching code first fixes up the stub:
> 
> stub0:
>    movw r2, #088
>    movt r2, #0
>    nop
>    b back
> 
> and then the call site to jump to the stub:
> 
>   b stub0
>   nop
>   nop
> back:
> 
> With all of the icache flushing done in the correct order I think this
> is OK.
> 
> Having said that, I want to sound a note of caution.
> 
> There is a compatibility problem in that the architecture
> specification says
> 
>    ...once the modified instructions are observable, each PE that is
>    executing the modified instructions *must* (my emphasis) issue the
>    following instruction to ensure execution of the modified
>    instructions:
> 
>          ISB;         Synchronize fetched instruction stream
> 
> But we are not doing that on all PEs.  For strict correctness we
> should execute an ISB on every PE after a stub has been written and
> before it is executed.  In practice I don't think that it matters
> because the instruction cache on the other PEs probably won't already
> contain a stale copy of the stub.  But it is possible: with a long
> instruction cache it may do if the line has already been fetched for
> an adjacent stub.  Without an ISB the PE may not re-fetch its icache
> line and may then execute the stale stub code.
> 
> The patching doesn't need to be done in this way.  It could have been
> done as
> 
>    b <fixup>
>    ldr r2, <constant pool + offset>
> 
> and the fixup could have initialized the constant pool entry and then
> overwritten the b <fixup> to a NOP .
> 
> [ As an aside: I don't think that the trampolines on AArch64 are
> vulnerable in the same way because the stubs are of the form
> 
>    ldr r2, data0
>    b r2
> data0:
>    xword 0
> 
> the patching code does not modify an instruction except for the branch
> at the call site, so it is legal.  Having said that, perhaps we need
> an acquiring load in this AArch64 stub instead of a simple LDR. ]
> 
> Andrew.
>