PING: RFR: 8164652: c1 port, code patching

Mon Oct 3 16:13:36 UTC 2016

On 29/09/16 11:51, Andrew Haley wrote:
> There has still not been a reply to this that I can find.
> 
> It is a serious issue that should be addressed.
> 
> Andrew.
> 
> 
> On 31/08/16 17:58, Andrew Haley wrote:
>> It all looks pretty good.
>>
>> I now understand how the C1 patching code works, and it's not as bad
>> as I'd feared.  :-)
>>
>> I'm think that it'll work on existing ARM processors, given that none
>> of the copying of code buffers which x86 does is used at all.
>>
>> For the benefit of onlookers: when we have to patch, say, an
>> instruction of the form
>>
>>    mov r2, #nn
>>
>> but nn is unknown at compilation time, we lay down a stub of the form
>>
>> stub0:
>>    movw r2, #0
>>    movt r2, #0
>>    nop
>>    b back
>>
>> and at the "call site" we do
>>
>>   b <fixup>
>>   nop
>>   nop
>> back:
>>
>> Patching code first fixes up the stub:
>>
>> stub0:
>>    movw r2, #088
>>    movt r2, #0
>>    nop
>>    b back
>>
>> and then the call site to jump to the stub:
>>
>>   b stub0
>>   nop
>>   nop
>> back:
>>
>> With all of the icache flushing done in the correct order I think this
>> is OK.
>>
>> Having said that, I want to sound a note of caution.
>>
>> There is a compatibility problem in that the architecture
>> specification says
>>
>>    ...once the modified instructions are observable, each PE that is
>>    executing the modified instructions *must* (my emphasis) issue the
>>    following instruction to ensure execution of the modified
>>    instructions:
>>
>>          ISB;         Synchronize fetched instruction stream
>>
>> But we are not doing that on all PEs.  For strict correctness we
>> should execute an ISB on every PE after a stub has been written and
>> before it is executed.  In practice I don't think that it matters
>> because the instruction cache on the other PEs probably won't already
>> contain a stale copy of the stub.  But it is possible: with a long
>> instruction cache it may do if the line has already been fetched for
>> an adjacent stub.  Without an ISB the PE may not re-fetch its icache
>> line and may then execute the stale stub code.
>>
>> The patching doesn't need to be done in this way.  It could have been
>> done as
>>
>>    b <fixup>
>>    ldr r2, <constant pool + offset>
>>
>> and the fixup could have initialized the constant pool entry and then
>> overwritten the b <fixup> to a NOP .
>>
>> [ As an aside: I don't think that the trampolines on AArch64 are
>> vulnerable in the same way because the stubs are of the form
>>
>>    ldr r2, data0
>>    b r2
>> data0:
>>    xword 0
>>
>> the patching code does not modify an instruction except for the branch
>> at the call site, so it is legal.  Having said that, perhaps we need
>> an acquiring load in this AArch64 stub instead of a simple LDR. ]
>>
>> Andrew.
>>
>