PING: RFR: 8164652: c1 port, code patching

Mon Oct 3 16:20:06 UTC 2016

Already replied

http://mail.openjdk.java.net/pipermail/aarch32-port-dev/2016-September/000490.html

Anton

On Mon, 2016-10-03 at 17:13 +0100, Andrew Haley wrote:
> On 29/09/16 11:51, Andrew Haley wrote:
> > 
> > There has still not been a reply to this that I can find.
> > 
> > It is a serious issue that should be addressed.
> > 
> > Andrew.
> > 
> > 
> > On 31/08/16 17:58, Andrew Haley wrote:
> > > 
> > > It all looks pretty good.
> > > 
> > > I now understand how the C1 patching code works, and it's not as bad
> > > as I'd feared.  :-)
> > > 
> > > I'm think that it'll work on existing ARM processors, given that none
> > > of the copying of code buffers which x86 does is used at all.
> > > 
> > > For the benefit of onlookers: when we have to patch, say, an
> > > instruction of the form
> > > 
> > >    mov r2, #nn
> > > 
> > > but nn is unknown at compilation time, we lay down a stub of the form
> > > 
> > > stub0:
> > >    movw r2, #0
> > >    movt r2, #0
> > >    nop
> > >    b back
> > > 
> > > and at the "call site" we do
> > > 
> > >   b <fixup>
> > >   nop
> > >   nop
> > > back:
> > > 
> > > Patching code first fixes up the stub:
> > > 
> > > stub0:
> > >    movw r2, #088
> > >    movt r2, #0
> > >    nop
> > >    b back
> > > 
> > > and then the call site to jump to the stub:
> > > 
> > >   b stub0
> > >   nop
> > >   nop
> > > back:
> > > 
> > > With all of the icache flushing done in the correct order I think this
> > > is OK.
> > > 
> > > Having said that, I want to sound a note of caution.
> > > 
> > > There is a compatibility problem in that the architecture
> > > specification says
> > > 
> > >    ...once the modified instructions are observable, each PE that is
> > >    executing the modified instructions *must* (my emphasis) issue the
> > >    following instruction to ensure execution of the modified
> > >    instructions:
> > > 
> > >          ISB;         Synchronize fetched instruction stream
> > > 
> > > But we are not doing that on all PEs.  For strict correctness we
> > > should execute an ISB on every PE after a stub has been written and
> > > before it is executed.  In practice I don't think that it matters
> > > because the instruction cache on the other PEs probably won't already
> > > contain a stale copy of the stub.  But it is possible: with a long
> > > instruction cache it may do if the line has already been fetched for
> > > an adjacent stub.  Without an ISB the PE may not re-fetch its icache
> > > line and may then execute the stale stub code.
> > > 
> > > The patching doesn't need to be done in this way.  It could have been
> > > done as
> > > 
> > >    b <fixup>
> > >    ldr r2, <constant pool + offset>
> > > 
> > > and the fixup could have initialized the constant pool entry and then
> > > overwritten the b <fixup> to a NOP .
> > > 
> > > [ As an aside: I don't think that the trampolines on AArch64 are
> > > vulnerable in the same way because the stubs are of the form
> > > 
> > >    ldr r2, data0
> > >    b r2
> > > data0:
> > >    xword 0
> > > 
> > > the patching code does not modify an instruction except for the branch
> > > at the call site, so it is legal.  Having said that, perhaps we need
> > > an acquiring load in this AArch64 stub instead of a simple LDR. ]
> > > 
> > > Andrew.
> > > 
> > 
>