[8u] RFR: nativeInstruction atomicity and correctness of patching

Anton Kozlov akozlov at azul.com
Wed Jul 6 13:01:58 UTC 2016


Andrew, thanks for reply,

On Tue, 2016-07-05 at 19:47 +0100, Andrew Haley wrote:


> > NativeTrampolineCall::set_destination_mt_safe is not different from any
> > other
> > instruction modification methods.
> It's not wrong, but it does a pointless ICache::invalidate_word().
> It's pointless because the word at addr() + 8 isn't an instruction,
> as far as I can see.
> 
Agree. I've removed ICache::invalidate_word in NativeTrampolineCall in second
revision of the patch.

> > And eventually data change will land to other core dcache, because there is
> > fence on executing site, which you've mentioned.  
> > set_destination_mt_safe is called when PatchingLock is taken, so it's
> > followed
> > by release.
> > 
> > Executing thread could see only 2 options, since patch is word-aligned
> > single
> > word data change:
> > 1) new data. Then everything is OK, just call new destination
> > 2) old data. Then execution stream should go to same patching code,
> > c1_Runtime,
> > which takes same PatchingLock, 
> > and therefore executes acquire. Then it should see new data, perform no
> > patching, leave patching routine and call new destination.
> There is never any problem patching a single word.  That can be done
> MT-safe if it's a word of data or it is a single patchable
> instruction.  Both of these are guaranteed: nothing else is.
> 
> This is the code that smells bad:
> 
>  void NativeGeneralJump::replace_mt_safe(address instr_addr, address
> code_buffer) {
> -  ShouldNotCallThis();
> +  // FIXME NativeCall from patching_epilog nops filling
> +  const int bytes_to_copy = NativeCall::instruction_size;
> +  const address patching_switch_addr = code_buffer + bytes_to_copy;
> +  NativeImmJump* patching_switch = NativeImmJump::from(patching_switch_addr);
> +  assert(patching_switch->destination() == patching_switch_addr +
> NativeInstruction::arm_insn_sz,
> +         "switch should be branch to next instr at this point");
> +  patching_switch->set_destination(instr_addr + bytes_to_copy);
> +  ICache::invalidate_word(patching_switch_addr);
> +
> +  NativeImmJump* nj = NativeImmJump::from(instr_addr); // checking that it is
> a jump
> +  nj->set_destination(code_buffer);
> +  ICache::invalidate_word(instr_addr);
> +
>  }
> 
> I don't think this is MT-safe.  I don't know why you call
> ICache::invalidate_word twice.
> 
> As far as I can see the only place where
> NativeGeneralJump::replace_mt_safe is used is in the C1 patching code,
> but that multiple-instruction patching relies on ordering guarantees
> we don't have on ARM.  It's hoping that if you patch things in a
> certain order they will be seen by other cores in the same order.
> 
Aha, yes, NativeGeneralJump::replace_mt_safe is a tricky one.

This code assumes it is only used in C1 patching, and for our current tip that is true.

C1 patching relies on the ability to replace a jump to the patching stub with some code,
such as a long jump (movw+movt+bl) or a constant load (movw+movt). Yes, this is not
allowed on ARM without fences such as an instruction pipeline flush.
We exploit the fact that the PatchingStub holds a copy of the code that the site will be
overwritten with. We make that copy executable once patching has happened, by
replacing the method's branch to the patching stub with a branch to the patched copy. In
contrast, on x86 the method's branch is replaced with the patched copy itself.
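To make the argument concrete: the two modifications in replace_mt_safe reduce to two
aligned single-word stores, each followed by an icache invalidate. A minimal sketch,
assuming GCC builtins and illustrative names (this is not the real HotSpot API):

```cpp
#include <cstdint>

using insn_t = uint32_t;

// Hypothetical sketch of the store order in replace_mt_safe. Each patch
// is one aligned 32-bit store, which ARM guarantees other cores observe
// atomically (all-old or all-new, never a torn word).
void replace_mt_safe_sketch(insn_t* method_branch,   // `b` at the call site
                            insn_t* patching_switch, // `b` after the copy
                            insn_t copy_entry_b,     // `b code_buffer`
                            insn_t back_to_method_b) // `b instr+size`
{
  // 1. Redirect the stub's switch so the copied code returns into the
  //    method instead of falling through to the C1 patching routine.
  __atomic_store_n(patching_switch, back_to_method_b, __ATOMIC_RELEASE);
  __builtin___clear_cache((char*)patching_switch,
                          (char*)(patching_switch + 1));

  // 2. Redirect the method's branch from the stub entry to the copy.
  __atomic_store_n(method_branch, copy_entry_b, __ATOMIC_RELEASE);
  __builtin___clear_cache((char*)method_branch,
                          (char*)(method_branch + 1));
}
```

The release ordering here stands in for the release done when PatchingLock is dropped;
the real code uses ICache::invalidate_word rather than raw builtins.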

You can verify that only an allowed instruction modification is performed, since
NativeImmJump recognizes only the `b ADDR` instruction: a `b ADDR1` is replaced
with a `b ADDR2` here.
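For reference, a sketch of the encoding involved. The A32 unconditional immediate
branch is cond(4) 1010 imm24, with the offset taken relative to pc + 8; is_b and
encode_b are illustrative helpers, not the port's actual NativeImmJump API:

```cpp
#include <cstdint>

using insn_t = uint32_t;

// Recognize `b ADDR` with cond=AL: top byte is 0xEA.
bool is_b(insn_t insn) {
  return (insn & 0xFF000000u) == 0xEA000000u;
}

// Encode `b target` placed at address pc. The offset is relative to
// pc + 8 (the classic ARM pipeline adjustment) and is stored as a
// signed word count in imm24.
insn_t encode_b(uintptr_t pc, uintptr_t target) {
  int32_t off = int32_t(target - (pc + 8)) >> 2;
  return 0xEA000000u | (uint32_t(off) & 0x00FFFFFFu);
}
```

Since both the old and new word satisfy is_b, the patch swaps one valid single
instruction for another, which is the one case that is safe without extra fences.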

Actually, PatchingStub has one extra branch compared with other platforms. It is
required because C1 itself also uses the patched copy of the code as executable in
some cases. This extra branch is called patching_switch.

If the two instruction modifications are observed reordered on the executing side, the
worst that happens is entry into the C1 patching routine, which does nothing if patching
has already occurred.
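The four visibility states the executing core can see can be enumerated to show that
every reordering is benign. This is a hypothetical model of the argument, not code from
the patch:

```cpp
#include <cstring>

// S1 = the patching_switch redirect (stub returns into the method),
// S2 = the method-branch redirect (call site branches to the copy).
// The executing core may observe each store independently.
const char* outcome(bool sees_switch_redirect, bool sees_branch_redirect) {
  if (!sees_branch_redirect)
    return "patching routine";  // method still branches to the stub entry,
                                // which re-checks under PatchingLock
  if (!sees_switch_redirect)
    return "patching routine";  // copy runs, switch still falls through
                                // into the patching routine, which re-checks
  return "patched code";        // copy runs, switch returns into the method
}
```

Every state either runs the patched code or falls back into the patching routine, which
takes the same PatchingLock, sees the patch is done, and simply returns.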

I know it's hard to reason about code you can't see. If NativeGeneralJump::replace_mt_safe
still does not look good, this patch could be postponed and reviewed together with C1
when it is published.

Thanks,
Anton


More information about the aarch32-port-dev mailing list