[8u] RFR: nativeInstruction atomicity and correctness of patching

Fri Apr 1 10:41:12 UTC 2016

Hi, Andrew,

On Thu, 2016-03-31 at 09:49 +0100, Andrew Haley wrote:
> On 30/03/16 20:53, Anton Kozlov wrote:
> 
> > On Wed, 2016-03-30 at 16:25 +0100, Andrew Haley wrote:
> > > On 03/30/2016 04:15 PM, Anton Kozlov wrote:
> > > > ICache::invalidate_word used here not to provide coherency
> > > > between
> > > > Icache and Dcache, but to maintain in-order visibillity of code
> > > > modification.
> > > 
> > > What exactly are you trying to synchronize?
> 
> > ICache::invalidate ensures that all CPUs see modifications of code.
> > Data modification in trampoline should be visible in the same way,
> > same time as code modification would. For this ICache::invalidate
> > used, as it by implementation should synchornize dcaches among
> > CPUs.
> 
> I don't think so.  In order for this to work, the *reader* would have
> to execute some kind of fence too.  I don't think that's possible
> without an inter-processor interrupt across all cores, and I think
> that __clear_cache doesn't do that.  I certainly hope not.
> 
> > > > This is achieved by implementation of ICache::invalidate_word,
> > > > which is just call to __clear_cache syscall wrapper.
> > > 
> > > I don't think there's any point.  It's not instructions but data,
> > > so
> > > ICache::invalidate_word is inappropriate.
> > 
> > Agree, ICache is bad name for that. How about direct __clear_cache
> > call?
> 
> > Also, I will try to study implementation of __clear_cache to know,
> > how exactly it's done.
> 
> I think you will find that it does not do what you think it does. 
>  I'm
> looking at linux/arch/arm/include/asm/cacheflush.h and I can't see
> anything like that.  All I see is that do_cache_op calls
> __flush_cache_user_range, and all that does is ensure that the I and
> D
> caches are coherent.

Looks so. flush_cache_user_range calls __cpuc_coherent_user_range,
which does that. But going deeper, I will consider ARMv7.
__cpuc_coherent_user_range is v7_coherent_user_range, which does Data
cache clean DCCMVAU and Instruction cache invalidate ICIMVAU. By Table
B2-2 of ARMv7 manual, ICIMVAU invalidates all CPUs (in shareabillity
domain) icaches. (Multiprocessing extension should be available,
without it inter-processor interrupt is really required, section
B2.2.5)

ICIMVAU and DCCMVAU used also in synchonization sequence in "Concurrent
modification and execution of instructions (A3.5.4)".
It's
        DCCMVAU [instruction location] ; Clean data cache by MVA to
point of unification
        DSB ; Ensure visibility of the data cleaned from the cache
        ICIMVAU [instruction location] ; Invalidate instruction cache
by MVA to PoU
        BPIMVAU [instruction location] ; Invalidate branch predictor by
MVA to PoU
        DSB ; Ensure completion of the invalidations
for modifying site and
        ISB ; Synchronize fetched instruction stream

This explicitly names 2 concurrently executing threads. If
DCCMVAU/ICIMVAU not affects other CPU cache, then the sequence works
for 1 thread only. Did I misunderstand the manual?

And as far as I understand, without this behavior of DCCMVAU/ICIMVAU
executing site will have to flush own icache everytime it executes
instructions that may be patched, and we clearly don't do this.

However, in my kernel sources, there is no Data cache invalidation, but
I hoped it should be. Without data cache invalidation, trampoline could
load old call target regardless if __clear_cache called or not, so
really useless here, you're right. But then inconsistency remain,
changes in code are observable by other CPUs immediatelly, and changes
in trampoline target is not. I'd call DCCIMVAC (Data Cache Clean and
Invalidate), which should affect all processors in the same
shareability domain (Table B2-1), but it's unavailable in user space. I
replaced ICache::invalidate with FIXME in the patch and not sure what
should be done here.

> > But for me it looks like if we drop cache maintance in
> > NativeTrampolineCall::set_destination_mt_safe, we'll get race
> > condition, and some thread could call old destination.
> 
> Yes.  Some thread could indeed call the old destination, but the key
> is that it doesn't matter: the old routines are not evicted from the
> code cache until at a safepoint, so they can still be called. 
>  Indeed,
> they may still be running on some cores.  And we patch out the first
> instruction so that it traps.

I care more about code modifiers, like Runtime1::patch_code. After
taking Patching_lock, we are checking instr_pc. If we are 2nd thread
that came after patching occured, but our dcache have stale value for
instr_pc, then we'll perform the code patching again, and this is not
good.

Thanks,
Anton
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nativeInstrRe2.patch
Type: text/x-patch
Size: 13659 bytes
Desc: nativeInstrRe2.patch
URL: <http://mail.openjdk.java.net/pipermail/aarch32-port-dev/attachments/20160401/a17bf449/nativeInstrRe2-0001.patch>