RFR(m): 8220351: Cross-modifying code

Thu Mar 14 04:48:36 UTC 2019

On 13/03/2019 6:19 pm, Robbin Ehn wrote:
> Hi David,
> 
> On 3/12/19 9:00 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> I have comments on the various changes below, but now I've gone 
>> through that all I have a meta-comment on the overall approach.
>>
>> AFAICS the only time you have to sync the i-stream is when returning 
>> to _thread_in_Java, as that is the only state in which you can execute 
>> the modified code - no? Handling this at only those thread-state 
>> transitions would seem a lot simpler/cleaner.
> 
> I disagree, but I'll make such version after I published the update from 
> this.
> I think the model of always 'refresh' the instruction stream when leaving a
> safepoint safe state is better. (but because of the S/R protocol and 
> people do 'cheat' transition it becomes a bit clobberish)

I don't see how it is better to cover cases that do not need to be 
covered. _thread_blocked is safepoint-safe while _thread_in_VM is not, 
but you don't need to sync with the i-stream when unblocking and 
returning to the VM! Only transitions back to Java need to do this. You 
then want to optimize things so that you only sync when needed.

But lets see how this plays out after you disarm experiments

Thanks,
David

>>
>> On 9/03/2019 1:24 am, Robbin Ehn wrote:
>>> Hi all, please review.
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8220351
>>> Changeset:
>>> http://cr.openjdk.java.net/~rehn/8220351/webrev/
>>
>> src/hotspot/cpu/x86/assembler_x86.hpp
>>
>> As discussed elsewhere mfence() is no longer the advised mechanism, 
>> but instead a serializing instruction like cpuid must be used.
> 
> AMD64
> "Upon release, the target thread must then execute a serializing 
> instruction
> such as CPUID or MFENCE (a locked operation is not sufficient) before 
> proceeding
> to the modified code to avoid executing a stale view of the instructions 
> which
> may have been speculatively fetched."
> 
> Intel
> "MFENCE does not serialize the instruction stream."
> 
> CPUID seems to be common ground, changed.
> 
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>
>> MacroAssembler::debug32 seems quite bizarre. First, pre-existing:
> 
> Reverted.
> 
> 
>>
>> ---
>>
>> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
>>
>> 2562   __ lock();
>> 2563   __ addl(Address(r15_thread, JavaThread::thread_state_offset()), 
>> 1 /* _thread_in_native_trans */);
>>
>> This changes seems incorrect. You need a storeload barrier (at least) 
>> after the update to thread_state but your lock() prefix would place 
>> barriers before I think. In any case I find the existing code clearer 
>> and more obviously correct.
> 
> We describe this as <fence> <addl> <release (StoreStore|StoreLoad)>.
> So it is correct.
> 
> Reverted.
> 
> 
>>
>> ---
>>
>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
>> src/hotspot/os_cpu/windows_x86/orderAccess_windows_x86.hpp
>> src/hotspot/share/runtime/orderAccess.cpp
>> src/hotspot/share/runtime/stubRoutines.cpp
>> src/hotspot/share/runtime/stubRoutines.hpp
>>
>> I don't understand why we need a stub for this (or for fence() for 
>> that matter) ??
> 
> Visual studio don't support inline assembly on x64.
> Changing to CPUID.
> 
>>
>> ---
>>
>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>>
>> 1090   __ lock();
>> 1091   __ addl(Address(thread, JavaThread::thread_state_offset()), 1 
>> /*_thread_in_native_trans*/);
>>
>> Same comment as above.
> 
> Reverted.
> 
>>
>> ---
>>
>> src/hotspot/os_cpu/aix_ppc/orderAccess_aix_ppc.hpp
>> src/hotspot/os_cpu/linux_ppc/orderAccess_linux_ppc.hpp
>>
>> PPC has isync for instruction stream synchronization, so I'd expect to 
>> see that used in OrderAccess::instruction_pipeline()
> 
> I'll let PPC folks see what they need to do, but yes that was my thought 
> also.
> 
>>
>> ---
>>
>> src/hotspot/os_cpu/bsd_x86/orderAccess_bsd_x86.hpp
>> src/hotspot/os_cpu/linux_x86/orderAccess_linux_x86.hpp
>> src/hotspot/os_cpu/solaris_x86/orderAccess_solaris_x86.hpp
>>
>> s/mfence/cpuid
> 
> Fixed.
> 
>>
>> ---
>>
>> src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.hpp
>> src/hotspot/os_cpu/linux_arm/orderAccess_linux_arm.hpp
>>
>> Aarch64/ARM has ISB for instruction stream sync.
> 
> As Haley said, arm do not have instruction cache coherency, so ISB is 
> not enough in all cases.
> 
>>
>> ---
>>
>> src/hotspot/os_cpu/linux_s390/orderAccess_linux_s390.hpp
>>
>> Need input from s390 folk.
> 
> Yes.
> 
>>
>> ---
>>
>> src/hotspot/os_cpu/linux_sparc/orderAccess_linux_sparc.hpp
>> src/hotspot/os_cpu/solaris_sparc/orderAccess_solaris_sparc.hpp
>>
>> SPARC needs a FLUSH
>>
>> "H.1.6 Self-Modifying Code
>> If a program includes self-modifying code, it must issue a FLUSH 
>> instruction for each modified doubleword of instructions (or a call to 
>> supervisor software having an equivalent effect)." [SPARCV9 
>> Architecture manual]
>>
> 
> Yes.
> 
>> ---
>>
>> src/hotspot/share/classfile/classFileParser.cpp
>>
>> Change is unrelated to this bug.
> 
> It's related that without it 32-bit don't compile (without 
> disable-errors was warning). So I need it for test compiling. Want me to 
> revert it?
> 
>>
>> ---
>>
>> src/hotspot/share/prims/jvmtiEnv.cpp
>>
>> The changes seem unnecessary. As the code comment states we are going 
>> from one safepoint unsafe state to another. There is no way to execute 
>> any JIT'd code before returning to Java, and that return is where we 
>> should sync if needed.
> 
> You are correct.
> 
>>
>> ---
>>
>> src/hotspot/share/prims/jvmtiExport.cpp
>>
>> The change is unnecessary as we are executing in the VMThread. Not 
>> only would it be the one doing the code changes at a safepoint, it 
>> never executes compiled Java code anyway.
> 
> Yes.
> 
>>
>> ---
>>
>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>> src/hotspot/share/runtime/safepoint.cpp
>>
>> My initial thought was that the OrderAccess::instruction_pipeline() 
>> should be buried in the safepoint blocking code, but in some cases we 
>> may not have actually blocked at the safepoint and so (theoretically 
>> at least) may need to sync with the i-stream in case an intervening 
>> safepoint changed some code. Can't help but worry that unconditionally 
>> doing this on native transitions is going to hurt JNI performance. It 
>> also looks like we will double up on the instruction_pipeline() in 
>> some cases.
> 
> Yes. I had an idea with passing around a token for it, but it just 
> clobbers the code more.
> 
>>
>> ---
>>
>> src/hotspot/share/runtime/thread.cpp
>>
>> 1839   OrderAccess::instruction_pipeline();
>>
>> Can't see why this is necessary here, we should sync when returning to 
>> _thread_in_Java.
> 
> In this changeset we always do the barrier when leaving a safe state.
> When you are not on threads list you are safe, so you can go 
> new->vm->java just missing the safepoint.
> 
>>
>> 2471   if (thread_state() != _thread_blocked) {
>> 2472     OrderAccess::instruction_pipeline();
>> 2473   }
>>
>> Again not sure why this is necessary. The call to java_suspend_self in 
>> the not-blocked case comes from 
>> JavaThread::handle_special_runtime_exit_condition, which itself is 
>> called from the transition code, so I'd expect this to be done in the 
>> transition code.
> 
> java_suspend_self is executed in multiple different states.
> Since you are safe when your have suspend flag set, we must emit the 
> barrier
> when leaving that safe state. Transitions from unsafe to unsafe e.g. vm 
> to java
> do not emit the barrier.
> 
>>
>>       thread->set_thread_state(_thread_blocked);
>>       thread->java_suspend_self();
>>       thread->set_thread_state(state);
>> +     OrderAccess::instruction_pipeline();
>>
>> Again seems unnecessary - especially given you have one in 
>> java_suspend_self() already! (Notwithstanding I think that one should 
>> be removed.)
> 
> The above one is if we are not blocked, in this case we are blocked.
> The reason for this is that if we are blocked via a real transition, the 
> transition does the instruction barrier. If you don't mind an extra 
> barrier we can emit it unconditionally in java_suspend_self.
> 
> I think set_thread_state should be private and just the 
> ThreadStateTransistion should be allowed to use set_thread_state() :)
> 
> Posting update to RFR when completed.
> 
> /Robbin
> 
>>
>> Thanks,
>> David
>> -----
>>
>>> After a JavaThread have been in a safepoint/(handshake) safe state it 
>>> can start
>>> executing updated code. E.g. an oop in the instruction stream can 
>>> have been
>>> updated.
>>>
>>> Most hardware's require a barrier or that the code cross modified is 
>>> far away to
>>> guarantee that the thread executing the updated instruction stream 
>>> sees the
>>> modification.
>>>
>>> What far away is and how far an update instruction stream is from a 
>>> safepoint
>>> safe state is not clear.
>>>
>>> To be compliant with those architectures an instruction stream 
>>> barrier must be
>>> added when leaving the safepoint safe state.
>>>
>>> There may be crashes today due to this missing barrier.
>>> A new CPU with deeper pipeline or changes to the VM which moves a 
>>> safepoint safe
>>> state closer to a nmethod can increase changes of a crash.
>>>
>>> A few benchmarks will see a performance regression with the extra 
>>> serializing,
>>> such as Octane-Crypto -5% (x86).
>>>
>>> With a much more elaborate fix, such as changing the local poll to 
>>> have more
>>> than two states (armed/disarmed), it would be possible to only emit such
>>> instruction stream barrier when the poll have been armed while the 
>>> thread was
>>> safepoint safe.
>>>
>>> Passes t1-3 on our platforms, and test built what I can.
>>>
>>> Thanks, Robbin