RFR(m): 8220351: Cross-modifying code

Thu Mar 14 08:24:51 UTC 2019

Hi David,

On 3/14/19 5:48 AM, David Holmes wrote:
> I don't see how it is better to cover cases that do not need to be covered. 
> _thread_blocked is safepoint-safe while _thread_in_VM is not, but you don't need 
> to sync with the i-stream when unblocking and returning to the VM! Only 
> transitions back to Java need to do this. You then want to optimize things so 
> that you only sync when needed.

I see it the other way around.
If you transition java->vm->java, you are never safe so you don't need the barrier.
Introducing more cost to the blocked state (and the orthogonal suspended state) 
I don't see any problems with and don't need any optimizations.
While the java->vm->java would need such.

/Robbin

> 
> But lets see how this plays out after you disarm experiments
> 
> Thanks,
> David
> 
>>>
>>> On 9/03/2019 1:24 am, Robbin Ehn wrote:
>>>> Hi all, please review.
>>>>
>>>> Issue:
>>>> https://bugs.openjdk.java.net/browse/JDK-8220351
>>>> Changeset:
>>>> http://cr.openjdk.java.net/~rehn/8220351/webrev/
>>>
>>> src/hotspot/cpu/x86/assembler_x86.hpp
>>>
>>> As discussed elsewhere mfence() is no longer the advised mechanism, but 
>>> instead a serializing instruction like cpuid must be used.
>>
>> AMD64
>> "Upon release, the target thread must then execute a serializing instruction
>> such as CPUID or MFENCE (a locked operation is not sufficient) before proceeding
>> to the modified code to avoid executing a stale view of the instructions which
>> may have been speculatively fetched."
>>
>> Intel
>> "MFENCE does not serialize the instruction stream."
>>
>> CPUID seems to be common ground, changed.
>>
>>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>>
>>> MacroAssembler::debug32 seems quite bizarre. First, pre-existing:
>>
>> Reverted.
>>
>>
>>>
>>> ---
>>>
>>> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
>>>
>>> 2562   __ lock();
>>> 2563   __ addl(Address(r15_thread, JavaThread::thread_state_offset()), 1 /* 
>>> _thread_in_native_trans */);
>>>
>>> This changes seems incorrect. You need a storeload barrier (at least) after 
>>> the update to thread_state but your lock() prefix would place barriers before 
>>> I think. In any case I find the existing code clearer and more obviously 
>>> correct.
>>
>> We describe this as <fence> <addl> <release (StoreStore|StoreLoad)>.
>> So it is correct.
>>
>> Reverted.
>>
>>
>>>
>>> ---
>>>
>>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
>>> src/hotspot/os_cpu/windows_x86/orderAccess_windows_x86.hpp
>>> src/hotspot/share/runtime/orderAccess.cpp
>>> src/hotspot/share/runtime/stubRoutines.cpp
>>> src/hotspot/share/runtime/stubRoutines.hpp
>>>
>>> I don't understand why we need a stub for this (or for fence() for that 
>>> matter) ??
>>
>> Visual studio don't support inline assembly on x64.
>> Changing to CPUID.
>>
>>>
>>> ---
>>>
>>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>>>
>>> 1090   __ lock();
>>> 1091   __ addl(Address(thread, JavaThread::thread_state_offset()), 1 
>>> /*_thread_in_native_trans*/);
>>>
>>> Same comment as above.
>>
>> Reverted.
>>
>>>
>>> ---
>>>
>>> src/hotspot/os_cpu/aix_ppc/orderAccess_aix_ppc.hpp
>>> src/hotspot/os_cpu/linux_ppc/orderAccess_linux_ppc.hpp
>>>
>>> PPC has isync for instruction stream synchronization, so I'd expect to see 
>>> that used in OrderAccess::instruction_pipeline()
>>
>> I'll let PPC folks see what they need to do, but yes that was my thought also.
>>
>>>
>>> ---
>>>
>>> src/hotspot/os_cpu/bsd_x86/orderAccess_bsd_x86.hpp
>>> src/hotspot/os_cpu/linux_x86/orderAccess_linux_x86.hpp
>>> src/hotspot/os_cpu/solaris_x86/orderAccess_solaris_x86.hpp
>>>
>>> s/mfence/cpuid
>>
>> Fixed.
>>
>>>
>>> ---
>>>
>>> src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.hpp
>>> src/hotspot/os_cpu/linux_arm/orderAccess_linux_arm.hpp
>>>
>>> Aarch64/ARM has ISB for instruction stream sync.
>>
>> As Haley said, arm do not have instruction cache coherency, so ISB is not 
>> enough in all cases.
>>
>>>
>>> ---
>>>
>>> src/hotspot/os_cpu/linux_s390/orderAccess_linux_s390.hpp
>>>
>>> Need input from s390 folk.
>>
>> Yes.
>>
>>>
>>> ---
>>>
>>> src/hotspot/os_cpu/linux_sparc/orderAccess_linux_sparc.hpp
>>> src/hotspot/os_cpu/solaris_sparc/orderAccess_solaris_sparc.hpp
>>>
>>> SPARC needs a FLUSH
>>>
>>> "H.1.6 Self-Modifying Code
>>> If a program includes self-modifying code, it must issue a FLUSH instruction 
>>> for each modified doubleword of instructions (or a call to supervisor 
>>> software having an equivalent effect)." [SPARCV9 Architecture manual]
>>>
>>
>> Yes.
>>
>>> ---
>>>
>>> src/hotspot/share/classfile/classFileParser.cpp
>>>
>>> Change is unrelated to this bug.
>>
>> It's related that without it 32-bit don't compile (without disable-errors was 
>> warning). So I need it for test compiling. Want me to revert it?
>>
>>>
>>> ---
>>>
>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>
>>> The changes seem unnecessary. As the code comment states we are going from 
>>> one safepoint unsafe state to another. There is no way to execute any JIT'd 
>>> code before returning to Java, and that return is where we should sync if 
>>> needed.
>>
>> You are correct.
>>
>>>
>>> ---
>>>
>>> src/hotspot/share/prims/jvmtiExport.cpp
>>>
>>> The change is unnecessary as we are executing in the VMThread. Not only would 
>>> it be the one doing the code changes at a safepoint, it never executes 
>>> compiled Java code anyway.
>>
>> Yes.
>>
>>>
>>> ---
>>>
>>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>> src/hotspot/share/runtime/safepoint.cpp
>>>
>>> My initial thought was that the OrderAccess::instruction_pipeline() should be 
>>> buried in the safepoint blocking code, but in some cases we may not have 
>>> actually blocked at the safepoint and so (theoretically at least) may need to 
>>> sync with the i-stream in case an intervening safepoint changed some code. 
>>> Can't help but worry that unconditionally doing this on native transitions is 
>>> going to hurt JNI performance. It also looks like we will double up on the 
>>> instruction_pipeline() in some cases.
>>
>> Yes. I had an idea with passing around a token for it, but it just clobbers 
>> the code more.
>>
>>>
>>> ---
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>>
>>> 1839   OrderAccess::instruction_pipeline();
>>>
>>> Can't see why this is necessary here, we should sync when returning to 
>>> _thread_in_Java.
>>
>> In this changeset we always do the barrier when leaving a safe state.
>> When you are not on threads list you are safe, so you can go new->vm->java 
>> just missing the safepoint.
>>
>>>
>>> 2471   if (thread_state() != _thread_blocked) {
>>> 2472     OrderAccess::instruction_pipeline();
>>> 2473   }
>>>
>>> Again not sure why this is necessary. The call to java_suspend_self in the 
>>> not-blocked case comes from 
>>> JavaThread::handle_special_runtime_exit_condition, which itself is called 
>>> from the transition code, so I'd expect this to be done in the transition code.
>>
>> java_suspend_self is executed in multiple different states.
>> Since you are safe when your have suspend flag set, we must emit the barrier
>> when leaving that safe state. Transitions from unsafe to unsafe e.g. vm to java
>> do not emit the barrier.
>>
>>>
>>>       thread->set_thread_state(_thread_blocked);
>>>       thread->java_suspend_self();
>>>       thread->set_thread_state(state);
>>> +     OrderAccess::instruction_pipeline();
>>>
>>> Again seems unnecessary - especially given you have one in 
>>> java_suspend_self() already! (Notwithstanding I think that one should be 
>>> removed.)
>>
>> The above one is if we are not blocked, in this case we are blocked.
>> The reason for this is that if we are blocked via a real transition, the 
>> transition does the instruction barrier. If you don't mind an extra barrier we 
>> can emit it unconditionally in java_suspend_self.
>>
>> I think set_thread_state should be private and just the ThreadStateTransistion 
>> should be allowed to use set_thread_state() :)
>>
>> Posting update to RFR when completed.
>>
>> /Robbin
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> After a JavaThread have been in a safepoint/(handshake) safe state it can start
>>>> executing updated code. E.g. an oop in the instruction stream can have been
>>>> updated.
>>>>
>>>> Most hardware's require a barrier or that the code cross modified is far 
>>>> away to
>>>> guarantee that the thread executing the updated instruction stream sees the
>>>> modification.
>>>>
>>>> What far away is and how far an update instruction stream is from a safepoint
>>>> safe state is not clear.
>>>>
>>>> To be compliant with those architectures an instruction stream barrier must be
>>>> added when leaving the safepoint safe state.
>>>>
>>>> There may be crashes today due to this missing barrier.
>>>> A new CPU with deeper pipeline or changes to the VM which moves a safepoint 
>>>> safe
>>>> state closer to a nmethod can increase changes of a crash.
>>>>
>>>> A few benchmarks will see a performance regression with the extra serializing,
>>>> such as Octane-Crypto -5% (x86).
>>>>
>>>> With a much more elaborate fix, such as changing the local poll to have more
>>>> than two states (armed/disarmed), it would be possible to only emit such
>>>> instruction stream barrier when the poll have been armed while the thread was
>>>> safepoint safe.
>>>>
>>>> Passes t1-3 on our platforms, and test built what I can.
>>>>
>>>> Thanks, Robbin