RFR(m): 8220351: Cross-modifying code

Wed Mar 13 08:19:15 UTC 2019

Hi David,

On 3/12/19 9:00 AM, David Holmes wrote:
> Hi Robbin,
> 
> I have comments on the various changes below, but now I've gone through that all 
> I have a meta-comment on the overall approach.
> 
> AFAICS the only time you have to sync the i-stream is when returning to 
> _thread_in_Java, as that is the only state in which you can execute the modified 
> code - no? Handling this at only those thread-state transitions would seem a lot 
> simpler/cleaner.

I disagree, but I'll make such version after I published the update from this.
I think the model of always 'refresh' the instruction stream when leaving a
safepoint safe state is better. (but because of the S/R protocol and people do
'cheat' transition it becomes a bit clobberish)

> 
> On 9/03/2019 1:24 am, Robbin Ehn wrote:
>> Hi all, please review.
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8220351
>> Changeset:
>> http://cr.openjdk.java.net/~rehn/8220351/webrev/
> 
> src/hotspot/cpu/x86/assembler_x86.hpp
> 
> As discussed elsewhere mfence() is no longer the advised mechanism, but instead 
> a serializing instruction like cpuid must be used.

AMD64
"Upon release, the target thread must then execute a serializing instruction
such as CPUID or MFENCE (a locked operation is not sufficient) before proceeding
to the modified code to avoid executing a stale view of the instructions which
may have been speculatively fetched."

Intel
"MFENCE does not serialize the instruction stream."

CPUID seems to be common ground, changed.

> src/hotspot/cpu/x86/macroAssembler_x86.cpp
> 
> MacroAssembler::debug32 seems quite bizarre. First, pre-existing:

Reverted.

> 
> ---
> 
> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp
> 
> 2562   __ lock();
> 2563   __ addl(Address(r15_thread, JavaThread::thread_state_offset()), 1 /* 
> _thread_in_native_trans */);
> 
> This changes seems incorrect. You need a storeload barrier (at least) after the 
> update to thread_state but your lock() prefix would place barriers before I 
> think. In any case I find the existing code clearer and more obviously correct.

We describe this as <fence> <addl> <release (StoreStore|StoreLoad)>.
So it is correct.

Reverted.

> 
> ---
> 
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
> src/hotspot/os_cpu/windows_x86/orderAccess_windows_x86.hpp
> src/hotspot/share/runtime/orderAccess.cpp
> src/hotspot/share/runtime/stubRoutines.cpp
> src/hotspot/share/runtime/stubRoutines.hpp
> 
> I don't understand why we need a stub for this (or for fence() for that matter) ??

Visual studio don't support inline assembly on x64.
Changing to CPUID.

> 
> ---
> 
> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
> 
> 1090   __ lock();
> 1091   __ addl(Address(thread, JavaThread::thread_state_offset()), 1 
> /*_thread_in_native_trans*/);
> 
> Same comment as above.

Reverted.

> 
> ---
> 
> src/hotspot/os_cpu/aix_ppc/orderAccess_aix_ppc.hpp
> src/hotspot/os_cpu/linux_ppc/orderAccess_linux_ppc.hpp
> 
> PPC has isync for instruction stream synchronization, so I'd expect to see that 
> used in OrderAccess::instruction_pipeline()

I'll let PPC folks see what they need to do, but yes that was my thought also.

> 
> ---
> 
> src/hotspot/os_cpu/bsd_x86/orderAccess_bsd_x86.hpp
> src/hotspot/os_cpu/linux_x86/orderAccess_linux_x86.hpp
> src/hotspot/os_cpu/solaris_x86/orderAccess_solaris_x86.hpp
> 
> s/mfence/cpuid

Fixed.

> 
> ---
> 
> src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.hpp
> src/hotspot/os_cpu/linux_arm/orderAccess_linux_arm.hpp
> 
> Aarch64/ARM has ISB for instruction stream sync.

As Haley said, arm do not have instruction cache coherency, so ISB is not enough 
in all cases.

> 
> ---
> 
> src/hotspot/os_cpu/linux_s390/orderAccess_linux_s390.hpp
> 
> Need input from s390 folk.

Yes.

> 
> ---
> 
> src/hotspot/os_cpu/linux_sparc/orderAccess_linux_sparc.hpp
> src/hotspot/os_cpu/solaris_sparc/orderAccess_solaris_sparc.hpp
> 
> SPARC needs a FLUSH
> 
> "H.1.6 Self-Modifying Code
> If a program includes self-modifying code, it must issue a FLUSH instruction for 
> each modified doubleword of instructions (or a call to supervisor software 
> having an equivalent effect)." [SPARCV9 Architecture manual]
> 

Yes.

> ---
> 
> src/hotspot/share/classfile/classFileParser.cpp
> 
> Change is unrelated to this bug.

It's related that without it 32-bit don't compile (without disable-errors was 
warning). So I need it for test compiling. Want me to revert it?

> 
> ---
> 
> src/hotspot/share/prims/jvmtiEnv.cpp
> 
> The changes seem unnecessary. As the code comment states we are going from one 
> safepoint unsafe state to another. There is no way to execute any JIT'd code 
> before returning to Java, and that return is where we should sync if needed.

You are correct.

> 
> ---
> 
> src/hotspot/share/prims/jvmtiExport.cpp
> 
> The change is unnecessary as we are executing in the VMThread. Not only would it 
> be the one doing the code changes at a safepoint, it never executes compiled 
> Java code anyway.

Yes.

> 
> ---
> 
> src/hotspot/share/runtime/interfaceSupport.inline.hpp
> src/hotspot/share/runtime/safepoint.cpp
> 
> My initial thought was that the OrderAccess::instruction_pipeline() should be 
> buried in the safepoint blocking code, but in some cases we may not have 
> actually blocked at the safepoint and so (theoretically at least) may need to 
> sync with the i-stream in case an intervening safepoint changed some code. Can't 
> help but worry that unconditionally doing this on native transitions is going to 
> hurt JNI performance. It also looks like we will double up on the 
> instruction_pipeline() in some cases.

Yes. I had an idea with passing around a token for it, but it just clobbers the 
code more.

> 
> ---
> 
> src/hotspot/share/runtime/thread.cpp
> 
> 1839   OrderAccess::instruction_pipeline();
> 
> Can't see why this is necessary here, we should sync when returning to 
> _thread_in_Java.

In this changeset we always do the barrier when leaving a safe state.
When you are not on threads list you are safe, so you can go new->vm->java just 
missing the safepoint.

> 
> 2471   if (thread_state() != _thread_blocked) {
> 2472     OrderAccess::instruction_pipeline();
> 2473   }
> 
> Again not sure why this is necessary. The call to java_suspend_self in the 
> not-blocked case comes from JavaThread::handle_special_runtime_exit_condition, 
> which itself is called from the transition code, so I'd expect this to be done 
> in the transition code.

java_suspend_self is executed in multiple different states.
Since you are safe when your have suspend flag set, we must emit the barrier
when leaving that safe state. Transitions from unsafe to unsafe e.g. vm to java
do not emit the barrier.

> 
>       thread->set_thread_state(_thread_blocked);
>       thread->java_suspend_self();
>       thread->set_thread_state(state);
> +     OrderAccess::instruction_pipeline();
> 
> Again seems unnecessary - especially given you have one in java_suspend_self() 
> already! (Notwithstanding I think that one should be removed.)

The above one is if we are not blocked, in this case we are blocked.
The reason for this is that if we are blocked via a real transition, the 
transition does the instruction barrier. If you don't mind an extra barrier we 
can emit it unconditionally in java_suspend_self.

I think set_thread_state should be private and just the ThreadStateTransistion 
should be allowed to use set_thread_state() :)

Posting update to RFR when completed.

/Robbin

> 
> Thanks,
> David
> -----
> 
>> After a JavaThread have been in a safepoint/(handshake) safe state it can start
>> executing updated code. E.g. an oop in the instruction stream can have been
>> updated.
>>
>> Most hardware's require a barrier or that the code cross modified is far away to
>> guarantee that the thread executing the updated instruction stream sees the
>> modification.
>>
>> What far away is and how far an update instruction stream is from a safepoint
>> safe state is not clear.
>>
>> To be compliant with those architectures an instruction stream barrier must be
>> added when leaving the safepoint safe state.
>>
>> There may be crashes today due to this missing barrier.
>> A new CPU with deeper pipeline or changes to the VM which moves a safepoint safe
>> state closer to a nmethod can increase changes of a crash.
>>
>> A few benchmarks will see a performance regression with the extra serializing,
>> such as Octane-Crypto -5% (x86).
>>
>> With a much more elaborate fix, such as changing the local poll to have more
>> than two states (armed/disarmed), it would be possible to only emit such
>> instruction stream barrier when the poll have been armed while the thread was
>> safepoint safe.
>>
>> Passes t1-3 on our platforms, and test built what I can.
>>
>> Thanks, Robbin