RFR(m): 8220351: Cross-modifying code

Tue Mar 12 08:00:04 UTC 2019

Hi Robbin,

I have comments on the various changes below, but now I've gone through 
that all I have a meta-comment on the overall approach.

AFAICS the only time you have to sync the i-stream is when returning to 
_thread_in_Java, as that is the only state in which you can execute the 
modified code - no? Handling this at only those thread-state transitions 
would seem a lot simpler/cleaner.

On 9/03/2019 1:24 am, Robbin Ehn wrote:
> Hi all, please review.
> 
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8220351
> Changeset:
> http://cr.openjdk.java.net/~rehn/8220351/webrev/

src/hotspot/cpu/x86/assembler_x86.hpp

As discussed elsewhere mfence() is no longer the advised mechanism, but 
instead a serializing instruction like cpuid must be used.

---

src/hotspot/cpu/x86/macroAssembler_x86.cpp

MacroAssembler::debug32 seems quite bizarre. First, pre-existing:

  412   JavaThread* thread = JavaThread::current();
  413   JavaThreadState saved_state = thread->thread_state();
  414   thread->set_thread_state(_thread_in_vm);
  415   if (ShowMessageBoxOnError) {
  416     JavaThread* thread = JavaThread::current();
  417     JavaThreadState saved_state = thread->thread_state();
  418     thread->set_thread_state(_thread_in_vm);

Lines 416-418 are redundant.

IIUC you need to serialize after leaving a safepoint but this code 
cannot enter a safepoint. The state is _thread_in_vm, which prevents a 
safepoint from being reached and it does not do anything to change state 
(the ttyLocker locks without a safepoint check and so will not 
transition to _thread_blocked). So your change seems unnecessary. 
Further if this code could hit a safepoint check then the place for the 
new code would be in that return path from the safepoint not in this code.

---

src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp

2562   __ lock();
2563   __ addl(Address(r15_thread, JavaThread::thread_state_offset()), 1 
/* _thread_in_native_trans */);

This changes seems incorrect. You need a storeload barrier (at least) 
after the update to thread_state but your lock() prefix would place 
barriers before I think. In any case I find the existing code clearer 
and more obviously correct.

---

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
src/hotspot/os_cpu/windows_x86/orderAccess_windows_x86.hpp
src/hotspot/share/runtime/orderAccess.cpp
src/hotspot/share/runtime/stubRoutines.cpp
src/hotspot/share/runtime/stubRoutines.hpp

I don't understand why we need a stub for this (or for fence() for that 
matter) ??

---

src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp

1090   __ lock();
1091   __ addl(Address(thread, JavaThread::thread_state_offset()), 1 
/*_thread_in_native_trans*/);

Same comment as above.

---

src/hotspot/os_cpu/aix_ppc/orderAccess_aix_ppc.hpp
src/hotspot/os_cpu/linux_ppc/orderAccess_linux_ppc.hpp

PPC has isync for instruction stream synchronization, so I'd expect to 
see that used in OrderAccess::instruction_pipeline()

---

src/hotspot/os_cpu/bsd_x86/orderAccess_bsd_x86.hpp
src/hotspot/os_cpu/linux_x86/orderAccess_linux_x86.hpp
src/hotspot/os_cpu/solaris_x86/orderAccess_solaris_x86.hpp

s/mfence/cpuid

---

src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.hpp
src/hotspot/os_cpu/linux_arm/orderAccess_linux_arm.hpp

Aarch64/ARM has ISB for instruction stream sync.

---

src/hotspot/os_cpu/linux_s390/orderAccess_linux_s390.hpp

Need input from s390 folk.

---

src/hotspot/os_cpu/linux_sparc/orderAccess_linux_sparc.hpp
src/hotspot/os_cpu/solaris_sparc/orderAccess_solaris_sparc.hpp

SPARC needs a FLUSH

"H.1.6 Self-Modifying Code
If a program includes self-modifying code, it must issue a FLUSH 
instruction for each modified doubleword of instructions (or a call to 
supervisor software having an equivalent effect)." [SPARCV9 Architecture 
manual]

---

src/hotspot/share/classfile/classFileParser.cpp

Change is unrelated to this bug.

---

src/hotspot/share/prims/jvmtiEnv.cpp

The changes seem unnecessary. As the code comment states we are going 
from one safepoint unsafe state to another. There is no way to execute 
any JIT'd code before returning to Java, and that return is where we 
should sync if needed.

---

src/hotspot/share/prims/jvmtiExport.cpp

The change is unnecessary as we are executing in the VMThread. Not only 
would it be the one doing the code changes at a safepoint, it never 
executes compiled Java code anyway.

---

src/hotspot/share/runtime/interfaceSupport.inline.hpp
src/hotspot/share/runtime/safepoint.cpp

My initial thought was that the OrderAccess::instruction_pipeline() 
should be buried in the safepoint blocking code, but in some cases we 
may not have actually blocked at the safepoint and so (theoretically at 
least) may need to sync with the i-stream in case an intervening 
safepoint changed some code. Can't help but worry that unconditionally 
doing this on native transitions is going to hurt JNI performance. It 
also looks like we will double up on the instruction_pipeline() in some 
cases.

---

src/hotspot/share/runtime/thread.cpp

1839   OrderAccess::instruction_pipeline();

Can't see why this is necessary here, we should sync when returning to 
_thread_in_Java.

2471   if (thread_state() != _thread_blocked) {
2472     OrderAccess::instruction_pipeline();
2473   }

Again not sure why this is necessary. The call to java_suspend_self in 
the not-blocked case comes from 
JavaThread::handle_special_runtime_exit_condition, which itself is 
called from the transition code, so I'd expect this to be done in the 
transition code.

      thread->set_thread_state(_thread_blocked);
      thread->java_suspend_self();
      thread->set_thread_state(state);
+     OrderAccess::instruction_pipeline();

Again seems unnecessary - especially given you have one in 
java_suspend_self() already! (Notwithstanding I think that one should be 
removed.)

Thanks,
David
-----

> After a JavaThread have been in a safepoint/(handshake) safe state it 
> can start
> executing updated code. E.g. an oop in the instruction stream can have been
> updated.
> 
> Most hardware's require a barrier or that the code cross modified is far 
> away to
> guarantee that the thread executing the updated instruction stream sees the
> modification.
> 
> What far away is and how far an update instruction stream is from a 
> safepoint
> safe state is not clear.
> 
> To be compliant with those architectures an instruction stream barrier 
> must be
> added when leaving the safepoint safe state.
> 
> There may be crashes today due to this missing barrier.
> A new CPU with deeper pipeline or changes to the VM which moves a 
> safepoint safe
> state closer to a nmethod can increase changes of a crash.
> 
> A few benchmarks will see a performance regression with the extra 
> serializing,
> such as Octane-Crypto -5% (x86).
> 
> With a much more elaborate fix, such as changing the local poll to have 
> more
> than two states (armed/disarmed), it would be possible to only emit such
> instruction stream barrier when the poll have been armed while the 
> thread was
> safepoint safe.
> 
> Passes t1-3 on our platforms, and test built what I can.
> 
> Thanks, Robbin