RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers

Thu Jul 4 08:01:18 UTC 2024

On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
> 
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
> 
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.
> 
> There is some questionable optimization that in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because it a couple of instructions ago was armed. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
> 
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.

The always fence changes looks good.

The effects w.r.t. `DeoptimizeNMethodBarriersALot` and our testing is less clear to me.

src/hotspot/share/gc/shared/barrierSetNMethod.cpp line 197:

> 195:   if (DeoptimizeNMethodBarriersALot && !nm->is_osr_method()) {
> 196:     static volatile uint32_t counter=0;
> 197:     if (Atomic::add(&counter, 1u) % 10 == 0) {

I have not good intuition about the frequency of this, and how this affects things. So have hard time commenting on this change.

An alternative would be to just fence when the `nmethod` is already disarmed and not call `bs_nm->nmethod_entry_barrier(nm);`. This is what effectively already happens in all implementations of `nmethod_entry_barrier` after [JDK-8331911](https://bugs.openjdk.org/browse/JDK-8331911) / #19285. (Maintaining the current behaviour with respect to `DeoptimizeNMethodBarriersALot`). 

But as long as this new magic constant seems to have similar testing coverage when running our tests that use `DeoptimizeNMethodBarriersALot` this seems like a sensible solution as well.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19990#pullrequestreview-2158224060
PR Review Comment: https://git.openjdk.org/jdk/pull/19990#discussion_r1665284317