RFR: 8334890: Missing unconditional cross modifying fence in nmethod entry barriers
John R Rose
jrose at openjdk.org
Thu Jul 4 21:31:21 UTC 2024
On Tue, 2 Jul 2024 15:43:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:
> On x86_64, our nmethod entry barriers use a mix of asynchronous and synchronous code modification. There is a cmp instruction with an immediate. When the immediate value is "incorrect", the nmethod is armed, and when it's "correct", it's disarmed. When we load the immediate with the instruction fetcher, we use asynchronous cross modifying code, and when we load the immediate as data, we use synchronous cross modifying code.
>
> We use asynchronous code modification in the fast path of nmethod entry barriers. If the nmethod is concurrently being disarmed while the nmethod entry barrier is executed, then we are guaranteed that if the updated "correct" immediate is observed by the instruction fetcher, then any code modification to the nmethod prior to disarming it on another thread, is guaranteed to also be observed by the instruction fetcher.
>
> However, in the slow path, when the immediate was observed to have the "incorrect" value by the instruction fetcher, we call a C++ function, BarrierSetNMethod::nmethod_stub_entry_barrier. In this function we check if the nmethod is disarmed or armed, by loading the guard value (from the immediate), as data. If we observe the updated value, indicating that the nmethod has become disarmed, we want to enter the nmethod. However, since we used data to signal that the instruction cross modification has happened, it is not safe to execute the concurrently modified instructions, without enforcing a cross modifying code fence. This is synchronous code modification.
>
> There is some questionable optimization that in the stub slow path entry (which we just got to because the nmethod was observed to be armed by the instruction fetcher). It checks "just one more time" if the nmethod concurrently got disarmed, and then exits without cross modification fence. This is an opportunistic optimization that is very unlikely to be useful, since we got into the slow path because it a couple of instructions ago was armed. This opportunistic optimization breaks the synchronous code modification contract, which is that you have to issue an instruction cross modification fence after reading the data that signalled that cross modification has completed successfully.
>
> This patch removes these kinds of opportunistic optimizations from the nmethod entry barrier code, in order to make it more robust and follow the synchronous cross modification dance correctly.
I see you integrated; good job, and thank you to the Reviewers. The narrative at the top of this PR is excellent for motivating and explaining the removal of the extra check. Some of the other diffs are more mysterious, as Axel noted.
There are no API docs in barrierSetNMethod.[ch]pp so I would need to trace through all the code paths to properly educate myself about the effect of this change.
For example, BarrierSetNMethod::nmethod_entry_barrier is a very interesting function, along with its OSR brother, but there are no comments directly visible here that give a clue as to when it is called, or why it must be called. I think that level of non-documentation is often a maintenance problem. I see this file relates to the larger API in barrierSet.hpp but that file has sparse comments also.
Ideally, I’d hope to read The Narrative of the Barrier Set at the top of barrierSet.hpp, and maybe have a brief pointer to The Narrative from less-commented related files like barrierSetNMethod.cpp.
Also, if I did this change, and was feeling chatty and cautious, I’d leave behind an informative comment to the effect that “you might want to double-check the barrier state here, but don’t, because races”. It’s nice not to leave a seam from past history, but sometimes the absence of a warning leads people to repeat history.
None of the above critiques would have stopped me from approving the change as another Reviewer, but the lack of documentation would have made me hesitate to review quickly.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19990#issuecomment-2209578133
More information about the hotspot-dev
mailing list