RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2]

Axel Boldt-Christmas aboldtch at openjdk.org
Tue Jun 20 15:06:05 UTC 2023


On Tue, 20 Jun 2023 08:26:08 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). 
>> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not.
>> 
>> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe.
>> 
>> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in.
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Typo in comment

This fix looks sound with respect to the AMD specification. 

Given the nature of the bug and how hard it would be to reproduce, I can only hope this is the cause. 
However having this fixed makes it easier to reason about what can happen.

-------------

Marked as reviewed by aboldtch (Committer).

PR Review: https://git.openjdk.org/jdk/pull/14543#pullrequestreview-1488382736


More information about the hotspot-dev mailing list