RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding

Mon May 5 12:18:45 UTC 2025

On Wed, 30 Apr 2025 02:29:34 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664
>> 
>> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2]
>> 
>> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception.
>> 
>> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction.
>> 
>> Please review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>> 
>> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B
>> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873
>> 
>> 
>> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs.
>
> What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even released yet so I think it is more efficient to make a better fix than to make a quicker one.

I think @merykitty solution with two different relocations based on wether we support  APX or not.  And only emit the after and nop when `VM_Version::supports_apx_f()` is true. 

On the other hand maybe we can solve this with a minimal change by simply looking for the REX2 prefix when we patch the code. Something along the line of:

diff --git a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
index 9cdf0b229c0..4a956b450bd 100644
--- a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
@@ -1328,7 +1328,13 @@ void ZBarrierSetAssembler::patch_barrier_relocation(address addr, int format) {
   const uint16_t value = patch_barrier_relocation_value(format);
   uint8_t* const patch_addr = (uint8_t*)addr + offset;
   if (format == ZBarrierRelocationFormatLoadGoodBeforeShl) {
-    *patch_addr = (uint8_t)value;
+    if (VM_Version::supports_apx_f()) {
+      NativeInstruction* instruction = nativeInstruction_at(addr);
+      uint8_t* const rex2_patch_addr = patch_addr + (instruction->has_rex2_prefix() ? 1 : 0);
+      *rex2_patch_addr = (uint8_t)value;
+    } else {
+      *patch_addr = (uint8_t)value;
+    }
   } else {
     *(uint16_t*)patch_addr = value;
   }


As for the solution to have the relocation point at the entry. While they were not designed to be used this way, It looks like it works. (At least from a barrier patching point of view, as we only want to iterate over all relocations, never map a PC to an relocation). But changing invariants are scary. And is probably better to evaluate as a part of the [JDK-8355341](https://bugs.openjdk.org/browse/JDK-8355341) RFE.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2850807205