MOVABSQ yields wrong result in the destination register on x86_64?

Tue May 2 22:24:33 UTC 2023

Hi, 

We have recently observed some random HotSpot crashes when SerialGC is used on x86_64 Linux. So far we have only received crash reports from JDK 8 and 11, but I believe the codegen rules are the same in newer versions.

A common pattern is as follows:
 1.  The process gets SIGSEGV with si_code SI_KERNEL and si_addr 0.
    "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000"

 2.  The last event appears to be an implicit null exception, but its target pc is 0. The source pc is the instruction that caused the SIGSEGV, e.g.
    "Event: 44.827 Thread 0x00007f815400b800 Implicit null exception at 0x00007f8150e68daf to 0x0000000000000000"

 3.  The last instruction before the faulting pc is MOVABSQ #byte_map_base, dst register. This instruction moves a 64-bit immediate into a register.

For example:

Card table byte_map: [0x00007f81589b3000,0x00007f8158b1b000] byte_map_base: 0x00007f815831a000

Instructions: (pc=0x00007f8150e68daf)
0x00007f8150e68d8f: 03 00 00 49 8b c2 4c 8b 5c 24 18 45 89 53 14 4d
0x00007f8150e68d9f: 8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00
0x00007f8150e68daf: 43 c6 04 13 00 48 83 c4 50 5d 85 05 41 92 7c 0a

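We can translate them into an x86_64 instruction sequence (I used llvm-mc to disassemble them; the dump starts in the middle of an instruction, so the lines before movl %r10d, 20(%r11) are likely misdecoded):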
        .text
        addl    (%rax), %eax                    # encoding: [0x03,0x00]
        addb    %cl, -117(%rcx)                 # encoding: [0x00,0x49,0x8b]
        retq    $-29876                         # encoding: [0xc2,0x4c,0x8b]
                                        # imm = 0x8B4C
        popq    %rsp                            # encoding: [0x5c]
        andb    $24, %al                        # encoding: [0x24,0x18]
        movl    %r10d, 20(%r11)                 # encoding: [0x45,0x89,0x53,0x14]
        movq    %r11, %r10                      # encoding: [0x4d,0x8b,0xd3]
        shrq    $9, %r10                        # encoding: [0x49,0xc1,0xea,0x09]
        movabsq $140193507155968, %r11          # encoding: [0x49,0xbb,0x00,0xa0,0x31,0x58,0x81,0x7f,0x00,0x00]
                                        # imm = 0x7F815831A000
 PC>movb    $0, (%r11,%r10)                 # encoding: [0x43,0xc6,0x04,0x13,0x00]
        addq    $80, %rsp                       # encoding: [0x48,0x83,0xc4,0x50]
        popq    %rbp                            # encoding: [0x5d]
        testl   %eax, 175936065(%rip)           # encoding: [0x85,0x05,0x41,0x92,0x7c,0x0a]

MOVABSQ moves 0x7f815831a000 into R11, and the instruction at pc is about to store a dirty card into the card table.
Because the HotSpot crash report also contains the registers from the ucontext, we found that there is a 1-bit flip in the destination register.

In this case, R11 = 0x00047f815831a000, not 0x00007f815831a000. A single bit is flipped!
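
The difference between the two values can be checked mechanically. The following is only a sketch: the helper names are made up for illustration, and the canonical-address check assumes 48-bit virtual addresses (4-level paging), which the report does not state. Under that assumption the flipped value is non-canonical, which would be consistent with a general protection fault reported as SI_KERNEL with si_addr 0.

```python
EXPECTED = 0x00007F815831A000  # immediate encoded in the MOVABSQ
OBSERVED = 0x00047F815831A000  # R11 from the ucontext in the crash report


def flipped_bits(expected: int, observed: int) -> list:
    """Return the zero-based positions of the bits that differ."""
    diff = expected ^ observed
    return [i for i in range(64) if (diff >> i) & 1]


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all equal (assumes va_bits-bit addressing)."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


print(flipped_bits(EXPECTED, OBSERVED))
print(is_canonical(EXPECTED), is_canonical(OBSERVED))
```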

In all the reports we collected, the destination register varies, but it is always bit 50 (counting from 0) that is flipped after MOVABSQ.
It is also odd that the address of the faulting instruction ends in 0xf. For instance, here it is 0x00007f8150e68daf.

Have you seen this problem before?
For x86_64, do we need to pay attention to the alignment of the text (code)? I read the x86_64 manual and did not find any caveat about alignment.
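
To make the layout concrete, the sketch below locates the MOVABSQ in the second line of the hex dump and checks whether it or the faulting store straddles a 16-byte or 64-byte boundary. It only describes where the bytes sit; it does not show that alignment has anything to do with the crash. The boundary sizes and the helper name are my own choices for illustration.

```python
# Second line of the "Instructions:" dump and its start address.
DUMP_LINE_ADDR = 0x00007F8150E68D9F
DUMP_LINE = bytes.fromhex("8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00")
FAULT_PC = 0x00007F8150E68DAF

MOVABSQ_LEN = 10  # REX prefix + opcode + 8-byte immediate
MOVB_LEN = 5      # length of the faulting movb $0, (%r11,%r10)


def crosses(start: int, length: int, boundary: int) -> bool:
    """True if [start, start + length) spans a multiple of `boundary`."""
    return start // boundary != (start + length - 1) // boundary


# 0x49 0xbb is the start of "movabsq $imm64, %r11".
movabsq_start = DUMP_LINE_ADDR + DUMP_LINE.find(bytes([0x49, 0xBB]))

print(hex(movabsq_start), hex(movabsq_start + MOVABSQ_LEN))
for boundary in (16, 64):
    print(boundary,
          crosses(movabsq_start, MOVABSQ_LEN, boundary),
          crosses(FAULT_PC, MOVB_LEN, boundary))
```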

In this case, the GC post barrier is emitted by C2. The C2 backend selects MOVABSQ via the load_immL rule.
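
For reference, the address arithmetic of that post barrier (shrq $9, then a byte store of 0 at byte_map_base + index) can be sketched as follows. The constants are copied from the crash report; the example heap address is hypothetical and chosen only so that its card is the first byte of the table.

```python
# Values copied from the crash report.
BYTE_MAP_BASE = 0x00007F815831A000
BYTE_MAP_START = 0x00007F81589B3000
BYTE_MAP_END = 0x00007F8158B1B000
CARD_SHIFT = 9  # from "shrq $9, %r10": one card byte covers 512 heap bytes


def card_address(addr: int, byte_map_base: int = BYTE_MAP_BASE) -> int:
    """Address of the card byte that the barrier sets to 0 (dirty)."""
    return byte_map_base + (addr >> CARD_SHIFT)


# Lowest heap address whose card is the first byte of byte_map.
covered_start = (BYTE_MAP_START - BYTE_MAP_BASE) << CARD_SHIFT

# Hypothetical object address just inside the covered range.
example_addr = covered_start + 0x10
card = card_address(example_addr)
in_table = BYTE_MAP_START <= card <= BYTE_MAP_END

print(hex(covered_start), hex(card), in_table)
```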

enc_class load_immL(rRegL dst, immL src)
  %{
    int dstenc = $dst$$reg;
    if (dstenc < 8) {
      emit_opcode(cbuf, Assembler::REX_W);   // RAX..RDI: REX.W only
    } else {
      emit_opcode(cbuf, Assembler::REX_WB);  // R8..R15: REX.W plus REX.B
      dstenc -= 8;
    }
    emit_opcode(cbuf, 0xB8 | dstenc);        // MOV r64, imm64 (opcode B8+rd)
    emit_d64(cbuf, $src$$constant);          // 8-byte immediate
  %}
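
As a sanity check, the same encoding rule can be mirrored outside HotSpot and compared with the bytes in the dump. This is a sketch, not HotSpot code: load_imm_l is a made-up name, and the REX values are the standard x86-64 prefix bytes (0x48 for REX.W, 0x49 for REX.W plus REX.B).

```python
import struct

REX_W = 0x48   # REX prefix with the W bit set (64-bit operand size)
REX_WB = 0x49  # REX.W plus REX.B (extends the register field to R8..R15)


def load_imm_l(dst_enc: int, imm: int) -> bytes:
    """Encode "movabsq $imm, reg" the same way the load_immL rule does."""
    if dst_enc < 8:
        rex = REX_W
    else:
        rex = REX_WB
        dst_enc -= 8
    # Opcode B8+rd followed by the 64-bit immediate in little-endian order.
    return bytes([rex, 0xB8 | dst_enc]) + struct.pack("<Q", imm & 0xFFFFFFFFFFFFFFFF)


R11 = 11  # hardware register number of %r11
encoded = load_imm_l(R11, 0x00007F815831A000)
print(encoded.hex(" "))
```

The result matches the ten bytes 49 bb 00 a0 31 58 81 7f 00 00 in the dump, so the emitted instruction itself encodes the expected immediate.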

Thanks,
--lx