MOVABSQ yields wrong result in the destination register on x86_64?

Volker Simonis volker.simonis at gmail.com
Wed May 3 12:48:15 UTC 2023


On Wed, May 3, 2023 at 6:41 AM Stefan Karlsson
<stefan.karlsson at oracle.com> wrote:
>
> On 2023-05-03 00:24, Liu, Xin wrote:
> > Hi, 
> >
> > We recently observed some random HotSpot crashes with SerialGC on x86_64 Linux. So far we have crash reports only from jdk-8/11, but I believe the codegen rules are the same in newer versions.
> >
> > A common pattern is as follows:
> >   1.  got SIGSEGV and si_code is SI_KERNEL and si_addr is 0.
> >      "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000"
> >
> >   2.  The last event appears to be an implicit null exception, but target_pc is 0. pc is the instruction that caused the SIGSEGV, e.g.
> >      "Event: 44.827 Thread 0x00007f815400b800 Implicit null exception at 0x00007f8150e68daf to 0x0000000000000000"
>
> Just a note about the SI_KERNEL / si_addr == 0 and implicit null
> exception. See:
> https://bugs.openjdk.org/browse/JDK-8294003
>

This happened with an "Intel(R) Xeon(R) Processor @ 2.90GHz" on Amazon
Linux release 2 (Linux 4.14.255, glibc 2.26) so I doubt that it is
related to the original "unstable signal handling" issue.

My assumption is that the bad value we see in the register is exactly
what was loaded from the instruction stream (i.e. I can't believe that
MOVABSQ itself is faulty), but that by the time the hs_err file was
dumped, the value in the instruction stream had already changed.
However, I don't have an explanation for how this could happen. The
compiled method where this
happens is pretty old (i.e. it has compilation ID ~500 whereas the
latest compilation events in the hs_err file have compilation IDs >
1000), so it is unlikely to be an icache flushing issue. I also haven't
found any parts near the crashing instructions which would be subject
to patching.
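
To make the mismatch concrete, here is a small Python sketch (purely
illustrative, not part of the hs_err output) that decodes the MOVABSQ
immediate from the instruction bytes in the quoted report below and
XORs it with the reported R11 value. The byte string and the register
value are copied from that report; the helper name is made up.

```python
# Bytes of the MOVABSQ instruction as they appear in the hs_err dump:
# REX prefix 0x49, opcode 0xBB (B8+rd), then the 64-bit immediate.
MOVABSQ_BYTES = bytes.fromhex("49 bb 00 a0 31 58 81 7f 00 00")

# Value of R11 as reported in the ucontext section of the crash report.
R11_IN_UCONTEXT = 0x00047F815831A000


def movabsq_immediate(insn: bytes) -> int:
    """Return the imm64 of a 10-byte 'REX.W + B8+rd imm64' instruction."""
    if len(insn) != 10:
        raise ValueError("expected exactly 10 bytes")
    if (insn[0] & 0xF8) != 0x48:      # REX prefix with W=1 is 0x48..0x4F
        raise ValueError("missing REX.W prefix")
    if (insn[1] & 0xF8) != 0xB8:      # MOV r64, imm64 opcodes are 0xB8..0xBF
        raise ValueError("not a MOV r64, imm64 opcode")
    return int.from_bytes(insn[2:], "little")


imm = movabsq_immediate(MOVABSQ_BYTES)
diff = imm ^ R11_IN_UCONTEXT
flipped_bits = [i for i in range(64) if (diff >> i) & 1]

print(hex(imm))        # 0x7f815831a000
print(flipped_bits)    # [50]
```

So the bytes in the dump encode the expected byte_map_base, and the
register differs from it in exactly one bit.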

> StefanK
>
> >
> >   3.  The last instruction before the faulting pc is MOVABSQ #byte_map_base, dst register.  This instruction moves a 64-bit immediate into a register.
> >
> > Eg.
> >
> > Card table byte_map: [0x00007f81589b3000,0x00007f8158b1b000] byte_map_base: 0x00007f815831a000
> >
> > Instructions: (pc=0x00007f8150e68daf)
> > 0x00007f8150e68d8f: 03 00 00 49 8b c2 4c 8b 5c 24 18 45 89 53 14 4d
> > 0x00007f8150e68d9f: 8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00
> > 0x00007f8150e68daf: 43 c6 04 13 00 48 83 c4 50 5d 85 05 41 92 7c 0a
> >
> > We can translate them to x86_64 instruction sequence (I use llvm-mc to disassemble them)
> >          .text
> >          addl    (%rax), %eax                    # encoding: [0x03,0x00]
> >          addb    %cl, -117(%rcx)                 # encoding: [0x00,0x49,0x8b]
> >          retq    $-29876                         # encoding: [0xc2,0x4c,0x8b]
> >                                          # imm = 0x8B4C
> >          popq    %rsp                            # encoding: [0x5c]
> >          andb    $24, %al                        # encoding: [0x24,0x18]
> >          movl    %r10d, 20(%r11)                 # encoding: [0x45,0x89,0x53,0x14]
> >          movq    %r11, %r10                      # encoding: [0x4d,0x8b,0xd3]
> >          shrq    $9, %r10                        # encoding: [0x49,0xc1,0xea,0x09]
> >          movabsq $140193507155968, %r11          # encoding: [0x49,0xbb,0x00,0xa0,0x31,0x58,0x81,0x7f,0x00,0x00]
> >                                          # imm = 0x7F815831A000
> >   PC>movb    $0, (%r11,%r10)                 # encoding: [0x43,0xc6,0x04,0x13,0x00]
> >          addq    $80, %rsp                       # encoding: [0x48,0x83,0xc4,0x50]
> >          popq    %rbp                            # encoding: [0x5d]
> >          testl   %eax, 175936065(%rip)           # encoding: [0x85,0x05,0x41,0x92,0x7c,0x0a]
> >
> >
> > MOVABSQ moves 0x7f815831a000 into R11, and the instruction at pc is about to store the dirty card value into the card table.
> > Because the HotSpot crash report also contains the registers from the ucontext, we found that there is a 1-bit flip in the dst register.
> >
> > In this case, R11 = 0x00047f815831a000.  Not 0x00007f815831a000! One bit flip!
> >
> > In all reports we collected, the dst register may vary, but it is always bit 50 (counting from 0) that is flipped after MOVABSQ.
> > It's also odd that the address of the faulting instruction ends in 0xf. For instance, it's 0x00007f8150e68daf.
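
A side note on why a single flipped bit would show up as SI_KERNEL
with si_addr == 0 rather than with a real fault address: with bit 50
set, the pointer is no longer canonical, and as far as I know an
access through a non-canonical address raises a general protection
fault, for which Linux reports no faulting address. A rough Python
check, assuming 48-bit virtual addresses (4-level paging); the
function name is hypothetical:

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits-1) of a 64-bit address are all equal.

    va_bits=48 is an assumption (4-level paging); machines with
    5-level paging would use 57 instead.
    """
    top = addr >> (va_bits - 1)               # bits 63..47 for va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits for va_bits == 48
    return top == 0 or top == all_ones


expected = 0x00007F815831A000   # byte_map_base from the report
observed = 0x00047F815831A000   # R11 from the ucontext

print(is_canonical(expected))   # True
print(is_canonical(observed))   # False
```
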
> >
> > Have you seen this problem before?
> > For x86_64, do we need to pay attention to the alignment of text?  I read the x86_64 manual but didn't find any caveat on alignment.
> >
> > In this case,  gc post barrier is emitted by C2.  C2 backend selects MOVABSQ using load_immL rule.
> >
> > enc_class load_immL(rRegL dst, immL src)
> >    %{
> >      int dstenc = $dst$$reg;
> >      if (dstenc < 8) {
> >        emit_opcode(cbuf, Assembler::REX_W);
> >      } else {
> >        emit_opcode(cbuf, Assembler::REX_WB);
> >        dstenc -= 8;
> >      }
> >      emit_opcode(cbuf, 0xB8 | dstenc);
> >      emit_d64(cbuf, $src$$constant);
> > %}
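
For reference, the logic of load_immL can be mirrored in a few lines
of Python to confirm that it yields exactly the ten bytes seen in the
dump. This is only an illustrative sketch: the function name is
hypothetical, and the REX_W / REX_WB values (0x48 / 0x49) are the
standard x86_64 prefix bytes, not something taken from this thread.

```python
import struct

REX_W = 0x48    # REX prefix with W=1 (64-bit operand size)
REX_WB = 0x49   # REX prefix with W=1 and B=1 (selects r8..r15)


def encode_load_imm64(dst_enc: int, imm: int) -> bytes:
    """Mirror of the load_immL enc_class: REX prefix, B8+rd, imm64."""
    if not 0 <= dst_enc <= 15:
        raise ValueError("register encoding must be in 0..15")
    if dst_enc < 8:
        prefix = REX_W
    else:
        prefix = REX_WB
        dst_enc -= 8
    # The 64-bit immediate is emitted in little-endian byte order.
    return bytes([prefix, 0xB8 | dst_enc]) + struct.pack("<Q", imm & (2**64 - 1))


# r11 has register encoding 11; the constant is byte_map_base.
code = encode_load_imm64(11, 0x00007F815831A000)
print(code.hex(" "))   # 49 bb 00 a0 31 58 81 7f 00 00
```
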
> >
> > Thanks,
> > --lx
> >
> >
> >
> >
>

