MOVABSQ yields wrong result in the destination register on x86_64?

Wed May 3 19:58:21 UTC 2023

Hi, Stefan and Volker, 

Thanks for the information.

Yes, I spent a lot of time looking into the 'implicit null check' path, but it turns out that is not what is happening here. Your patch indicates that it's a kernel-sent signal.
I think we still need to root-cause why this happens in the first place.

I think the segmentation fault with "si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000" is an important lead.
If we execute the store movb $0, (%r11,%r10) with r11 = 0x00047f815831a000, the effective address exceeds the maximum user-space address.

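I haven't seen the exact definition of TASK_SIZE, but I believe Stefan refers to the same concept. As far as I know, on x86_64 Linux with 4-level paging, virtual addresses are 48 bits wide and a user-mode process can only use the lower half of that range (addresses below 2^47).
R11 has bit 50 set (counting from 0), which puts it well outside that range, so it's very likely that this is what triggers the segmentation fault.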
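
To make the arithmetic concrete, here is a small Python sketch (not taken from HotSpot or the kernel) that XORs the expected and observed R11 values from the report quoted below and compares both against the user-space limit. The constant USER_ADDR_MAX and the helper names are my own and assume 4-level paging on the crashing host; that is an assumption, not something stated in the hs_err file.

```python
# Values from the hs_err report quoted further down in this mail.
EXPECTED_R11 = 0x00007F815831A000  # byte_map_base, the MOVABSQ immediate
OBSERVED_R11 = 0x00047F815831A000  # R11 in the ucontext at crash time

# Assumption: 4-level paging, so the user half ends at 2**47 - 1.
USER_ADDR_MAX = (1 << 47) - 1


def flipped_bits(expected, observed):
    """Return the zero-based positions of the bits that differ."""
    diff = expected ^ observed
    return [i for i in range(64) if (diff >> i) & 1]


def is_user_address(addr):
    """True if addr lies in the lower (user) half of the address space."""
    return 0 <= addr <= USER_ADDR_MAX


print(flipped_bits(EXPECTED_R11, OBSERVED_R11))
print(is_user_address(EXPECTED_R11), is_user_address(OBSERVED_R11))
```

Under that assumption, the intended byte_map_base is a valid user address, while the observed R11 is not.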

I see that MOVABSQ updates R11 right before the store, but we can't explain why R11 ends up with the wrong value.
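
As a sanity check on the instruction bytes themselves, the sketch below decodes the 10-byte MOVABSQ from the "Instructions:" dump quoted below. It is a hand-rolled decoder for this one encoding (REX.W prefix, opcode B8+rd, 64-bit little-endian immediate), not a general disassembler; the function and table names are hypothetical.

```python
# Bytes copied from the "Instructions:" dump in the quoted report.
MOVABSQ_BYTES = bytes.fromhex("49bb00a03158817f0000")

REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_movabsq(code):
    """Decode REX.W + (B8+rd) imm64 and return (register name, immediate)."""
    if len(code) != 10:
        raise ValueError("expected a 10-byte instruction")
    rex, opcode = code[0], code[1]
    # A REX prefix has the form 0100WRXB; W must be set for a 64-bit operand.
    if rex & 0xF8 != 0x48:
        raise ValueError("not a REX.W prefix")
    if opcode & 0xF8 != 0xB8:
        raise ValueError("not a MOV r64, imm64 opcode")
    # The low 3 opcode bits select the register; REX.B supplies the 4th bit.
    reg = (opcode & 0x07) | ((rex & 0x01) << 3)
    imm = int.from_bytes(code[2:], "little")
    return REG_NAMES[reg], imm


reg, imm = decode_movabsq(MOVABSQ_BYTES)
print(reg, hex(imm))
```

The bytes in the dump decode to the correct byte_map_base, so the code as captured in the hs_err file is not where the extra bit comes from.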

If we knew more about the cause, maybe we could resolve this issue with a microcode update.
I don't think it's about the icache. That can't explain why it is only, and always, bit 50 of the dst register that gets set. It must be done by some piece of logic.

Thanks, 
--lx

On 5/3/23, 6:09 AM, "Stefan Karlsson" <stefan.karlsson at oracle.com> wrote:

On 2023-05-03 14:48, Volker Simonis wrote:
> On Wed, May 3, 2023 at 6:41 AM Stefan Karlsson
> <stefan.karlsson at oracle.com> wrote:
>> On 2023-05-03 00:24, Liu, Xin wrote:
>>> Hi, 
>>>
>>> We recently observed some random HotSpot crashes in applications that use SerialGC on x86_64 Linux. So far we have only received crash reports from jdk-8/11, but I believe the codegen rules are the same in newer versions.
>>>
>>> A common pattern is as follows:
>>> 1. The process gets SIGSEGV with si_code SI_KERNEL and si_addr 0.
>>> "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000"
>>>
>>> 2. The last event appears to be an implicit null exception, but target_pc is 0. The pc is the instruction that causes the SIGSEGV, e.g.
>>> "Event: 44.827 Thread 0x00007f815400b800 Implicit null exception at 0x00007f8150e68daf to 0x0000000000000000"
>> Just a note about the SI_KERNEL / si_addr == 0 and implicit null
>> exception. See:
>> https://bugs.openjdk.org/browse/JDK-8294003
>>
> This happened with an "Intel(R) Xeon(R) Processor @ 2.90GHz" on Amazon
> Linux release 2 (Linux 4.14.255, glibc 2.26) so I doubt that it is
> related to the original "unstable signal handling" issue.

That was not what I tried to imply by linking to the bug above. The bug
above states that if you tried to dereference a pointer with high-order
bits set beyond the TASK_SIZE limit you will get SI_KERNEL and si_addr
== 0, even though the address was *not* 0. When this happened, the code
misinterpreted the state as an implicit null exception and we
ended up crashing further down in the code, similar to what was
described above in bullets (1) and (2). That issue has been
fixed in JDK 20, but not in older releases.

Note, that I'm only describing why you see the SI_KERNEL, si_addr ==0,
and implicit null exception, not the real bug that is described later in
the mail.

StefanK

>
> My assumption is that the bad value we see in the register is exactly
> what was loaded from the instruction stream before (i.e. I can't
> believe that MOVABSQ is faulty), but at the time the hs_err file is
> dumped, that value has already changed. However, I don't have an
> explanation for how this could happen. The compiled method where this
> happens is pretty old (i.e. it has compilation ID ~500 whereas the
> latest compilation events in the hs_err file have compilation IDs >
> 1000) so it is unlikely to be an icache flushing issue. I also haven't
> found any parts near the crashing instructions which would be subject
> to patching.
>
>> StefanK
>>
>>> 3. The last instruction before the faulting pc is MOVABSQ #byte_map_base, dst register. This instruction moves a 64-bit immediate into a register.
>>>
>>> Eg.
>>>
>>> Card table byte_map: [0x00007f81589b3000,0x00007f8158b1b000] byte_map_base: 0x00007f815831a000
>>>
>>> Instructions: (pc=0x00007f8150e68daf)
>>> 0x00007f8150e68d8f: 03 00 00 49 8b c2 4c 8b 5c 24 18 45 89 53 14 4d
>>> 0x00007f8150e68d9f: 8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00
>>> 0x00007f8150e68daf: 43 c6 04 13 00 48 83 c4 50 5d 85 05 41 92 7c 0a
>>>
>>> We can translate them to x86_64 instruction sequence (I use llvm-mc to disassemble them)
>>> .text
>>> addl (%rax), %eax # encoding: [0x03,0x00]
>>> addb %cl, -117(%rcx) # encoding: [0x00,0x49,0x8b]
>>> retq $-29876 # encoding: [0xc2,0x4c,0x8b]
>>> # imm = 0x8B4C
>>> popq %rsp # encoding: [0x5c]
>>> andb $24, %al # encoding: [0x24,0x18]
>>> movl %r10d, 20(%r11) # encoding: [0x45,0x89,0x53,0x14]
>>> movq %r11, %r10 # encoding: [0x4d,0x8b,0xd3]
>>> shrq $9, %r10 # encoding: [0x49,0xc1,0xea,0x09]
>>> movabsq $140193507155968, %r11 # encoding: [0x49,0xbb,0x00,0xa0,0x31,0x58,0x81,0x7f,0x00,0x00]
>>> # imm = 0x7F815831A000
>>> PC>movb $0, (%r11,%r10) # encoding: [0x43,0xc6,0x04,0x13,0x00]
>>> addq $80, %rsp # encoding: [0x48,0x83,0xc4,0x50]
>>> popq %rbp # encoding: [0x5d]
>>> testl %eax, 175936065(%rip) # encoding: [0x85,0x05,0x41,0x92,0x7c,0x0a]
>>>
>>>
>>> MOVABSQ moves 0x7f815831a000 into R11, and the instruction at pc is about to store a dirty card to the card table.
>>> Because the HotSpot crash report also contains the registers from the ucontext, we found that there is a single bit flip in the dst register.
>>>
>>> In this case, R11 = 0x00047f815831a000. Not 0x00007f815831a000! One bit flip!
>>>
>>> In all the reports we collected, the dst register may vary, but it is always bit 50 (counting from 0) that is flipped after MOVABSQ.
>>> It's also weird that the address of the faulting instruction ends in 0xf. For instance, it's 0x00007f8150e68daf.
>>>
>>> Have you seen this problem before?
>>> For x86_64, do we need to pay attention to the alignment of the text? I read the x86_64 manual and didn't find any caveat about alignment.
>>>
>>> In this case, the GC post barrier is emitted by C2. The C2 backend selects MOVABSQ using the load_immL rule.
>>>
>>> enc_class load_immL(rRegL dst, immL src)
>>> %{
>>>   int dstenc = $dst$$reg;
>>>   if (dstenc < 8) {
>>>     emit_opcode(cbuf, Assembler::REX_W);
>>>   } else {
>>>     emit_opcode(cbuf, Assembler::REX_WB);
>>>     dstenc -= 8;
>>>   }
>>>   emit_opcode(cbuf, 0xB8 | dstenc);
>>>   emit_d64(cbuf, $src$$constant);
>>> %}
>>>
>>> Thanks,
>>> --lx
>>>