A question about memory order on x86_64
Liu, Xin
xxinliu at amazon.com
Sun Jun 20 20:50:03 UTC 2021
Hi, Hotspot experts,
I am debugging a PR which uses lock-free programming. David Holmes
helped me submit this patch to the internal regression infra (thanks a
lot). We got intermittent failures at tier4~7. There are two distinct
crash sites, but they're similar. My guess is that the asynclog thread
observed a different memory state from the other threads.
https://github.com/openjdk/jdk/pull/4408#issuecomment-863788283
I have spent a lot of effort on reproducing this problem. Eventually, I
got a crash in docker, and luckily the core dump file and hs_err report
were both saved.
In the core dump file, I see this stack trace in the asynclog thread.
(gdb) bt
...
#24 0x00007f76e1ab0e90 in flockfile () from /lib64/libc.so.6
#25 0x00007f76e3d0c253 in FileLocker (file=0xbabababababababa,
this=<synthetic pointer>) at
/jdk/src/hotspot/share/logging/logFileStreamOutput.cpp:125
#26 LogFileStreamOutput::write (this=0x55ea4d504bd0, decorations=...,
msg=0x7f7624079b00 "ConcurrentLogsite 3 emits a log") at
/jdk/src/hotspot/share/logging/logFileStreamOutput.cpp:125
#27 0x00007f76e3d0b572 in LogFileOutput::write_blocking
(this=0x55ea4d504bd0, decorations=..., msg=0x7f7624079b00
"ConcurrentLogsite 3 emits a log") at
/jdk/src/hotspot/share/logging/logFileOutput.cpp:308
#28 0x00007f76e3d05bbd in write (this=0x55ea4d2e1af0) at
/jdk/src/hotspot/share/logging/logAsyncWriter.hpp:106
#29 AsyncLogWriter::run (this=0x55ea4d2e1af0) at
/jdk/src/hotspot/share/logging/logAsyncWriter.cpp:163
#30 0x00007f76e44696a0 in Thread::call_run (this=0x55ea4d2e1af0) at
/jdk/src/hotspot/share/runtime/thread.cpp:394
#31 0x00007f76e3fdf9b4 in thread_native_entry (thread=0x55ea4d2e1af0) at
/jdk/src/hotspot/os/linux/os_linux.cpp:720
#32 0x00007f76e162cea5 in start_thread () from /lib64/libpthread.so.0
#33 0x00007f76e1b439fd in clone () from /lib64/libc.so.6
The corresponding code is here. How can _stream be 0xbabababababababa?

int LogFileStreamOutput::write(const LogDecorations& decorations, const char* msg) {
  const bool use_decorations = !_decorators.is_empty();
  int written = 0;
  FileLocker flocker(_stream);
  if (use_decorations)
    ...
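For context, judging from frames #24/#25 above, FileLocker is a small RAII
helper that locks the FILE* in its constructor and unlocks it in its
destructor, so whatever garbage happens to be in _stream is passed straight
into flockfile(). Roughly (my own sketch of the idea, not the exact HotSpot
code):

class FileLocker {
 private:
  FILE* _file;
 public:
  explicit FileLocker(FILE* file) : _file(file) {
    os::flockfile(_file);      // this is frame #24/#25 in the backtrace above
  }
  ~FileLocker() {
    os::funlockfile(_file);    // released when the scope ends
  }
};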
Here is the relevant disassembly code.
...prolog
0x7f76e3d0c223 <LogFileStreamOutput::write(LogDecorations const&, char const*)+19>   mov    %rdi,%rbx
0x7f76e3d0c226 <LogFileStreamOutput::write(LogDecorations const&, char const*)+22>   sub    $0x128,%rsp
0x7f76e3d0c22d <LogFileStreamOutput::write(LogDecorations const&, char const*)+29>   mov    0x28(%rdi),%r15
0x7f76e3d0c231 <LogFileStreamOutput::write(LogDecorations const&, char const*)+33>   mov    0x20(%rdi),%r12d
0x7f76e3d0c235 <LogFileStreamOutput::write(LogDecorations const&, char const*)+37>   mov    %rdx,-0x148(%rbp)
0x7f76e3d0c23c <LogFileStreamOutput::write(LogDecorations const&, char const*)+44>   mov    %r15,%rdi
0x7f76e3d0c23f <LogFileStreamOutput::write(LogDecorations const&, char const*)+47>   mov    %fs:0x28,%rax
0x7f76e3d0c248 <LogFileStreamOutput::write(LogDecorations const&, char const*)+56>   mov    %rax,-0x38(%rbp)
0x7f76e3d0c24c <LogFileStreamOutput::write(LogDecorations const&, char const*)+60>   xor    %eax,%eax
0x7f76e3d0c24e <LogFileStreamOutput::write(LogDecorations const&, char const*)+62>   callq  0x7f76e3feb090 <os::flockfile(_IO_FILE*)>
0x7f76e3d0c253 <LogFileStreamOutput::write(LogDecorations const&, char const*)+67>   test   %r12d,%r12d
We can see that %r15 = load 0x28(%rdi), and its value is bad. However, in
memory its value is correct: this->_stream is 0x55ea4d0f8d00.
(gdb) p /x $r15
$13 = 0xbabababababababa
(gdb) p /x $rbx
$27 = 0x55ea4d504bd0
(gdb) x /1g $rbx + 0x28
0x55ea4d504bf8: 0x000055ea4d0f8d00
In the main thread, we can see that output->_stream is well defined.
(gdb) p output
$24 = (LogOutput *) 0x55ea4d504bd0
(gdb) p ((LogFileOutput*)output)->_stream
$23 = (FILE *) 0x55ea4d0f8d00
-------------------------------------------------------------------------
Can I say it's a memory ordering issue on x86_64? TBH, I can't explain
how that happens.
Earlier on, in the main thread, LogFileOutput::initialize() was called and
_stream was initialized. Since then, the program has at least gone through:
1. Atomic::add (a lock-prefixed instruction, which should imply a store-load
barrier?).
2. sem_wait/post (acquire-release semantics).
Therefore, I believe the current thread must see the value written by
LogFileOutput::initialize(), i.e. _stream should be (FILE *)
0x55ea4d0f8d00, not 0xbababa...
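To spell out the happens-before chain I am relying on, here is a minimal,
stand-alone sketch (my own, not the HotSpot code; all names are made up):
the field is written before sem_post() in one thread and only read after
sem_wait() in the other, so the reader should never observe a stale value.

#include <semaphore.h>
#include <pthread.h>
#include <cstdio>

static FILE* g_stream = nullptr;   // stands in for LogFileOutput::_stream
static sem_t g_sem;

static void* async_writer(void*) {
  sem_wait(&g_sem);                // acquire: pairs with sem_post() below
  // Everything written before sem_post() must be visible here, so
  // g_stream should be the real FILE*, never 0xbaba... garbage.
  std::fprintf(g_stream, "hello\n");
  return nullptr;
}

int main() {
  sem_init(&g_sem, 0, 0);
  pthread_t tid;
  pthread_create(&tid, nullptr, async_writer, nullptr);

  g_stream = std::fopen("test.log", "w");  // plays the role of LogFileOutput::initialize()
  sem_post(&g_sem);                        // release: publishes g_stream

  pthread_join(tid, nullptr);
  std::fclose(g_stream);
  sem_destroy(&g_sem);
  return 0;
}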
Last but not least, a question. To further understand this problem and
verify the patch, I think the key is to find a reproducer. Any thoughts on
how to reproduce this kind of problem? So far, I have only come across this
crash once on my side. I used a docker image of Oracle Linux 7.9 (glibc
2.17) with cpuset 0~2. Besides that one core dump and crash report, I can't
reproduce it anymore. I think I must be overlooking some important factor
here. I would really appreciate any pointers.
I notice that Oracle is using the 3rd-gen Zen architecture to run tests. Is
it anything different from conventional x86_64? In AMD's manual[1], section
7.4.2, Table 3, here is the description of lock-prefixed instructions:

k — All loads and stores associated with the I/O and locked
instructions complete to memory (no buffered stores) before a load or
store from a subsequent instruction is issued.

It seems to me that this doesn't imply a "bidirectional full memory barrier"
or a store-load barrier; it only gives acquire semantics. I read the story
behind this change and Aleksey's article[2]. I'm just not sure the
following code is equivalent to "mfence" on EPYC machines, because AMD's
manual is different from Intel's.
inline void OrderAccess::fence() {
  // always use locked addl since mfence is sometimes expensive
#ifdef AMD64
  __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
#else
  __asm__ volatile ("lock; addl $0,0(%%esp)" : : : "cc", "memory");
#endif
  compiler_barrier();
}
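If it helps, this is the kind of store-buffering litmus test (my own
sketch, not from the JDK) that I would run on an EPYC box to check
empirically whether that lock-prefixed addl really orders stores before
subsequent loads. If it does act as a full fence, the r1 == 0 && r2 == 0
outcome should never appear.

#include <atomic>
#include <cstdio>
#include <thread>

static std::atomic<int> x{0}, y{0};
static int r1, r2;

static inline void locked_addl_fence() {
  // The same instruction HotSpot's OrderAccess::fence() uses on AMD64.
  __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
}

int main() {
  long forbidden = 0;
  for (long i = 0; i < 1000000; i++) {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);

    std::thread t1([] {
      x.store(1, std::memory_order_relaxed);
      locked_addl_fence();                     // should keep the store above from being
      r1 = y.load(std::memory_order_relaxed);  // reordered with this later load
    });
    std::thread t2([] {
      y.store(1, std::memory_order_relaxed);
      locked_addl_fence();
      r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();

    if (r1 == 0 && r2 == 0) {
      forbidden++;                             // would indicate store-load reordering
    }
  }
  std::printf("forbidden (r1==0 && r2==0) outcomes: %ld\n", forbidden);
  return 0;
}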
[1] https://www.amd.com/system/files/TechDocs/24593.pdf
[2] https://shipilev.net/blog/2014/on-the-fence-with-dependencies/#_footnote_1
thanks,
--lx