A question about memory order on x86_64
David Holmes
david.holmes at oracle.com
Mon Jun 21 23:28:20 UTC 2021
On 22/06/2021 8:02 am, David Holmes wrote:
> On 22/06/2021 6:49 am, Liu Xin wrote:
>> hi, David,
>>
>> Thanks. This is my blindspot. Let me go through your description.
>> There are 3 threads. M is the main thread which is doing
>> logConfiguration changes.
>> A is the async log thread, which is flushing. L is a user thread which
>> is emitting logs.
>>
>> M: flush1/sem_wait ------------------ flush1/sem_post -- flush2/sem_wait -- flush2/sem_post -- delete_output()
>> L: -- enqueue(log1)
>> A: ---- dequeue(log1) -- run/sem_wait -------------------------------------------------------- write_blocking(log1)
>>
>> This is the timeline you described, isn't it?
>
> Probably close enough.
>
>> A key piece of information I was missing is that sem_wait() (POSIX
>> semaphore) doesn't respect FIFO order.
>> I thought that because Thread-A invoked run/sem_wait() first, the
>> kernel must wake up thread A immediately once flush1/sem_post() is
>> called. That may NOT be the case, right?
>
> Right, there are no guarantees that when M calls signal/post to wake up
> A, A will actually get to run, consume the token and so return from
> sem_wait. If the implementation is based on futex then it is likely
> similar to mutex operation: the newly woken thread will contend with
> all other threads to try and claim the semaphore. In such cases, if M
> calls sem_wait very shortly after sem_post then it will likely reclaim
> the semaphore itself.
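
To make the "no FIFO guarantee" point concrete, here is a minimal
standalone sketch (plain POSIX, not HotSpot code; all names are made up)
in which the posting thread can re-acquire the semaphore it just posted
before the long-blocked waiter ever runs. Whether the trywait below
succeeds is timing dependent, which is exactly the point:

  #include <pthread.h>
  #include <semaphore.h>
  #include <stdio.h>
  #include <unistd.h>

  static sem_t io_sem;

  static void* waiter(void* arg) {
    sem_wait(&io_sem);              // blocks first, like the async log thread
    printf("waiter finally got the token\n");
    return NULL;
  }

  int main() {
    sem_init(&io_sem, 0, 0);
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    sleep(1);                       // let the waiter block in sem_wait() first

    sem_post(&io_sem);              // like flush1's sem_post in thread M
    // Nothing forces the waiter to run now; the poster may barge in and
    // consume the token itself, like flush2's sem_wait in thread M.
    if (sem_trywait(&io_sem) == 0) {
      printf("poster re-acquired the semaphore before the waiter ran\n");
      sem_post(&io_sem);            // hand the token back so the waiter can exit
    }
    pthread_join(t, NULL);
    sem_destroy(&io_sem);
    return 0;
  }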
>
>> It's possible that thread-A is descheduled and thread-M can keep
>> going. Under the hood, I see that glibc implements sem_wait using
>> the futex syscall.
>>
>> In my test, I stupidly put two set_log_config() in a loop. Therefore,
>> M invokes two flush() in a row!
>> https://github.com/openjdk/jdk/pull/4408/files#diff-25062d200033077d44dacb5aae6b8141316b4bc10149c061cf047fe7db250d98R306
>>
>>
>> Let me try to simulate this and see if I have a fix.
>
> I'm beginning to think that the only safe thing to do here is to
> lock-out configuration changes while any write is in progress.
But we can't do that as we could block indefinitely while writing.
Backing up a step, let's ignore async logging for a moment. How does
synchronous logging deal with this problem? What ensures that a log
request can't use an output that is being concurrently deleted by a
configuration change?
David
> David
> -----
>
>> Thanks,
>> -lx
>>
>>
>>
>>
>> On Sun, Jun 20, 2021 at 6:54 PM David Holmes <david.holmes at oracle.com> wrote:
>>
>> Hi Xin,
>>
>> I'm looking at your code again, and apologies if we covered this
>> previously, but I'm not seeing how things are supposed to work
>> exactly.
>> Looking at the write() function:
>>
>> void AsyncLogWriter::write() {
>> AsyncLogBuffer logs;
>> bool own_io = false;
>>
>> { // critical region
>> AsyncLogLocker locker;
>>
>> _buffer.pop_all(&logs);
>> // append meta-messages of dropped counters
>> AsyncLogMapIterator dropped_counters_iter(logs);
>> _stats.iterate(&dropped_counters_iter);
>> own_io = _io_sem.trywait();
>> }
>>
>> LinkedListIterator<AsyncLogMessage> it(logs.head());
>> if (!own_io) {
>> _io_sem.wait();
>> }
>>
>> while (!it.is_empty()) {
>> AsyncLogMessage* e = it.next();
>> char* msg = e->message();
>>
>> if (msg != nullptr) {
>> e->output()->write_blocking(e->decorations(), msg);
>>
>> Suppose that a flush() operation has called write() and has reached
>> the point where it is about to release the io_sem at the end, and gets
>> descheduled. Other threads then perform logging operations that
>> include a specific output N. The async log thread executes the code
>> above, stashes those log messages away in the local "logs", and then
>> blocks in _io_sem.wait(). Let the original flush release the io_sem,
>> and a second flush() commence due to the log configuration update that
>> will delete output N. That flush also completes and releases the
>> io_sem. The async log thread now acquires the io_sem and attempts to
>> write to the output that was just deleted.
>>
>> David
>> -----
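
The hazard can be mirrored in a deliberately buggy, self-contained
sketch (made-up Output/Message types, not the real UL classes). Run
under AddressSanitizer it reports a use-after-free at the fprintf, which
has the same shape as the crash discussed below: the async thread
dereferences an output that the configuration change has already
deleted.

  #include <pthread.h>
  #include <semaphore.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  // Made-up stand-ins for LogFileOutput and a queued AsyncLogMessage.
  struct Output  { FILE* stream; };
  struct Message { Output* output; const char* text; };

  static sem_t   io_sem;
  static Message stashed;            // plays the role of the local "logs" list

  static void* async_log_thread(void* arg) {
    // The async thread has already popped the message and now blocks,
    // like _io_sem.wait() in write().
    sem_wait(&io_sem);
    // By the time it gets here, output N may already be gone.
    fprintf(stashed.output->stream, "%s\n", stashed.text);   // use-after-free
    return NULL;
  }

  int main() {
    sem_init(&io_sem, 0, 0);
    Output* n = (Output*)malloc(sizeof(Output));
    n->stream = stdout;

    // A user thread enqueued a message that refers to output N.
    stashed.output = n;
    stashed.text   = "ConcurrentLogsite 3 emits a log";

    pthread_t t;
    pthread_create(&t, NULL, async_log_thread, NULL);

    // The configuration change deletes output N ...
    memset(n, 0xba, sizeof(Output));   // mimic debug-build zapping of freed memory
    free(n);
    // ... and only then releases the io_sem (the second flush completing).
    sem_post(&io_sem);

    pthread_join(t, NULL);
    sem_destroy(&io_sem);
    return 0;
  }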
>>
>> On 21/06/2021 6:50 am, Liu, Xin wrote:
>> > Hi, Hotspot experts,
>> >
>> > I am debugging a PR which uses lock-free programming. David Holmes
>> > helped me submit this patch to the internal regression infra (thanks
>> > a lot). We got intermittent failures at tiers 4~7. There are two
>> > distinct crash sites, but they're similar. I guess the reason is that
>> > the asynclog thread observed a different memory state from other
>> > threads.
>> >
>> > https://github.com/openjdk/jdk/pull/4408#issuecomment-863788283
>> >
>> > I have spent a lot of effort on reproducing this problem.
>> > Eventually, I got a crash in docker, and luckily the coredump file
>> > and hs_err report were both saved.
>> >
>> > In the core dump file, I see this stack trace in the asynclog thread.
>> >
>> > (gdb) bt
>> > ...
>> > #24 0x00007f76e1ab0e90 in flockfile () from /lib64/libc.so.6
>> > #25 0x00007f76e3d0c253 in FileLocker (file=0xbabababababababa,
>> >     this=<synthetic pointer>) at /jdk/src/hotspot/share/logging/logFileStreamOutput.cpp:125
>> > #26 LogFileStreamOutput::write (this=0x55ea4d504bd0, decorations=...,
>> >     msg=0x7f7624079b00 "ConcurrentLogsite 3 emits a log") at /jdk/src/hotspot/share/logging/logFileStreamOutput.cpp:125
>> > #27 0x00007f76e3d0b572 in LogFileOutput::write_blocking (this=0x55ea4d504bd0, decorations=...,
>> >     msg=0x7f7624079b00 "ConcurrentLogsite 3 emits a log") at /jdk/src/hotspot/share/logging/logFileOutput.cpp:308
>> > #28 0x00007f76e3d05bbd in write (this=0x55ea4d2e1af0) at /jdk/src/hotspot/share/logging/logAsyncWriter.hpp:106
>> > #29 AsyncLogWriter::run (this=0x55ea4d2e1af0) at /jdk/src/hotspot/share/logging/logAsyncWriter.cpp:163
>> > #30 0x00007f76e44696a0 in Thread::call_run (this=0x55ea4d2e1af0) at /jdk/src/hotspot/share/runtime/thread.cpp:394
>> > #31 0x00007f76e3fdf9b4 in thread_native_entry (thread=0x55ea4d2e1af0) at /jdk/src/hotspot/os/linux/os_linux.cpp:720
>> > #32 0x00007f76e162cea5 in start_thread () from /lib64/libpthread.so.0
>> > #33 0x00007f76e1b439fd in clone () from /lib64/libc.so.6
>> >
>> > The corresponding code is here. _stream is 0xbabababababababa?
>> >
>> > int LogFileStreamOutput::write(const LogDecorations& decorations, const char* msg) {
>> >   const bool use_decorations = !_decorators.is_empty();
>> >
>> >   int written = 0;
>> >   FileLocker flocker(_stream);
>> >   if (use_decorations)
>> >
>> > Here is the relevant disassembly code.
>> > ...prolog
>> > 0x7f76e3d0c223 <LogFileStreamOutput::write(LogDecorations const&, char const*)+19>  mov    %rdi,%rbx
>> > 0x7f76e3d0c226 <LogFileStreamOutput::write(LogDecorations const&, char const*)+22>  sub    $0x128,%rsp
>> > 0x7f76e3d0c22d <LogFileStreamOutput::write(LogDecorations const&, char const*)+29>  mov    0x28(%rdi),%r15
>> > 0x7f76e3d0c231 <LogFileStreamOutput::write(LogDecorations const&, char const*)+33>  mov    0x20(%rdi),%r12d
>> > 0x7f76e3d0c235 <LogFileStreamOutput::write(LogDecorations const&, char const*)+37>  mov    %rdx,-0x148(%rbp)
>> > 0x7f76e3d0c23c <LogFileStreamOutput::write(LogDecorations const&, char const*)+44>  mov    %r15,%rdi
>> > 0x7f76e3d0c23f <LogFileStreamOutput::write(LogDecorations const&, char const*)+47>  mov    %fs:0x28,%rax
>> > 0x7f76e3d0c248 <LogFileStreamOutput::write(LogDecorations const&, char const*)+56>  mov    %rax,-0x38(%rbp)
>> > 0x7f76e3d0c24c <LogFileStreamOutput::write(LogDecorations const&, char const*)+60>  xor    %eax,%eax
>> > 0x7f76e3d0c24e <LogFileStreamOutput::write(LogDecorations const&, char const*)+62>  callq  0x7f76e3feb090 <os::flockfile(_IO_FILE*)>
>> > 0x7f76e3d0c253 <LogFileStreamOutput::write(LogDecorations const&, char const*)+67>  test   %r12d,%r12d
>> >
>> >
>> > We can see that r15 = load 0x28(%rdi) and its value is bad. However,
>> > in memory, its value is correct: this->_stream is 0x55ea4d0f8d00.
>> >
>> > (gdb) p /x $r15
>> > $13 = 0xbabababababababa
>> >
>> > (gdb) p /x $rbx
>> > $27 = 0x55ea4d504bd0
>> >
>> > (gdb) x /1g $rbx + 0x28
>> > 0x55ea4d504bf8: 0x000055ea4d0f8d00
>> >
>> > In the main thread, we can see that output->_stream is well defined.
>> >
>> > (gdb) p output
>> > $24 = (LogOutput *) 0x55ea4d504bd0
>> >
>> > (gdb) p ((LogFileOutput*)output)->_stream
>> > $23 = (FILE *) 0x55ea4d0f8d00
>> >
>> >
>> >
>> >
>> > -------------------------------------------------------------------------
>> >
>> > Can I say it's a memory ordering issue on x86_64? TBH, I can't
>> > explain how that happens.
>> >
>> > At an earlier moment in the main thread, LogFileOutput::initialize()
>> > was called and _stream was initialized. Since then, the program has
>> > at least gone through:
>> >
>> > 1. Atomic::add (a lock-prefixed instruction, which implies a
>> > store-load barrier?).
>> > 2. sem_wait/post (acquire-release semantics).
>> >
>> > Therefore, I believe the current thread must see the value written by
>> > LogFileOutput::initialize(), i.e. _stream should be (FILE *)
>> > 0x55ea4d0f8d00, not 0xbababa...
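
The intended happens-before chain can be written down as a small sketch
(std::atomic is used here as a stand-in for whatever release/acquire
pairing the sem_post/sem_wait or lock-prefixed RMW provides; this is not
HotSpot code). If the async thread really does pass through such a
pairing after _stream is set, it cannot observe a stale value; the
0xbaba pattern then looks more like a freed block zapped by a debug
build than a missing fence, consistent with the use-after-free scenario
described earlier in the thread.

  #include <atomic>
  #include <cassert>
  #include <cstdio>
  #include <thread>

  static FILE*             stream = nullptr;   // plain field, like _stream
  static std::atomic<bool> published{false};   // stands in for the later sem/RMW pairing

  int main() {
    std::thread writer([] {
      stream = stdout;                                   // like LogFileOutput::initialize()
      published.store(true, std::memory_order_release);  // everything above is ordered before this
    });

    std::thread reader([] {
      while (!published.load(std::memory_order_acquire)) {
        // spin until the release is observed
      }
      // Acquire pairs with the release: the plain store to 'stream' must
      // be visible here, so the reader cannot see an uninitialized value.
      assert(stream != nullptr);
      fprintf(stream, "saw the initialized stream\n");
    });

    writer.join();
    reader.join();
    return 0;
  }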
>> >
>> > Last but not least, a question. To further understand this problem
>> > and verify the patch, I think the key is to find a reproducer. Any
>> > thoughts on how to reproduce this kind of problem? So far, I have
>> > only come across this crash once on my side. I used a docker image of
>> > Oracle Linux 7.9 (glibc 2.17) with cpuset 0~2. Besides the core dump
>> > and the crash report, I can't reproduce it anymore. I think I must be
>> > overlooking some important factors here. I'd really appreciate some
>> > pointers.
>> >
>> > I notice that Oracle is using 3rd-gen Zen machines to run tests. Is
>> > that in any way different from conventional x86_64? In AMD's manual
>> > [1], section 7.4.2, Table-3, here is the description of locked
>> > instructions:
>> >
>> > k — All loads and stores associated with the I/O and locked
>> > instructions complete to memory (no buffered stores) before a load or
>> > store from a subsequent instruction is issued.
>> >
>> > It seems to me that this doesn't imply a "bidirectional full memory
>> > barrier" or store-load barrier; it just has acquire semantics. I read
>> > the story behind this change and Aleksey's article [2]. I'm just not
>> > sure the following code is equivalent to "mfence" on EPYC machines,
>> > because AMD's manual is worded differently from Intel's.
>> >
>> > inline void OrderAccess::fence() {
>> >   // always use locked addl since mfence is sometimes expensive
>> > #ifdef AMD64
>> >   __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
>> > #else
>> >   __asm__ volatile ("lock; addl $0,0(%%esp)" : : : "cc", "memory");
>> > #endif
>> >   compiler_barrier();
>> > }
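
One way to probe this empirically on an EPYC box is the classic
store-buffering litmus test; a crude sketch (new code, not from the JDK)
follows. Without a StoreLoad barrier between the store and the load,
x86 permits both threads to read 0. If the lock-prefixed addl really
acts as a full fence, the (0,0) outcome should never show up, while
replacing fence_under_test() with a no-op should make it appear fairly
quickly:

  #include <atomic>
  #include <cstdio>
  #include <thread>

  static std::atomic<int> X{0}, Y{0};

  // The same instruction OrderAccess::fence() uses on AMD64.
  static inline void fence_under_test() {
    __asm__ volatile("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
  }

  int main() {
    int both_zero = 0;
    for (int i = 0; i < 100000; i++) {        // crude: one thread pair per iteration
      X.store(0, std::memory_order_relaxed);
      Y.store(0, std::memory_order_relaxed);
      std::atomic<bool> go{false};
      int r1 = -1, r2 = -1;
      std::thread t1([&] {
        while (!go.load(std::memory_order_relaxed)) {}   // wait for the start signal
        X.store(1, std::memory_order_relaxed);           // plain mov on x86
        fence_under_test();
        r1 = Y.load(std::memory_order_relaxed);
      });
      std::thread t2([&] {
        while (!go.load(std::memory_order_relaxed)) {}
        Y.store(1, std::memory_order_relaxed);
        fence_under_test();
        r2 = X.load(std::memory_order_relaxed);
      });
      go.store(true, std::memory_order_relaxed);
      t1.join(); t2.join();
      if (r1 == 0 && r2 == 0) both_zero++;               // store-load reordering observed
    }
    printf("(0,0) observed %d times\n", both_zero);
    return 0;
  }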
>> >
>> >
>> > [1] https://www.amd.com/system/files/TechDocs/24593.pdf
>> > [2] https://shipilev.net/blog/2014/on-the-fence-with-dependencies/#_footnote_1
>> >
>> > thanks,
>> > --lx
>> >
>> >
>> >
>>