RFR: 8229517: Support for optional asynchronous/buffered logging [v20]

Mon May 24 21:57:59 UTC 2021

On Mon, 24 May 2021 04:12:54 GMT, Xin Liu <xliu at openjdk.org> wrote:

>> This patch provides a buffer to store asynchronous messages and flush them to
>> underlying files periodically.
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   flush() waits until all pending logging IO operations are done.
>   
>   This patch supports all gtests in async mode.
>   make test TEST="gtest:all" TEST_OPTS='VM_OPTIONS=-Xlog:async'

I have a workaround, though it is a biased solution: whichever thread holds the buffer lock also gets priority to obtain the IO channel.

PS: I have to use trywait() here to guarantee that this step is non-blocking. wait() could put a thread to sleep while it holds the buffer lock, and as a result log sites could not enqueue messages.

void AsyncLogWriter::write() {
...
  AsyncLogBuffer logs;
  bool own_io = false;

  { // critical region
    AsyncLogLocker lock;

    _buffer.pop_all(&logs);
    // append meta-messages of dropped counters
    AsyncLogMapIterator dropped_counters_iter(logs);
    _stats.iterate(&dropped_counters_iter);
    // non-blocking: never sleep while holding the buffer lock
    own_io = _io_sem.trywait();
  }

  LinkedListIterator<AsyncLogMessage> it(logs.head());
  if (!own_io) {
    // second chance: block only after the buffer lock has been released
    _io_sem.wait();
  }
...

If we only have to deal with the two-thread situation, this prevents the "interleaving" issue.
Whichever thread fails at trywait() gets a second chance at `_io_sem.wait()`.
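
To make the pattern easier to discuss, here is a minimal standalone sketch of it. It is not the HotSpot code: `SimpleSemaphore` and `SketchWriter` are hypothetical stand-ins built on the C++ standard library, and `write()` returns whether trywait() succeeded only so that the fast path is observable.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a counting semaphore (not HotSpot's Semaphore).
class SimpleSemaphore {
 public:
  explicit SimpleSemaphore(int count) : _count(count) { assert(count >= 0); }

  void signal() {
    {
      std::lock_guard<std::mutex> g(_m);
      ++_count;
    }
    _cv.notify_one();
  }

  // Blocks until a permit is available.
  void wait() {
    std::unique_lock<std::mutex> g(_m);
    _cv.wait(g, [this] { return _count > 0; });
    --_count;
  }

  // Never blocks; returns false if no permit is available.
  bool trywait() {
    std::lock_guard<std::mutex> g(_m);
    if (_count == 0) return false;
    --_count;
    return true;
  }

 private:
  std::mutex _m;
  std::condition_variable _cv;
  int _count;
};

// Hypothetical writer: a buffer lock protects the queue, a semaphore
// protects the IO channel (here just a vector of strings).
class SketchWriter {
 public:
  void enqueue(const std::string& msg) {
    std::lock_guard<std::mutex> g(_buffer_lock);
    _buffer.push_back(msg);
  }

  // Returns true if the IO channel was obtained inside the critical region.
  bool write() {
    std::deque<std::string> logs;
    bool own_io = false;
    {  // critical region
      std::lock_guard<std::mutex> g(_buffer_lock);
      logs.swap(_buffer);
      own_io = _io_sem.trywait();  // must not sleep while holding the lock
    }
    const bool fast_path = own_io;
    if (!own_io) {
      _io_sem.wait();  // second chance, buffer lock already released
    }
    for (const std::string& m : logs) _output.push_back(m);
    _io_sem.signal();
    return fast_path;
  }

  const std::vector<std::string>& output() const { return _output; }

 private:
  std::mutex _buffer_lock;
  std::deque<std::string> _buffer;
  SimpleSemaphore _io_sem{1};
  std::vector<std::string> _output;
};
```

The point of the sketch is only the ordering: the blocking wait() happens after the buffer lock is released, so a thread calling enqueue() is never stuck behind a sleeping lock holder.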

> Do the log messages already contain the appropriate timestamps from the original log call, or is that added when the log message is actually output to the destination?

The current implementation can guarantee both.

I have to say that we may be overthinking this. There are 3 situations where we use flush(), and they all belong to the "two-thread situation": one is the log writer thread, and the other is the thread that invokes `flush()`.

1. termination
2. abort
3. gtest

It seems that only os::abort() is tricky. I understand that it executes while other threads are running in parallel. Actually, I used to run into a problem with this one: TEST="gtest:os.page_size_for_region_with_zero_min_pages_vm_assert_test", which triggers a segfault on purpose. Now it works fine.

What do you think about my workaround?
If you still feel os::abort() is unsafe, how about we just remove flush() from os::abort()?
I have to say that write() runs pretty frequently, so the buffer is almost always empty.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135