RFR: 8229517: Support for optional asynchronous/buffered logging [v14]

Thu May 13 06:59:00 UTC 2021

On Wed, 12 May 2021 19:20:15 GMT, Xin Liu <xliu at openjdk.org> wrote:

>> This patch provides a buffer to store asynchrounous messages and flush them to
>> underlying files periodically.
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix build on Windows.
>   
>   It seems that headers on Windows define WAIT_TIMEOUT somewhere. rename it.

Some more remarks.

One thing occurring to me was that a global buffer for all outputs is maybe not the best choice. One buffer per LogFileOutput seems like a cleaner, more natural fit:
- you would not need to save LogFileOutput reference in each message
- you can keep the drop counter as part of the output buffer, which seems more natural. No need for this kv map thing.
- less lock contention between log sites going to separate output buffers.

wrt to dimensions, the AsyncBufferSize and its derived number-of-entries would be a global limit for the sum of all enqueued messages.

Cheers, Thomas

> > About synchronization: I don't understand the reasoning behind having a periodic flush. Would an implementation which just flushes if output is available not be less complex and actually better? Why make the flusher thread wake up twice a second if there is nothing to do? OTOH why wait 500ms if there is actually output to be printed?
> 
> From my observation, the character of logging events is high-frequent but low cost. Using HotSpot Monitor(the combination of pthead_mutex_t and pthread_cond_t) for an individual logging event is expensive. The cost of waking up AsyncLog thread and lock/unlock data structure would easily shadow IO jobs(fprintf), which usually are cached in libc.
> 
> My strategy is to do IO jobs in batch and amortize the cost of synchronizations. In my design, m.wait() in the following code snippet wakes up mainly due to the load of buffer is over 3/4.
> 
> ```c++
>   while (_state == ThreadState::Running) {
>     {
>       MonitorLocker m(&_lock, Mutex::_no_safepoint_check_flag);
>       m.wait(500 /* ms, timeout*/);
>     }
>     flush();
>   }
> ```
> 
> 500ms here, which was the global option `LogAsyncInterval` , serves as a **stall protector**. If Hotspot stalls for whatever reasons, AsyncLog thread still can flush messages after timeout. I think it's useful for develops who may rely on the log files to figure out what happened inside of the hanging hotspot. Without timeout parameter, AsyncLog Thread may leave messages in the buffer and never get a change to flush them if all threads stall.

Okay, I understand.

src/hotspot/share/logging/logAsyncFlusher.cpp line 39:

> 37: }
> 38: 
> 39: void LogAsyncFlusher::enqueue_impl(const AsyncLogMessage& msg) {

Could you rename this to enqueue_locked? We use this name at other places for functions which factor out locking to the caller.

src/hotspot/share/logging/logAsyncFlusher.cpp line 40:

> 38: 
> 39: void LogAsyncFlusher::enqueue_impl(const AsyncLogMessage& msg) {
> 40:   assert_lock_strong(&_lock);

Can we enqueue (so, log) from within the async logger itself? If not, add an assert here that this should not be called from the logger thread?

src/hotspot/share/logging/logAsyncFlusher.cpp line 42:

> 40:   assert_lock_strong(&_lock);
> 41: 
> 42:   if (_buffer.size() >= _buffer_max_size)  {

As mentioned in the CSR, I prefer dropping the newest message, not the oldest one. So there would be no reason to pop anything from the buffer.

src/hotspot/share/logging/logAsyncFlusher.cpp line 58:

> 56:         // or throttled by XOFF, so only dump the dropping message in Verbose mode.
> 57:         tty->print_cr("asynclog dropping message: %s", h->message());
> 58:       }

I'm not sure this is needed or sensible. If we are flooded, we would then proceed to print every dropped message to tty? So now we flood tty? 

I'd hope exhausting the output buffer is rare, and easily remedied by telling the user to increase the buffer or stop using async mode.

src/hotspot/share/logging/logAsyncFlusher.cpp line 68:

> 66:   // notify asynclog thread if occupancy is over 3/4
> 67:   size_t sz = _buffer.size();
> 68:   if (sz > (_buffer_max_size >> 2) * 3 ) {

Can you reform this to use normal division? Ideally with "0.75" appearing somewhere as a constant which is easily changeable?

src/hotspot/share/logging/logAsyncFlusher.cpp line 102:

> 100:     _stats(17 /*table_size*/) {
> 101:   if (os::create_thread(this, os::asynclog_thread)) {
> 102:     os::start_thread(this);

Here is some error handling needed if thread creation failed. UL should drop to sync mode then.

src/hotspot/share/logging/logAsyncFlusher.cpp line 106:

> 104: 
> 105:   log_info(logging)("The maximum entries of AsyncLogBuffer: " SIZE_FORMAT ", estimated memory use: " SIZE_FORMAT " bytes",
> 106:                     _buffer_max_size, AsyncLogBufferSize);

nit: how about "Asynchronous logging enabled. Buffer size: xxx entries"?

src/hotspot/share/logging/logAsyncFlusher.cpp line 135:

> 133:   LinkedListImpl<AsyncLogMessage, ResourceObj::C_HEAP, mtLogging> logs;
> 134: 
> 135:   if (with_lock) { // critical area

I think the trick to get conditional Mutexes without having separate branches is to make the MutexLocker argument itself conditional (I think you can pass NULL, in which case no locking) - then you could merge these two branches.

src/hotspot/share/logging/logAsyncFlusher.cpp line 142:

> 140:     _buffer.pop_all(&logs);
> 141:     AsyncLogMapIterator iter;
> 142:     _stats.iterate(&iter);

All this is really difficult to understand for the casual reader. Clearer naming would help:
- something like "move-list-content" instead of pop_all
- the whole point of this iteration thing is to print the dropped counters, right? A function "Print Drop Counters" would be clearer. Unless you go the one-async-buffer-per-logfileoutput-route I mentioned earlier.

-------------

Changes requested by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3135