RFR: 8229517: Support for optional asynchronous/buffered logging

Thomas Stuefe stuefe at openjdk.java.net
Sat Mar 27 07:30:31 UTC 2021


On Mon, 22 Mar 2021 22:12:14 GMT, Xin Liu <xliu at openjdk.org> wrote:

> This patch provides a buffer to store asynchronous messages and flush them to
> underlying files periodically.

Hi Xin,

thank you for your detailed answers. 

As I wrote, I think this is a useful change. A prior design discussion with a rough sketch would have made things easier. Also, it would have been good to have the CSR discussion beforehand, since it affects how complex the implementation needs to be. I don't know whether there had been design discussions beforehand; if I missed them, I apologize.

I am keenly aware that design discussions often lead nowhere because no-one answers. So I understand why you started with a patch.

About your proposal: 

I do not think it can be made airtight, and I think that is okay: if we work with a limited flush buffer and log too much, things will get dropped; that is unavoidable. But the behavior has to be reliable and comprehensible after the fact.

As you write, the patch you propose works well at AWS, but I suspect that is an environment with limited variability, and use of the VM outside could be much more diverse. We must make sure to roll out only well-designed solutions which work for us all.

E.g. a log system which randomly omits log entries because some internal buffer is full without giving any indication *in the log itself* is a terrible idea :). Since log files are a cornerstone for our support, I am interested in a good solution.

First off, the CSR:

---
1) With what you propose, we could have an arbitrary combination of targets with different log files and different async options:
java -Xlog:os*:file=os.log::async=false -Xlog:os+thread:file=thread.log::async=true
 

Do we really need that much freedom? How probable is it that someone wants different async options for different trace sinks? The more freedom we have here, the more complex the implementation gets, and all that stuff has to be tested. Why not just make "async" a global setting?

2) AsyncLogBufferSize should be a user-definable memory size, not a "number of slots". The fact that internally we keep a vector of disjoint memory snippets is an implementation detail; the user should just give a memory size and the VM should interpret this. This leaves us the freedom to later change the implementation as we see fit.

3) LogAsyncInterval should not exist at all. If it has to exist, it should be a diagnostic switch, not a production one; but ideally, we would just log as soon as there is something to log, see below.

---

Implementation:

About the use of the WatcherThread and PeriodicTask: polling is plain inefficient, besides the concerns Robbin voiced about blocking things. This is a typical producer-consumer problem, and I would implement it using a dedicated flusher thread and a monitor. The flusher thread should wake only if there is something to write. This is something I would not do in a separate RFE but now. It would also disarm any arguments against blocking the WatcherThread.
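To illustrate what I mean, here is a minimal sketch of such a flusher in plain C++. This is not HotSpot code: the class and member names are made up, and the real thing would use HotSpot's own Monitor and Thread abstractions rather than the standard library. The point is only the wakeup discipline: logging threads enqueue and notify, the flusher sleeps on a condition variable and drains the queue in batches, doing the actual writing outside the lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch of a dedicated flusher thread with a monitor.
class AsyncLogQueue {
 public:
  explicit AsyncLogQueue(std::vector<std::string>& sink)
      : _sink(sink), _done(false), _flusher(&AsyncLogQueue::flush_loop, this) {}

  ~AsyncLogQueue() {
    {
      std::lock_guard<std::mutex> lk(_lock);
      _done = true;
    }
    _cv.notify_one();  // wake the flusher one last time
    _flusher.join();   // it drains the queue before exiting
  }

  // Called by logging threads: enqueue the message and wake the flusher.
  void enqueue(std::string msg) {
    {
      std::lock_guard<std::mutex> lk(_lock);
      _queue.push_back(std::move(msg));
    }
    _cv.notify_one();
  }

 private:
  void flush_loop() {
    std::unique_lock<std::mutex> lk(_lock);
    for (;;) {
      // Sleep until there is something to write (or we are shutting down);
      // no periodic polling involved.
      _cv.wait(lk, [this] { return _done || !_queue.empty(); });
      std::deque<std::string> batch;
      batch.swap(_queue);  // grab everything at once...
      lk.unlock();         // ...and do the "IO" outside the lock
      for (std::string& m : batch) _sink.push_back(std::move(m));
      lk.lock();
      if (_done && _queue.empty()) return;
    }
  }

  std::mutex _lock;
  std::condition_variable _cv;
  std::deque<std::string> _queue;
  std::vector<std::string>& _sink;  // stand-in for the real log file
  bool _done;
  std::thread _flusher;  // declared last: it starts running in the constructor
};
```

The destructor doubles as the shutdown flush: it marks the queue as done, wakes the flusher, and joins it, so nothing buffered is lost on a normal exit.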

----

The way every log message gets strdup'ed could be done better. This can be left for a future RFE, but it explains why I dislike `AsyncLogBufferSize` being a "number of entries" instead of a memory size.

I think processing a memory-size AsyncLogBufferSize can be kept simple: it would be okay to just guess an average log line length and go with that. Let's say 256 chars. An AsyncLogBufferSize=1M could thus be translated to 4096 entries in your solution. If the sum of all 4096 allocated lines overshoots 1M from time to time, well, so be it.
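With an assumed average line length, the translation is trivial. An illustrative helper (not an existing HotSpot function; all names here are made up):

```cpp
#include <cassert>
#include <cstddef>

// Assumed average log line length used to translate the user-facing
// memory size into an internal entry count.
const std::size_t ASSUMED_AVG_LINE_LEN = 256;

std::size_t buffer_entries_for(std::size_t memory_size_bytes) {
  std::size_t entries = memory_size_bytes / ASSUMED_AVG_LINE_LEN;
  return entries > 0 ? entries : 1;  // always allow at least one entry
}
```

So `-XX:AsyncLogBufferSize=1M` would map to 1048576 / 256 = 4096 entries, and the mapping can be changed later without touching the user-visible option.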

A future, better solution could use a preallocated fixed-size buffer. There are two ways to do this: the naive but memory-inefficient way, an array of fixed-size text slots like the event system uses; and a smart way, a ring buffer of variable-sized strings, '\0'-separated, laid out one after the other in memory. The latter is a bit more involved, but can be done, and it would be fast and very memory-efficient. But as I wrote, this is an optimization which can be postponed.

----

I may misunderstand the patch, but do you resolve decorators when the flusher is printing? Would this not distort time-dependent decorators (timemillis, timenanos, uptime, etc.), since we would record the time of printing, not the time of logging?

If yes, it may be better to resolve the message early and just store the plain string and print that. Basically this would mean moving the whole buffering down a layer or two, right to where the raw strings get printed. This would be vastly simplified if we abandon the "async for every trace sink" notion in favor of just a global flag.

This would also save a bit of space, since we would not have to carry all the meta information in `AsyncLogMessage` around. I count at least three 64bit slots, possibly 4-5, which alone makes for ~40 bytes per message. Resolved decorators are often smaller than that.
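A tiny sketch of what I mean by resolving early (illustrative C++, not HotSpot code; the real decorator set is much richer than this): the timestamp is captured when the message is enqueued, and only the fully formatted line is stored for the flusher to write verbatim.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper: resolve an "uptime"-style decorator at log time,
// not at flush time, and return the finished line to be buffered.
std::string resolve_line(const char* msg,
                         std::chrono::steady_clock::time_point start) {
  using namespace std::chrono;
  // Resolved now, when the log call happens...
  double uptime_s = duration<double>(steady_clock::now() - start).count();
  char buf[512];
  std::snprintf(buf, sizeof(buf), "[%.3fs] %s", uptime_s, msg);
  // ...so the flusher later writes this string unchanged, and the
  // decorators cannot be distorted by the flush delay.
  return std::string(buf);
}
```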

Please find further remarks inline.

> hi, Thomas,
> 
> Thank you for reviewing this PR.
> 
> > Hi Xin,
> > I skimmed over the patch, but have a number of high level questions - things which have not been clear from your description.
> > 
> > * Who does the writing, and who is affected when the writing stalls?
> 
> The WatcherThread eventually flushes those buffered messages. If the writing stalls, it blocks periodic tasks.
> If it blocks long enough, other periodic tasks are skipped.
> 
> > * Do you then block or throw output away?
> >   
> >   * If the former, how do you mitigate the ripple effect?
> >   * If the latter, how does the reader of the log file know that something is missing?
> 
> The capacity of the buffer is limited by `AsyncLogBufferSize` (2k by default).
> Actually, logTagSet.cpp limits the maximal length of a vwrite to 512. That means the maximal memory used by this buffer is 1M (= 2k * 0.5k).
> 
> If the buffer overflows, it starts dropping from the head. This behavior simulates a ring buffer.
> If you enable `-XX:+Verbose`, a message about the dropping will be printed to the tty console.
> 
> I prefer dropping messages over letting the buffer keep growing, because the latter may trigger an out-of-memory error.
> 
> > * How often do you flush? How do you prevent missing output in the log file in case of crashes?
> 
> The interval is defined by `LogAsyncInterval` (300ms by default). I insert a statement `async->flusher()` in `ostream_abort()`.
> 

If the flusher blocks, could this block VM shutdown? Would this be different from what we do now, given that all log output is serialized and done by one thread? It's probably fine, but we should think about this.

> > * Can this really handle the full brunt of logging (-Xlog:*=trace) over many threads?
> 
> To be honest, it can't. I see a lot of dropped messages on the console with -XX:+Verbose.
> 
> I have tuned the parameters so that it won't drop messages easily for normal GC activity at info verbosity.
> `-Xlog:*=trace` will drop messages indeed, but this is tunable. I have a [stress test](https://github.com/navyxliu/JavaGCworkload/blob/master/runJavaUL.sh) to tweak parameters.
> 
> > * Does this work with multiple target and multiple IO files?
> 
> Yes, it works if you have multiple outputs. `LogAsyncFlusher` is a singleton: one buffer and one thread serve them all.

The question was how we handle multiple trace sinks, see my "CSR" remarks.

> 
> > * Does it cost anything if logging is off or not async?
> 
> So far, `LogAsyncFlusher` as a periodic task remains active even when no output is in async mode.
> It wakes up every `LogAsyncInterval` ms as a dummy task, because the deque is always empty; the cost is almost nothing.
> 
> > Update: Okay, I see you use PeriodicTask and the WatcherThread. Is this really enough? I would be concerned that it either runs too rarely to be able to swallow all output or that it runs that often that it monopolizes the WatcherThread.
> > I actually expected a separate Thread - or multiple, one per output - for this, waking up when there is something to write. That would also be more efficient than constant periodic polling.
> 
> Your concern is reasonable. I don't understand why there is only one WatcherThread while up to 10 periodic tasks are crowded into it.
> If it's a bottleneck, I plan to improve this infrastructure. I can make HotSpot support multiple watcher threads and spread periodic tasks among them. All watcher threads would be connected in a linked list for management.
> 
> Can we treat that as a separate task? For normal usage, I think the delay is quite manageable. Writing thousands of lines to a file can usually be done in sub-millisecond time.
> 
> > * How is the performance impact when we have lots of concurrent writes from many threads? I see that you use a Mutex to synchronize the logging threads with the flush service. Before, these threads would have done concurrent IO and that would be handled by the libc, potentially without locking.
> 
> IMHO, logging shouldn't hurt performance much. At least, logs that do impact performance are not supposed to be enabled by default. On the other hand, I want log messages from different threads to avoid interleaving, so that I can read them.
> That led me to use a mutex, which actually improves readability.
> 
> My design target is non-blocking. pop_all() is an ad-hoc operation which pops all elements and releases the mutex immediately; writeback() does the IO without holding it.

Since you use a mutex, it introduces synchronization, however short, across all logging threads, so it influences runtime behavior. For the record, I think this is okay; maybe a future RFE could improve this with a lockless algorithm. I just wanted to know if you measured anything, and I was curious whether there is now a difference between synchronous and asynchronous logging.

(Funnily, asynchronous logging is really more synchronous in a sense, since it synchronizes all logging threads across a common resource).

> 
> In our real applications, we haven't seen this feature downgrade GC performance yet.
> 
> > I think this feature could be useful. I am a bit concerned with the increased complexity this brings. UL is already a very (I think unnecessarily) complex codebase. Maybe we should try to reduce its complexity first before adding new features to it. This is just my opinion, lets see what others think.
> > Cheers, Thomas
> 
> I believe UL has its own reasons. In my defense, I don't make UL more complex. I only changed a couple of lines in one of its implementation files (logFileOutput.cpp) and didn't change its interfaces.
> 
> I try my best to reuse existing codebase. We can always refactor existing code([JDK-8239066](https://bugs.openjdk.java.net/browse/JDK-8239066), [JDK-8263840](https://bugs.openjdk.java.net/browse/JDK-8263840)), but it's not this PR's purpose.
> 

I understand. It's fine to do this in a later RFE.

> thanks,
> --lx

Cheers, Thomas

src/hotspot/share/logging/logAsyncFlusher.cpp line 33:

> 31:     // should cache this object somehow
> 32:     LogDecorations decorations(_level, _tagset, _decorators);
> 33:     _output.write_blocking(decorations, _message);

Would this mean that time-dependent decorators get resolved at print time, not when the log call happens?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135


More information about the hotspot-dev mailing list