RFR: 8229517: Support for optional asynchronous/buffered logging

Mon Mar 29 01:49:50 UTC 2021

Hi Xin,

On 27/03/2021 5:30 pm, Thomas Stuefe wrote:
> On Mon, 22 Mar 2021 22:12:14 GMT, Xin Liu <xliu at openjdk.org> wrote:
> 
>> This patch provides a buffer to store asynchrounous messages and flush them to
>> underlying files periodically.

IMO the discussions last year left it still an open question whether 
this was a general purpose facility that we really needed/wanted in 
hotspot. I think there is still a larger design discussion to be had for 
a general purpose facility versus the in-house version you have been 
using that suited your needs.

I'm piggy-backing on some of Thomas's comments below.

> Hi Xin,
> 
> thank you for your detailed answers.
> 
> As I wrote, I think this is a useful change. A prior design discussion with a rough sketch would have made things easier. Also, it would have been good to have the CSR discussion beforehand, since it affects how complex the implementation needs to be. I don't know whether there had been design discussions beforehand; if I missed them, I apologize.
> 
> I am keenly aware that design discussions often lead nowhere because no-one answers. So I understand why you started with a patch.
> 
> About your proposal:
> 
> I do not think it can be made airtight, and I think that is okay - if we work with a limited flush buffer and we log too much, things will get dropped, that is unavoidable. But it has to be reliable and comprehensible after the fact.
> 
> As you write, the patch you propose works well with AWS, but I suspect that is an environment with limited variables, and outside use of the VM could be much more diverse. We must make sure to roll out only well designed solutions which work for us all.
> 
> E.g. a log system which randomly omits log entries because some internal buffer is full without giving any indication *in the log itself* is a terrible idea :). Since log files are a cornerstone for our support, I am interested in a good solution.
> 
> First off, the CSR:
> 
> ---
> 1) With what you propose, we could have a arbitrary combination of targets with different log files and different async options:
> java -Xlog:os*:file=os.log::async=false -Xlog:os+thread:file=thread.log::async=true
>   
> 
> Do we really need that much freedom? How probable is that someone wants different async options for different trace sinks? The more freedom we have here the more complex the implementation gets. All that stuff has to be tested. Why not just make "async" a global setting.

Truly global or global for all actual file-based logging? I think 
perhaps the latter. If we need per log file control that could be added 
later if needed.

> 2) AsyncLogBufferSize should be a user-definable memory size, not "number of slots". The fact that internally we keep a vector of disjunct memory snippets is an implementation detail; the user should just give a memory size and the VM should interprete this. This leaves us the freedom to later change the implementation as we see fit.

I'm not sure it should be a bounded size at all. I don't like the idea 
of needing yet another VM flag to control this. Why can't the design 
accommodate unlimited buffer space as determined by available memory?

> 3) LogAsyncInterval should not exist at all. If it has to exist, it should be a diagnostic switch, not a production one; but ideally, we would just log as soon as there is something to log, see below.

The logging interval should be configurable IMO, so it either needs a 
product switch, or preferably is set as a global logging option via the 
Xlog command if that is possible.

> 
> ---
> 
> Implementation:
> 
> The use of the WatcherThread and PeriodicTask. Polling is plain inefficient, beside the concerns Robbin voiced about blocking things. This is a typical producer-consumer problem, and I would implement it using an own dedicated flusher thread and a monitor. The flusher thread should wake only if there is something to write. This is something I would not do in a separate RFE but now. It would also disarm any arguments against blocking the WatcherThread.

I agree with Thomas here. Using the WatcherThread for this is not really 
appropriate, though I understand it is convenient to use a thread that 
is already present and, crucially, one that does not block for 
safepoints. And we don't need multiple WatcherThreads or a way to 
"spread the load" as the Watcher thread is very lightly loaded. So a new 
kind of NonJavaThread should be introduced for this purpose - if it is 
truly required - and directly synchronize with it rather than using 
polling. Further, we must ensure that the flushing thread cannot block 
the VMThread, if the VMThread is doing logging.

> ----
> 
> The fact that every log message gets strduped could be done better. This can be left for a future RFE - but it explains why I dislike "AsyncLogBufferSize" being "number of entries" instead of a memory size.

If we had had async logging from day one then the way we construct log 
messages would have been done differently. With all the logging sites 
assuming synchronous logging, with stack-local allocations/resources, we 
have no choice but to copy messages to memory with a suitable lifetime.

As I started with, I think there needs to be a return to the high level 
architectural design discussion for this feature. The PR serves as a 
proof-of-concept, but should not IMO be presented as the right general 
solution for all hotspot users.

Regards,
David
-----
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/3135
>