RFR: 8229517: Support for optional asynchronous/buffered logging [v2]

Sat Apr 3 01:30:24 UTC 2021

On Tue, 30 Mar 2021 13:17:05 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> Hi Xin, regrading the VM thread blocking on logs.
> 
> If you instead use two arrays, one active and one for flushing, you can swap them with atomic stores from the flushing thread.
> And use GlobalCounter::write_synchronize(); to make sure no writer is still using the swapped out array for logging.
> 
> The logging thread would use GlobalCounter::critical_section_begin(), atomic inc position to get the spot in the array for the log, store the log and then GlobalCounter::critical_section_end().
> 
> That way you will never block a logging thread with the flushing and run enqueues in parallel.
> 
> If you want really want smooth logging you could also remove the strdup, since it may cause a syscall to "break in" more memory.
> To solve that you could use the arrays as memory instead, and do bump the pointer allocation with an atomic increment to size needed instead of position.
> 
> I tested a bit locally generally I don't think there is an issue with blocking the VM thread on flushing.
> So I'm not really that concern about this, but it's always nice to have an algorithm which is constant time instead. (Neither CS begin()/end() or atomic inc can fail or loop on x86)

hi, Robbin, 

Thanks for your comments. I am new for GlobalCounter. please correct me if I am wrong. 

critical_section_begin/end() reminds me of Linux kernel's rcu_read_lock/unlck(). Traditionally, the best scenario of RCU is many reads and rare writes. it's because concurrent writers still need to use atomic operations or locking mechanism. Unfortunately, all participants of logging are writers. No one is read-only. 

The algorithm you described is appealing. The difficulty is that hotspot GlobalCounter is epoch-based. It's unsafe to swap two buffers until all writers are in quiescent state.  One `write_synchronize()` can't guarantee that. Thinking of that log sites are high-frequent and concurrent, you would see multiple version(epoch) in-flight.  Further, the most expensive synchronization is atomic_incrementation. You can't eliminate them. 

If we decide to go to a lock-free solution, I think the classic lock-free linkedlist is the best way to do that. I have seen that hotspot recently checks in a lock-free FIFO.  TBH, I would rather grumpy customers yell at me about async-logging performance first before jump into lock-free algorithms.  Here is why. 
1. it’s easy to implement using a lock. it is easy to be reviewed as well.
2. log sites should be less contentious. Some are temporally distanced. eg. classloading and gc marking.
3. aysnc logging is an auxiliary feature.  Even the entire logging subsystem is neither critical nor mandatory.

I know that I serialize all file-based logs using a mutex. In my defense, it's probably unusual to log several files in real life. As Volker pointed out earlier, the cost is effectively same as the existing futex imposed by Unified Logging there[1] if we have only one file output.

I like the idea of swapping 2 buffers. In our internal jdk8u, I do use this approach. flusher swaps 2 ring buffers after it dump one[2].  For linkedlist deque, [pop_all](https://github.com/openjdk/jdk/pull/3135/files#diff-a1d0f5e0f4539bc398c42f4298959ef2ecdbc62a6611dabcd903198e63f86431R51) is essentially swapping 2 linked lists in O(1) complexity.  It still uses a mutex, but it pops up all elements at once. The amortized cost is low.

I use the following approach to prevent blocked IO from suspending java threads and vmthread. 
1.  flushing is done in a NonJavaThread. It means it can ignore safepoints. 
2. release the lock of buffer before it start doing IO. It won't block anybody from enqueuing.

The main use case would be a synchronous log for everything, but asynchronous log for safepoint and gc. 
Recently, I am thinking how to get a consensus of a feature before beat it to death. Look, there's alway room to improve it. I think it's a good idea to focus on the core problem we are solving and sidetrack optimizations for future development.

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/logging/logFileStreamOutput.cpp#L123
[2] https://cr.openjdk.java.net/~xliu/8229517/webrev04/webrev/src/share/vm/utilities/ostream.cpp.udiff.html

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135