RFR: 8323807: Async UL: Add a stalling mode to async UL

Tue Dec 17 01:42:38 UTC 2024

On Mon, 16 Dec 2024 17:11:13 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

> Hi,
> 
> In January of this year I took a stab at implementing a stalling mode for UL, see link: https://github.com/openjdk/jdk/pull/17757 . I also talked about this feature in the mailing lists and seemed to receive positive feedback. With that PR, I also implemented a circular buffer. This PR didn't go through because 1. The stalling mode was broken 2. The complexity was a bit too large imho.
> 
> This PR does a much smaller change by only focusing on implementing the actual stalling.
> 
> The addition in terms of command line changes are the same as before, you can now specify the mode of your async logging:
> 
> 
> $ java -Xlog:async:drop # Dropping mode, same as today
> $ java -Xlog:async:stall # Stalling mode!
> $ java -Xlog:async # Dropping mode by default still
> 
> 
> The change in protocol is quite simple. If a producer thread `P` cannot fit a message into the buffer, it `malloc`s a message and exposes it via a shared pointer. It blocks all other producer threads from writing into the buffer. At the same time, the consumer thread (`AsyncLogWriter`) will perform all writing. When the consumer thread has emptied the write buffer, it writes the stalled message, notifies `P` and releases all locks. `P` then let's all other producer threads continue.
> 
> We do this by having two locks: `Outer` and `Inner`. In our example above, `P` prevents any other producers from progressing by holding the outer lock, but allows the consumer thread to progress by releasing the inner lock.
> 
> In pseudo-code we have something like this in the stalling case.
> 
> 
> void produce() {
>   OuterLock olock;
>   InnerLock ilock;
>   bool out_of_memory = attempt_produce(shared_buffer);
>   if (out_of_memory) {
>     pmsg = new Message();
>     shared_message = pmsg;
>     while (shared_message != nullptr) ilock.wait();
>     free(pmsg);
>   }
> }
> 
> void consume() {
>   InnerLock ilock;
>   consume(shared_buffer);
>   if (shared_message != nullptr) {
>     consume(shared_message);
>     ilock.notify();
>   }
> }
> 
> 
> *Note!* It is very important that the consumer prints all output found in the buffer before printing the stalled message. This is because logging is output in Program Order. In other words: `print(m0); print(m1);` means that `m0` must appear before `m1` in the log file.
> 
> *Note!* Yes, we do force *all* threads to stall before the original stalled message has been printed. This isn't optimal, but I still have hope that we can switch to a faster circular buffer in the future (actually, I've got that code too). If logs are written to a SSD wi...

Overall this looks good. The two-lock approach seems to work well (modulo the potential for contention/delays but we will evaluate that through normal feedback processes). From your description the "outer" lock is a "producer" lock and the "inner" lock is a "consumer" lock, so I think it would be clearer to name them appropriately.

A few other comments/suggestions below.

We probably need some additional stress testing (do we already have some?) to put the different modes through their paces.

Thanks

src/hotspot/share/logging/logAsyncWriter.cpp line 98:

> 96: 
> 97:   if (LogConfiguration::async_mode() == LogConfiguration::AsyncMode::Stall) {
> 98:     void* ptr = os::malloc(Message::calc_size(msg_len), mtLogging);

Need to check for OOM

src/hotspot/share/logging/logAsyncWriter.cpp line 104:

> 102:       _inner_lock.wait(0 /* no timeout */);
> 103:     }
> 104:     os::free((Message*)ptr);

Cast is not needed

src/hotspot/share/logging/logAsyncWriter.cpp line 115:

> 113:   OuterLocker locker;
> 114:   InnerLocker ilocker;
> 115:   enqueue_locked(&output, decorations, msg);

The need for the consumer lock is part of the internal implementation so I would expect to only see it acquired inside `enqueue_locked`. 

Your `InnerLocker` wrapper should expose the monitor methods as well so that all interaction is via the wrapper.

src/hotspot/share/logging/logAsyncWriter.cpp line 209:

> 207:     write(snapshot);
> 208: 
> 209:     if (_stalled_message != nullptr) {

Just an observation here that reading this without the lock is only valid because only this thread ever nulls out the stalled message - hence a non-null stalled message must have come from another thread, and we then take the lock and so ensure we see the stalled message correctly.

src/hotspot/share/logging/logAsyncWriter.cpp line 254:

> 252:   if (_instance != nullptr) {
> 253:     {
> 254:       OuterLocker locker;

It is not obvious to me why we need the producer lock here. ??

src/hotspot/share/logging/logAsyncWriter.hpp line 163:

> 161:   Semaphore _flush_sem;
> 162:   // Can't use a Monitor here as we need a low-level API that can be used without Thread::current().
> 163:   // Producers take both locks in the order 0. _outer_lock 1. _inner_lock

Suggestion:

  // Producers take both locks in the order _outer_lock  then _inner_lock

src/hotspot/share/runtime/arguments.cpp line 2609:

> 2607:           // User provided unknown async option
> 2608:           ret = false;
> 2609:         }

I wonder if parsing the tail should be done by a `LogConfiguration` method instead of exposing the details to the argument parsing code directly?

-------------

Changes requested by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22770#pullrequestreview-2507660074
PR Review Comment: https://git.openjdk.org/jdk/pull/22770#discussion_r1887780720
PR Review Comment: https://git.openjdk.org/jdk/pull/22770#discussion_r1887781316
PR Review Comment: https://git.openjdk.org/jdk/pull/22770#discussion_r1887769924
PR Review Comment: https://git.openjdk.org/jdk/pull/22770#discussion_r1887773013
PR Review Comment: https://git.openjdk.org/jdk/pull/22770#discussion_r1887774097
PR Review Comment: https://git.openjdk.org/jdk/pull/22770#discussion_r1887774812
PR Review Comment: https://git.openjdk.org/jdk/pull/22770#discussion_r1887776507