RFR: 8323807: Async UL: Add a stalling mode to async UL [v10]

Fri Jan 24 05:46:50 UTC 2025

On Fri, 24 Jan 2025 05:31:40 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> The comment can probably be clarified.
>> 
>> The bug that we're fixing is specifically that the consumer thread logs, not the logging/consumption code which the consumer thread executes. Look at `Thread::call_run`:
>> 
>> ```c++
>>   log_debug(os, thread)("Thread " UINTX_FORMAT " stack dimensions: "
>>     PTR_FORMAT "-" PTR_FORMAT " (" SIZE_FORMAT "k).",
>>     os::current_thread_id(), p2i(stack_end()),
>>     p2i(stack_base()), stack_size()/1024);
>> 
>>   // Perform <ChildClass> initialization actions
>>   DEBUG_ONLY(_run_state = PRE_RUN;)
>>   this->pre_run();
>> 
>>   // Invoke <ChildClass>::run()
>>   DEBUG_ONLY(_run_state = RUN;)
>>   this->run();
>>   // Returned from <ChildClass>::run(). Thread finished.
>> 
>> 
>> Clearly we're logging before calling `this->run()` which will enter `AsyncLogWriter::run`.
>> 
>> And now look at the code for advertising that the asynchronous mode is running:
>> 
>> ```c++
>>   AsyncLogWriter* self = new AsyncLogWriter();
>>   if (self->_initialized) {
>>     Atomic::release_store_fence(&AsyncLogWriter::_instance, self);
>>     // All readers of _instance after the fence see non-null.
>>     // We use LogOutputList's RCU counters to ensure all synchronous logsites have completed.
>>     // After that, we start AsyncLog Thread and it exclusively takes over all logging I/O.
>>     for (LogTagSet* ts = LogTagSet::first(); ts != nullptr; ts = ts->next()) {
>>       ts->wait_until_no_readers();
>>     }
>>     os::start_thread(self);
>> 
>> 
>> So before the thread has been started all new logs are done using asynchronous logging. This means that the `log_debug` in the first snippet will cause the consumer thread to stall, as it's waiting for itself to take the stalled message and print it.
>> 
>> I was first considering fixing the initialization order, by doing something like below. That would also work, but I thought it's easier to reason about synchronous logging for the consumer thread, as it still gives us the same guarantees. After considering this some more, I think that both are necessary. If  any of the code in `os::pd_start_thread` ever logs, then we can run into a deadlock in stalling mode.
>> 
>> ```c++
>>      // We use LogOutputList's RCU counters to ensure all synchronous logsites have completed.
>>     // After that, we start AsyncLog Thread and it exclusively takes over all logging I/O.
>>     for (LogTagSet* ts = LogTagSet::first(); ts != nullptr; ts = ts->next()) {
>>       ts->wait_until_no_readers();
>>     }
>>   AsyncLogWriter* self = new Asy...
>
> <sigh> synchronous logging suffers from none of these pitfalls.
> 
> That said to some extent you are trying to fix self-inflicted wounds here. Trying to make UL work with an unrealistically smaller buffer seems counter productive. Can't we just ensure the buffer is large enough so that this initial logging in the AsyncLogWriter thread cannot stall? Or should we be using synchronous logging until after the AsyncLogWriter thread is fully initalized and has completed its own logging?

> I actually find this code very strange. Loading the _instance was never done using an acquire fence, so what guarantees that "All reads of _instance after the fence see non-null"?

That fact it is a fence is meant to  guarantee that. The fence ensures that any reads of `_instance` from memory subsequent to the fence, must see what was written prior to the fence. That said, this is wrong:

AsyncLogWriter* AsyncLogWriter::instance() {
  return _instance;
}

as it should be using `Atomic::load` to ensure the compiler doesn't stash in a register. Further `static AsyncLogWriter* _instance;` should be private so everyone has to use the `instance()` accessor.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22770#discussion_r1928117994