RFR: 8323807: Async UL: Add a stalling mode to async UL [v8]
Johan Sjölen
jsjolen at openjdk.org
Wed Jan 22 14:08:15 UTC 2025
On Wed, 22 Jan 2025 08:42:15 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:
>> Hi,
>>
>> In January of this year I took a stab at implementing a stalling mode for UL, see https://github.com/openjdk/jdk/pull/17757. I also talked about this feature on the mailing lists and seemed to receive positive feedback. With that PR, I also implemented a circular buffer. That PR didn't go through because 1. the stalling mode was broken, and 2. the complexity was a bit too large, imho.
>>
>> This PR does a much smaller change by only focusing on implementing the actual stalling.
>>
>> The command-line additions are the same as before; you can now specify the mode of your async logging:
>>
>>
>> $ java -Xlog:async:drop # Dropping mode, same as today
>> $ java -Xlog:async:stall # Stalling mode!
>> $ java -Xlog:async # Dropping mode by default still
>>
>>
>> The change to the protocol is quite simple. If a producer thread `P` cannot fit a message into the buffer, it `malloc`s a message and exposes it via a shared pointer. It blocks all other producer threads from writing into the buffer. At the same time, the consumer thread (`AsyncLogWriter`) performs all writing. When the consumer thread has emptied the write buffer, it writes the stalled message, notifies `P` and releases all locks. `P` then lets all other producer threads continue.
>>
>> We do this by having two locks: `Outer` and `Inner`. In our example above, `P` prevents any other producers from progressing by holding the outer lock, but allows the consumer thread to progress by releasing the inner lock.
>>
>> In pseudo-code we have something like this in the stalling case.
>>
>>
>> void produce() {
>>     OuterLock olock;  // keeps all other producers out while we may stall
>>     InnerLock ilock;
>>     bool out_of_memory = attempt_produce(shared_buffer);  // true if the message did not fit
>>     if (out_of_memory) {
>>         pmsg = new Message();
>>         shared_message = pmsg;
>>         // wait() releases the inner lock so the consumer can make progress;
>>         // the consumer clears shared_message once it has printed it.
>>         while (shared_message != nullptr) ilock.wait();
>>         delete pmsg;
>>     }
>> }
>>
>> void consume() {
>>     InnerLock ilock;
>>     consume(shared_buffer);  // drain the buffer first, preserving program order
>>     if (shared_message != nullptr) {
>>         consume(shared_message);   // then print the stalled message
>>         shared_message = nullptr;  // tell the stalled producer it has been printed
>>         ilock.notify();
>>     }
>> }
>>
>>
>> *Note!* It is very important that the consumer prints all output found in the buffer before printing the stalled message. This is because logging is output in Program Order. In other words: `print(m0); print(m1);` means that `m0` must appear before `m1` in the log file.
>>
>> *Note!* Yes, we do force *all* threads to stall before the original stalled message has been printed. This isn't optimal, but I still have hope that we can switch to a faster circu...
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
> David's suggestion plus hammering in that no messages are dropped
Axel's comments prompted me to add a test where the buffer is too small to hold most log messages. This caused the system to deadlock!
The issue comes from the fact that the consumer thread also logs: the consumer thread attempts to log, installs a stalled message, and then waits for itself to print that message (which will obviously never happen). We fix it by explicitly refusing the asynchronous path when the thread attempting to log is the consumer thread itself, as sketched below.
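A minimal sketch of that guard, assuming illustrative names (`g_consumer_id`, `enqueue_or_stall` and `log_synchronously` are stand-ins, not the real AsyncLogWriter API):

#include <cstdio>
#include <thread>

// Hypothetical model, not the real UL code.
static std::thread::id g_consumer_id;  // set when the AsyncLogWriter thread starts

static void enqueue_or_stall(const char* msg) { (void)msg; /* buffering + stalling protocol elided */ }
static void log_synchronously(const char* msg) { std::puts(msg); }

// The consumer thread must never take the asynchronous path; otherwise it
// would install a stalled message and then wait for itself to print it.
static void log_message(const char* msg) {
    if (std::this_thread::get_id() == g_consumer_id) {
        log_synchronously(msg);
    } else {
        enqueue_or_stall(msg);
    }
}
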
There were further issues:
1. Axel correctly identified that a `notify` call was missing. This turned out not to matter in practice, perhaps because of spurious wake-ups? Regardless, I fixed it.
2. The flush token was acknowledged at the end of `write()` and not after the stalled message had been written. This means that a stalled message could be left unprinted when the VM exits. A minor issue in comparison, but still something worth fixing; see the sketch after this list.
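To make the ordering concrete, here is a rough sketch of one consumer iteration, again with illustrative names (`write_buffer_contents`, `acknowledge_flush_token` and friends are stand-ins for the real internals):

// Illustrative stubs; not the real AsyncLogWriter internals.
struct Message {};
static Message* stalled_message = nullptr;
static void write_buffer_contents() { /* drain the regular buffer */ }
static void write_message(Message* m) { (void)m; /* print one message */ }
static void notify_stalled_producer() { /* wake the producer waiting on the inner lock */ }
static void acknowledge_flush_token() { /* release threads waiting in flush() */ }

// One iteration of the consumer: the flush token is acknowledged only after
// both the buffer and any stalled message have been written, so a flush()
// at VM exit cannot return while a stalled message is still pending.
static void writer_iteration() {
    write_buffer_contents();
    if (stalled_message != nullptr) {
        write_message(stalled_message);
        stalled_message = nullptr;
        notify_stalled_producer();
    }
    acknowledge_flush_token();
}
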
I decided to leave the `!_data_available && _stalled_message == nullptr` check as it is; it seemed neater to have two distinct signals there, since the `_stalled_message == nullptr` check is needed anyway to communicate that the `_stalled_message` has been printed.
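For context, a rough sketch of how those two signals fit into the consumer's wait condition (the field names come from the check above, the surrounding structure is assumed):

// Illustrative only: the consumer sleeps when there is neither buffered data
// nor a stalled message; _stalled_message doubles as the "has been printed"
// signal for the stalled producer.
struct WriterState {
    bool _data_available = false;
    const void* _stalled_message = nullptr;

    bool should_wait() const {
        return !_data_available && _stalled_message == nullptr;
    }
};
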
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22770#issuecomment-2607333604