RFR: 8323807: Async UL: Add a stalling mode to async UL

Johan Sjölen jsjolen at openjdk.org
Tue Dec 17 15:35:36 UTC 2024


On Tue, 17 Dec 2024 01:40:15 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Hi,
>> 
>> In January of this year I took a stab at implementing a stalling mode for UL, see link: https://github.com/openjdk/jdk/pull/17757 . I also talked about this feature in the mailing lists and seemed to receive positive feedback. With that PR, I also implemented a circular buffer. This PR didn't go through because 1. The stalling mode was broken 2. The complexity was a bit too large imho.
>> 
>> This PR does a much smaller change by only focusing on implementing the actual stalling.
>> 
>> The addition in terms of command line changes are the same as before, you can now specify the mode of your async logging:
>> 
>> 
>> $ java -Xlog:async:drop # Dropping mode, same as today
>> $ java -Xlog:async:stall # Stalling mode!
>> $ java -Xlog:async # Dropping mode by default still
>> 
>> 
>> The change in protocol is quite simple. If a producer thread `P` cannot fit a message into the buffer, it `malloc`s a message and exposes it via a shared pointer. It blocks all other producer threads from writing into the buffer. At the same time, the consumer thread (`AsyncLogWriter`) will perform all writing. When the consumer thread has emptied the write buffer, it writes the stalled message, notifies `P` and releases all locks. `P` then let's all other producer threads continue.
>> 
>> We do this by having two locks: `Outer` and `Inner`. In our example above, `P` prevents any other producers from progressing by holding the outer lock, but allows the consumer thread to progress by releasing the inner lock.
>> 
>> In pseudo-code we have something like this in the stalling case.
>> 
>> 
>> void produce() {
>>   OuterLock olock;
>>   InnerLock ilock;
>>   bool out_of_memory = attempt_produce(shared_buffer);
>>   if (out_of_memory) {
>>     pmsg = new Message();
>>     shared_message = pmsg;
>>     while (shared_message != nullptr) ilock.wait();
>>     free(pmsg);
>>   }
>> }
>> 
>> void consume() {
>>   InnerLock ilock;
>>   consume(shared_buffer);
>>   if (shared_message != nullptr) {
>>     consume(shared_message);
>>     ilock.notify();
>>   }
>> }
>> 
>> 
>> *Note!* It is very important that the consumer prints all output found in the buffer before printing the stalled message. This is because logging is output in Program Order. In other words: `print(m0); print(m1);` means that `m0` must appear before `m1` in the log file.
>> 
>> *Note!* Yes, we do force *all* threads to stall before the original stalled message has been printed. This isn't optimal, but I still have hope that we can switch to a faster circu...
>
> Overall this looks good. The two-lock approach seems to work well (modulo the potential for contention/delays but we will evaluate that through normal feedback processes). From your description the "outer" lock is a "producer" lock and the "inner" lock is a "consumer" lock, so I think it would be clearer to name them appropriately.
> 
> A few other comments/suggestions below.
> 
> We probably need some additional stress testing (do we already have some?) to put the different modes through their paces.
> 
> Thanks

@dholmes-ora, any comment that I've resolved I've dealt with but I haven't pushed up the changes yet. As for the out of memory case for malloc, I was considering whether we should `vm_exit_out_of_memory` if that occurs. Better to catch OOM and use the rest of the time we have to produce a proper error report, than try to carry on, in my opinion. What do you think?

@afshin-zafari,

Just to be extra clear: In my previous comment I was talking about `larger-than-size-of-whole-buffer` and not `larger-than-size-of-empty-part-of-buffer`.

>The empty part of the buffer can be small and therefore many producers will do the _stalled_message allocation, wait() and free() steps which is not efficient.

You're right. In the case of many stalled messages, we'd have a lot of malloc/free pairings, and that'd be inefficient. I do not anticipate that this will be the case, however, as the output of `-Xlog` for normal use cases isn't that large. 

I wanted to find out  what the typical output of `-Xlog` is for users, so I re-ran the experiment with a stock JDK-17. I took the `-Xlog` command to be compatible with the ULViewer software, as I presume that users take logs for analysis. The command I used was `-Xlog:gc=info,gc+metaspace=info,compilation+codecache=debug,gc+classhisto*=trace,vmoperation=debug,class+load=debug,class+unload=info,gc+age=trace:file=allofit::filecount=0`. Running the experiment we ended up with a log file of the size `4.7MiB`, that is 100x smaller than `-Xlog:all=trace` for stalling mode and 50x smaller than `-Xlog:all=trace` for dropping mode managed to save.


I think that we should focus on correctness first, simplicity second, and efficiency last in this case.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22770#issuecomment-2548770861


More information about the hotspot-runtime-dev mailing list