RFR: 8323807: Async UL: Add a stalling mode to async UL [v2]

David Holmes dholmes at openjdk.org
Thu Mar 28 07:37:41 UTC 2024


On Wed, 20 Mar 2024 17:05:49 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi,
>> 
>> This PR does two things for the asynchronous UL mode:
>> 
>> 1. Replaces the ping-pong buffers with a circular buffer.
>> 2. Adds a `stall` mode to async UL, ensuring that no messages are ever dropped if the log message producers out pace the log message writing thread.
>> 
>> These changes have been discussed on the hotspot-dev mailing list under the thread name "Request for Comment: Add a stalling mode to asynchronous UL". Xin Liu, the original author of the ping-pong approach, has confirmed that switching to circular buffers leads to a reduced rate of dropped messages.
>> 
>> The ticket description and my original e-mail contain approximately the same information and I'd recommend that you read either of those if you want an overview of why these changes are made and what the high-level protocol is.
>> 
>> All the best,
>> Johan
>
> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 41 additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'origin/master' into ul-async-producer-consumer-take2
>  - Missing N
>  - Move constructors out of header file
>  - Style
>  - Correctly implement stalling
>  - Missed one
>  - Fix lossy conversions in tests
>  - Merge remote-tracking branch 'origin/master' into ul-async-producer-consumer-take2
>  - Only include sys/mman on Linux
>  - Revert back to original Linux impl
>  - ... and 31 more: https://git.openjdk.org/jdk/compare/829f8a8e...8c4ced0f

Hi Johan,

I started to look at this and have a few high-level queries/concerns. Overall the idea is good.

I will get back to this after the Easter break.

Thanks

src/hotspot/share/logging/circularStringBuffer.cpp line 35:

> 33: const char* allocation_failure_msg = "Failed to allocate async logging buffer";
> 34: 
> 35: #ifdef LINUX

Why do we want/need a specialized version for Linux?

src/hotspot/share/logging/circularStringBuffer.cpp line 92:

> 90:     head(0) {}
> 91: 
> 92: size_t CircularStringBuffer::used() {

Is this API being drawn from some pre-existing code for circular buffers? I find the used/unused terminology a little unusual.

src/hotspot/share/logging/circularStringBuffer.cpp line 94:

> 92: size_t CircularStringBuffer::used() {
> 93:   size_t h = Atomic::load(&head);
> 94:   size_t t = Atomic::load(&tail);

I assume these are `Atomic::load` because this code can be called without either the read lock or write lock? If so then the locked updates technically need to be `Atomic::store`.

src/hotspot/share/logging/circularStringBuffer.hpp line 119:

> 117:   PlatformMonitor& _stats_lock;
> 118: 
> 119:   // Can't use a Monitor here as we need a low-level API that can be used without Thread::current().

I had forgotten that the async logging thread already used a `PlatformMutex` and so had to go back to the original PR to understand why - see some initial discussion here:

https://github.com/openjdk/jdk/pull/3135#discussion_r629739367

The problem you now face is that by using monitors with explicit `wait()`'s for the stalling is that these will delay safepoints because the PlatformMonitor is not safepoint aware. I think you will need to introduce `ThreadBlockInVM` around potentially blocking enqueues - though it may not be that simple as you still need to address the logging that happens after the current thread has been deleted.

src/hotspot/share/logging/circularStringBuffer.hpp line 124:

> 122:   Semaphore _flush_sem;
> 123: 
> 124:   struct ReadLocker : public StackObj {

The ReadLocker/WriteLocker terminology initially made me think you were using ReadWriteLocks, but IIUC these are simply two distinct locks for different mutating operations. I prefer the producer/consumer naming from the description in JBS.

The overall locking protocol is unclear - the JBS description only uses mutexes not monitors.

src/hotspot/share/logging/logAsyncWriter.cpp line 66:

> 64:   CircularStringBuffer::Message msg;
> 65:   while (_circular_buffer.has_message()) {
> 66:     CircularStringBuffer::DequeueResult result = _circular_buffer.dequeue(&msg, write_buffer, write_buffer_size);

Nit: is there any import/using/friend style directive we can use to avoid writing `CircularStringBuffer::DequeueResult` in full each time?

src/hotspot/share/runtime/arguments.cpp line 2591:

> 2589:         } else if(strcmp(async_tail, ":drop") == 0) {
> 2590:           LogConfiguration::set_async_mode(LogConfiguration::AsyncMode::Drop);
> 2591:         } else if(*async_tail == '\0') {

Nits: need space after `if`

test/hotspot/gtest/logging/test_asynclog.cpp line 39:

> 37:   // msg is 128 bytes.
> 38:   const char* large_message = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
> 39:                     "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";

Nit: alignment

-------------

PR Review: https://git.openjdk.org/jdk/pull/17757#pullrequestreview-1965273990
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542384199
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542386925
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542389995
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542397909
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542406428
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542410126
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542413690
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542414616


More information about the hotspot-runtime-dev mailing list