RFR: 8323807: Async UL: Add a stalling mode to async UL [v2]
David Holmes
dholmes at openjdk.org
Thu Mar 28 07:37:41 UTC 2024
On Wed, 20 Mar 2024 17:05:49 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:
>> Hi,
>>
>> This PR does two things for the asynchronous UL mode:
>>
>> 1. Replaces the ping-pong buffers with a circular buffer.
>> 2. Adds a `stall` mode to async UL, ensuring that no messages are ever dropped if the log message producers out pace the log message writing thread.
>>
>> These changes have been discussed on the hotspot-dev mailing list under the thread name "Request for Comment: Add a stalling mode to asynchronous UL". Xin Liu, the original author of the ping-pong approach, has confirmed that switching to circular buffers leads to a reduced rate of dropped messages.
>>
>> The ticket description and my original e-mail contain approximately the same information and I'd recommend that you read either of those if you want an overview of why these changes are made and what the high-level protocol is.
>>
>> All the best,
>> Johan
>
> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 41 additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/master' into ul-async-producer-consumer-take2
> - Missing N
> - Move constructors out of header file
> - Style
> - Correctly implement stalling
> - Missed one
> - Fix lossy conversions in tests
> - Merge remote-tracking branch 'origin/master' into ul-async-producer-consumer-take2
> - Only include sys/mman on Linux
> - Revert back to original Linux impl
> - ... and 31 more: https://git.openjdk.org/jdk/compare/829f8a8e...8c4ced0f
Hi Johan,
I started to look at this and have a few high-level queries/concerns. Overall the idea is good.
I will get back to this after the Easter break.
Thanks
src/hotspot/share/logging/circularStringBuffer.cpp line 35:
> 33: const char* allocation_failure_msg = "Failed to allocate async logging buffer";
> 34:
> 35: #ifdef LINUX
Why do we want/need a specialized version for Linux?
src/hotspot/share/logging/circularStringBuffer.cpp line 92:
> 90: head(0) {}
> 91:
> 92: size_t CircularStringBuffer::used() {
Is this API being drawn from some pre-existing code for circular buffers? I find the used/unused terminology a little unusual.
src/hotspot/share/logging/circularStringBuffer.cpp line 94:
> 92: size_t CircularStringBuffer::used() {
> 93: size_t h = Atomic::load(&head);
> 94: size_t t = Atomic::load(&tail);
I assume these are `Atomic::load` because this code can be called without either the read lock or write lock? If so then the locked updates technically need to be `Atomic::store`.
src/hotspot/share/logging/circularStringBuffer.hpp line 119:
> 117: PlatformMonitor& _stats_lock;
> 118:
> 119: // Can't use a Monitor here as we need a low-level API that can be used without Thread::current().
I had forgotten that the async logging thread already used a `PlatformMutex` and so had to go back to the original PR to understand why - see some initial discussion here:
https://github.com/openjdk/jdk/pull/3135#discussion_r629739367
The problem you now face is that by using monitors with explicit `wait()`'s for the stalling is that these will delay safepoints because the PlatformMonitor is not safepoint aware. I think you will need to introduce `ThreadBlockInVM` around potentially blocking enqueues - though it may not be that simple as you still need to address the logging that happens after the current thread has been deleted.
src/hotspot/share/logging/circularStringBuffer.hpp line 124:
> 122: Semaphore _flush_sem;
> 123:
> 124: struct ReadLocker : public StackObj {
The ReadLocker/WriteLocker terminology initially made me think you were using ReadWriteLocks, but IIUC these are simply two distinct locks for different mutating operations. I prefer the producer/consumer naming from the description in JBS.
The overall locking protocol is unclear - the JBS description only uses mutexes not monitors.
src/hotspot/share/logging/logAsyncWriter.cpp line 66:
> 64: CircularStringBuffer::Message msg;
> 65: while (_circular_buffer.has_message()) {
> 66: CircularStringBuffer::DequeueResult result = _circular_buffer.dequeue(&msg, write_buffer, write_buffer_size);
Nit: is there any import/using/friend style directive we can use to avoid writing `CircularStringBuffer::DequeueResult` in full each time?
src/hotspot/share/runtime/arguments.cpp line 2591:
> 2589: } else if(strcmp(async_tail, ":drop") == 0) {
> 2590: LogConfiguration::set_async_mode(LogConfiguration::AsyncMode::Drop);
> 2591: } else if(*async_tail == '\0') {
Nits: need space after `if`
test/hotspot/gtest/logging/test_asynclog.cpp line 39:
> 37: // msg is 128 bytes.
> 38: const char* large_message = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
> 39: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
Nit: alignment
-------------
PR Review: https://git.openjdk.org/jdk/pull/17757#pullrequestreview-1965273990
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542384199
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542386925
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542389995
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542397909
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542406428
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542410126
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542413690
PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1542414616
More information about the hotspot-runtime-dev
mailing list