RFR: 8229517: Support for optional asynchronous/buffered logging [v20]
David Holmes
david.holmes at oracle.com
Mon May 24 09:38:34 UTC 2021
On 24/05/2021 6:27 pm, Xin Liu wrote:
> On Mon, 24 May 2021 04:12:54 GMT, Xin Liu <xliu at openjdk.org> wrote:
>
>>> This patch provides a buffer to store asynchrounous messages and flush them to
>>> underlying files periodically.
>>
>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>> flush() waits until all pending logging IO operations are done.
>>
>> This patch support all gtests in async mode.
>> make test TEST="gtest:all" TEST_OPTS='VM_OPTIONS=-Xlog:async'
>
> Your summary about _sem and _lock are correct. Now let's discuss on _io_sem. introduced in https://github.com/openjdk/jdk/pull/3135/commits/9800a8ccebb4490c9aeca781bc8155149245d308
>
>> IIUC what you are trying to deal with is a case where the log writing
> thread has emptied the buffer and as is busy writing that out, and other
> log messages then get enqueued, but then we abort/terminate and we call
> flush() in the aborting/terminating thread.
>
> Yes. Another scenario is to hard flush logs so gtest can see them.
>
>> So now we would have two threads trying to write to the log outputs and these different logging
> messages could get interleaved in a confusing way;
>
> interleave? No. _io_sem is a Dijkstra semaphore whose value is 1. you can imagine that _io_sem denotes the single IO channel in your system. Everybody can do operation P (aka. _io_sem.wait()) but only one can successfully return. Another has to wait because the value of _io_sem is zero. Yes, there's a race condition, but it doesn't matter. winner does the I/O jobs first. It will do operation V (_io_sem.notify()) in its completion and it will wake up the other thread. flush() first empties the buffer. When it returns, it can guarantee what it dequeued are written.
>
> Yes, it's possible that first batch of popped messages are written later. This only happens when a man explicitly "flush" it. Normally, only log writer thread does the IO.
That is what I meant by interleaved. The two blocks of logging
statements can be output in the wrong order.
>> First, as noted, making the aborting thread wait() could be very
> problematic for the process of aborting.
>
> In no content scenario, _io_sem.wait() of aborting thread should return immediately because its value is 1.
> If log write thread is writing, I think it's okay to wait for its 'turn'. That the semantics of "flushing": make sure all logs are written to the filesystem.
I don't agree. Making the aborting thread wait could be a problem.
Remember while an abort is in progress all other threads continue to run
and so by delaying the actual abort we can cause additional secondary
failures to occur. That doesn't seem like a good trade-off to me wrt.
issuing some additional logging. If the fault leads to the log writer
thread also faulting then the abort will hang.
>> Second, there is still a race as to which thread will get the io_sem
> first and so the output could still be presented in the wrong order.
>
> Yes, it may flip the order. I think it's still acceptable. When JVM is aborting, I manage to dump everything I am holding.
> The only compromise the batch order may be flip and it only happens in extreme condition.
Do the log messages already contain the appropriate timestamps from the
original log call, or is that added when the log message is actually
output to the destination? If the timestamps are already present and the
log will just present two blocks out-of-order then that is not too bad.
Hopefully the developer will spot them. But if the log appeared to show
things happening in the wrong chronological order then that would be a
big problem IMO.
Happy to hear other opinions.
Thanks,
David
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/3135
>
More information about the hotspot-dev
mailing list