RFR: 8229517: Support for optional asynchronous/buffered logging [v17]
Liu, Xin
xxinliu at amazon.com
Fri May 21 19:00:45 UTC 2021
hi, David,
I am aware of this issue. There are a few gtest testcases can't run in
async logging mode. The rootcause is testing code assumes logging is
synchronous.
eg. this testcase opens and reads 'TestLogFileName' right after
log_trace(). This is perfectly fine in sequential execution. It's unsafe
and TestLogFileName may haven't been written yet. fopen could fail(no
error check in current testing) and then fread yields NULL pointer for
strcmp.
TEST_VM_F(LogTest, prefix) {
set_log_config(TestLogFileName, "logging+test=trace");
log_trace(logging, test)(LOG_LINE_STR);
EXPECT_TRUE(file_contains_substring(TestLogFileName, LOG_PREFIX_STR
LOG_LINE_STR));
}
It's also not MT-safe to do concurrent fread() (gtest EXPECT/ASSERT)
while fwrite is onging(AsyncLog Thread). Random execution may happen if
FILE has been corrupted.
I manage to fix those hard crashes in gtests. It would up a non-trivial
patch, and it just fixed hard crashes. I reckon I give up running gtest
in async mode and provide an isolated and a clean room to test it.
https://github.com/openjdk/jdk/pull/3135/files#diff-81543b39e56eafc7fc7184d4334ad7b68ed1574b38eba158be2d065e4b35e42f
If you insist no hard crash would happen in gtest, how about I create a
separated issue and send that PR for review?
Technically, I can even make thoses failed testcases pass in async mode.
All I need to do is to insert AsyncLogWriter::flush() in paranoid mode.
That would ruin the readability of the existing tests.
FYI, not only gtest, there's one intermitten failure in jtreg-tier1 if
async logging is on. 'serviceability/dcmd/gc/RunGCTest.java'. The reason
is still lack of synchronization. This test uses dcmd 'GC.run' to
trigger a gc activity and expect to see logs. sometimes it fails to
capture the logs.
thanks,
--lx
On 5/21/21 7:32 AM, David Holmes wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> I ran this patch through our test system and the gtest is crashing in
> some cases:
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error
> (/opt/mach5/mesos/work_dir/slaves/3c846bae-ce30-4a97-93ee-9fef4497ccb6-S124159/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/e6181b0d-b338-4338-8e7a-a969413802aa/runs/964cbe97-7c76-429e-a0cd-490459097714/workspace/open/src/hotspot/share/logging/logDecorations.cpp:78),
> pid=8163, tid=8179
> # assert(_decorators.is_decorator(decorator)) failed: decorator was not
> part of the decorator set specified at creation.
> #
> # JRE version: Java(TM) SE Runtime Environment (17.0) (fastdebug build
> 17-internal+0-LTS-2021-05-21-0645591.david.holmes.jdk-dev4.git)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug
> 17-internal+0-LTS-2021-05-21-0645591.david.holmes.jdk-dev4.git, mixed
> mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # V [libjvm.so+0x160ec44]
> LogDecorations::decoration(LogDecorators::Decorator, char*, unsigned
> long) const+0x94
> #
> # Core dump will be written. Default location: Core dumps may be
> processed with "/opt/core.sh %p" (or dumping to
> /opt/mach5/mesos/work_dir/slaves/3c846bae-ce30-4a97-93ee-9fef4497ccb6-S79062/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/e55170f3-f74b-45ca-955c-6c9f860f6f37/runs/b5e43087-b6d9-4d35-8001-49a0f73f47da/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_misc/scratch/0/core.8163)
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
>
> --------------- S U M M A R Y ------------
>
> Command Line: -XX:+ExecutingUnitTests -Xlog:async
>
> Host: AMD EPYC 7J13 64-Core Processor, 12 cores, 46G, Oracle Linux
> Server release 7.9
> Time: Fri May 21 07:01:05 2021 UTC elapsed time: 0.337971 seconds (0d 0h
> 0m 0s)
>
> --------------- T H R E A D ---------------
>
> Current thread (0x000056160528bb20): Thread [stack:
> 0x00007f64a9d22000,0x00007f64a9e22000] [id=8179]
>
> Stack: [0x00007f64a9d22000,0x00007f64a9e22000], sp=0x00007f64a9e205e0,
> free space=1017k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
> C=native code)
> V [libjvm.so+0x160ec44]
> LogDecorations::decoration(LogDecorators::Decorator, char*, unsigned
> long) const+0x94
> V [libjvm.so+0x1611a67] LogFileStreamOutput::write(LogDecorations
> const&, char const*)+0x77
> V [libjvm.so+0x1610d42] LogFileOutput::write_blocking(LogDecorations
> const&, char const*)+0x32
> V [libjvm.so+0x160b328] AsyncLogWriter::perform_IO() [clone
> .constprop.0]+0x348
> V [libjvm.so+0x160b4d0] AsyncLogWriter::run()+0x70
> V [libjvm.so+0x1c11800] Thread::call_run()+0x100
> V [libjvm.so+0x18e8c7e] thread_native_entry(Thread*)+0x10e
>
>
> David
> -----
>
> On 21/05/2021 4:24 pm, David Holmes wrote:
>> On Thu, 20 May 2021 09:08:04 GMT, Xin Liu <xliu at openjdk.org> wrote:
>>
>>>> This patch provides a buffer to store asynchrounous messages and flush them to
>>>> underlying files periodically.
>>>
>>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>>>
>>> Update according to reviewer's feedback.
>>
>> Overall I think this has shaped up rather well - thank you!
>>
>> Hard for me to fully grok everything by reading the code though so I've requested a summary of operation comment in the header file.
>>
>> Otherwise minor, mostly grammatical, comments.
>>
>> Thanks,
>> David
>>
>> src/hotspot/share/logging/logAsyncWriter.cpp line 75:
>>
>>> 73: }
>>> 74:
>>> 75: // LogMessageBuffer consists of a multiple-part/multiple-line messsages.
>>
>> typo: messages -> message
>>
>> src/hotspot/share/logging/logAsyncWriter.cpp line 76:
>>
>>> 74:
>>> 75: // LogMessageBuffer consists of a multiple-part/multiple-line messsages.
>>> 76: // the mutex here gurantees its integrity.
>>
>> typo: gurantees -> guarantees
>>
>> src/hotspot/share/logging/logAsyncWriter.cpp line 118:
>>
>>> 116: };
>>> 117:
>>> 118: void AsyncLogWriter::perform_IO() {
>>
>> Seems to be all O and no I :)
>>
>> src/hotspot/share/logging/logAsyncWriter.cpp line 164:
>>
>>> 162: if (!LogConfiguration::is_async_mode()) return;
>>> 163:
>>> 164: if (_instance == nullptr) {
>>
>> This method should only be called once so _instance must be NULL. An assert rather than an if would suffice.
>>
>> src/hotspot/share/logging/logAsyncWriter.cpp line 175:
>>
>>> 173: ts->wait_until_no_readers();
>>> 174: }
>>> 175: self->_state = ThreadState::Running;
>>
>> This seems like something the thread itself should do at the start of run()
>>
>> src/hotspot/share/logging/logAsyncWriter.cpp line 177:
>>
>>> 175: self->_state = ThreadState::Running;
>>> 176: os::start_thread(self);
>>> 177: log_debug(logging, thread)("AsyncLogging starts working.");
>>
>> Suggest: "Async logging thread started"
>>
>> src/hotspot/share/logging/logAsyncWriter.cpp line 179:
>>
>>> 177: log_debug(logging, thread)("AsyncLogging starts working.");
>>> 178: } else {
>>> 179: log_warning(logging, thread)("AsyncLogging failed to launch thread. fall back to synchronous logging.");
>>
>> "Async logging failed to create thread. Falling back to synchronous logging."
>>
>> Shouldn't this be done where the os::create_thread failed though?
>>
>> src/hotspot/share/logging/logAsyncWriter.hpp line 25:
>>
>>> 23: */
>>> 24: #ifndef SHARE_LOG_ASYNC_WTRITER_HPP
>>> 25: #define SHARE_LOG_ASYNC_WTRITER_HPP
>>
>> Typos: WTRITER
>>
>> src/hotspot/share/logging/logAsyncWriter.hpp line 34:
>>
>>> 32: #include "utilities/hashtable.hpp"
>>> 33: #include "utilities/linkedlist.hpp"
>>> 34:
>>
>> Could you write a summary of operation comment explaining how async mode works e.g. details on initialization and teardown, how the thread interacts with writers etc. Thanks.
>>
>> src/hotspot/share/logging/logAsyncWriter.hpp line 113:
>>
>>> 111: typedef KVHashtable<LogFileOutput*, uint32_t, mtLogging> AsyncLogMap;
>>> 112:
>>> 113: class AsyncLogWriter: public NonJavaThread {
>>
>> Any reason this is not a NamedThread? How will this appear in thread-dumps and crash reports?
>>
>> Also I would have expected two classes to simplify the APIs: AsyncLogWriter and AsyncLogThread
>>
>> src/hotspot/share/logging/logAsyncWriter.hpp line 115:
>>
>>> 113: class AsyncLogWriter: public NonJavaThread {
>>> 114: static AsyncLogWriter* _instance;
>>> 115: static Semaphore _sem;
>>
>> What is this semaphore for? A comment and/or a more meaningful name would be good.
>>
>> src/hotspot/share/logging/logAsyncWriter.hpp line 125:
>>
>>> 123:
>>> 124: volatile ThreadState _state;
>>> 125: AsyncLogMap _stats; // statistics of dropping messages.
>>
>> // statistics for dropped messages
>>
>> src/hotspot/share/logging/logAsyncWriter.hpp line 147:
>>
>>> 145:
>>> 146: // None of following functions are thread-safe.
>>> 147: static AsyncLogWriter* instance();
>>
>> instance() isn't thread-safe ??
>>
>> src/hotspot/share/logging/logConfiguration.cpp line 555:
>>
>>> 553: " This will cause existing log files to be overwritten.");
>>> 554: out->cr();
>>> 555: out->print_cr("\nAsynchronous logging(off by default):");
>>
>> space before "(off"
>>
>> src/hotspot/share/runtime/globals.hpp line 1993:
>>
>>> 1991: \
>>> 1992: product(size_t, AsyncLogBufferSize, 2*M, \
>>> 1993: "Memory budget(in bytes) for the buffer of Asynchronous " \
>>
>> space before (
>>
>> src/hotspot/share/runtime/globals.hpp line 1995:
>>
>>> 1993: "Memory budget(in bytes) for the buffer of Asynchronous " \
>>> 1994: "Logging (-Xlog:async).") \
>>> 1995: range(100*K, 50*M) \
>>
>> What is the basis for this range?
>>
>> src/hotspot/share/runtime/init.cpp line 127:
>>
>>> 125: return status;
>>> 126:
>>> 127: AsyncLogWriter::initialize();
>>
>> So prior to this point logging has been synchronous and now we switch if async-mode?
>>
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/3135
>>
More information about the hotspot-dev
mailing list