RFR: 8229517: Support for optional asynchronous/buffered logging
Xin Liu
xliu at openjdk.java.net
Wed May 26 18:15:25 UTC 2021
On Wed, 26 May 2021 05:41:52 GMT, David Holmes <david.holmes at oracle.com> wrote:
> So again prior to this the real problem was a race between final logging
output and VM termination, that could result in either crashes or lost
logs messages, which in turn could cause test failures.
For `LogConfiguration::disable_outputs`, it's not just for JVM termination.
When you send a dcmd `VM.log disable` to a running JVM, it also invokes it.
That bug caused JVM may crashes AND lost messages.
If `flush()` can't guarantee all dequeued messages have been written before returning, we may end up using broken pointers.
void LogConfiguration::disable_outputs() {
size_t idx = _n_outputs;
// Remove all outputs from all tagsets.
for (LogTagSet* ts = LogTagSet::first(); ts != NULL; ts = ts->next()) {
ts->disable_outputs();
}
// Handle jcmd VM.log disable
// ts->disable_outputs() above has deleted output_list with RCU synchronization.
// Therefore, no new logging entry can enter AsyncLog buffer for the time being.
// flush pending entries before LogOutput instances die.
AsyncLogWriter::flush();
while (idx > 0) {
LogOutput* out = _outputs[--idx];
// Delete the output unless stdout or stderr (idx 0 or 1)
if (idx > 1) {
delete_output(idx);
} else {
out->set_config_string("all=off");
}
}
}
Those LogOutput instances are destroyed right after flush().
Each AsyncLogMessage instance contains a reference of LogFileOutput. LogWriter thread may use those deleted pointers!
that's the root cause you saw that intermittent crashes.
```c++
e->output()->write_blocking(e->decorations(), msg);
> At one stage I believe you had intended to terminate the async writer
thread, and that would have fixed the problem as well - assuming you
handled the termination properly.
Yes, I used to have a termination protocol. It was also very complex because it must be MT-safe, no message lost and no concurrency with synchronous logging. Thomas disliked that and suggested me to remove the termination protocol(100+ line).
AsyncLogWriter::terminate() was placed before `LogConfiguration::finalize()`, therefore, it hid the problem in JVM termination.
It can't help for the dcmd situation, which we still need strong `flush()`.The second use of strong `flush()` is for gtest. Without it, it's impossible to have a reliable way to check the results of async logging.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3135
More information about the hotspot-dev
mailing list