RFR: 8229517: Support for optional asynchronous/buffered logging

Xin Liu xliu at openjdk.java.net
Wed May 26 18:15:25 UTC 2021


On Wed, 26 May 2021 05:41:52 GMT, David Holmes <david.holmes at oracle.com> wrote:

> So again prior to this the real problem was a race between final logging
output and VM termination, that could result in either crashes or lost
logs messages, which in turn could cause test failures.

For `LogConfiguration::disable_outputs`, it's not just for JVM termination. 
When you send a dcmd `VM.log disable` to a running JVM, it also invokes it. 
 
That bug caused JVM may crashes AND lost messages. 
If `flush()` can't guarantee all dequeued messages have been written before returning, we may end up using broken pointers. 


void LogConfiguration::disable_outputs() {
  size_t idx = _n_outputs;

  // Remove all outputs from all tagsets.
  for (LogTagSet* ts = LogTagSet::first(); ts != NULL; ts = ts->next()) {
    ts->disable_outputs();
  }

  // Handle jcmd VM.log disable
  // ts->disable_outputs() above has deleted output_list with RCU synchronization.
  // Therefore, no new logging entry can enter AsyncLog buffer for the time being.
  // flush pending entries before LogOutput instances die.
  AsyncLogWriter::flush();

  while (idx > 0) {
    LogOutput* out = _outputs[--idx];
    // Delete the output unless stdout or stderr (idx 0 or 1)
    if (idx > 1) {
      delete_output(idx);
    } else {
      out->set_config_string("all=off");
    }
  }
}

Those LogOutput instances are destroyed right after flush(). 
Each AsyncLogMessage instance contains a reference of LogFileOutput. LogWriter thread may use those deleted pointers! 
that's the root cause you saw that intermittent crashes. 

```c++
e->output()->write_blocking(e->decorations(), msg);


> At one stage I believe you had intended to terminate the async writer
thread, and that would have fixed the problem as well - assuming you
handled the termination properly.

Yes, I used to have a termination protocol. It was also very complex because it must be MT-safe, no message lost and no concurrency with synchronous logging. Thomas disliked that and suggested me to remove the termination protocol(100+ line).

AsyncLogWriter::terminate() was placed before `LogConfiguration::finalize()`, therefore, it hid the problem in JVM termination. 
It can't help for the dcmd situation, which we still need strong `flush()`.The second use of strong `flush()` is for gtest. Without it, it's impossible to have a reliable way to check the results of async logging.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3135


More information about the hotspot-dev mailing list