8029281: Synchronization issues in Logger and LogManager

Fri Nov 29 11:41:17 UTC 2013

Hi,

Here is a new revision that includes Peter's findings.
Thanks again for that Peter!

<http://cr.openjdk.java.net/~dfuchs/webrev_8029281/webrev.01/>

It has passed all jdk_core tests and running the new tests
in a loop hasn't triggered any of the issues that appeared
before the fix.

-- daniel

On 11/28/13 3:26 PM, Daniel Fuchs wrote:
> On 11/27/13 10:34 PM, Peter Levart wrote:
>> Hi Daniel,
>>
>> I have started looking at the LogManager change first, and here's the
>> 1st race I found...
>
> Hi Peter,
>
> Thanks a lot for your feedback! see below...
>
>>   The new method LoggerWeakRef.dispose() can be called from 3 places:
>>
>> - LoggerContext.addLocalLogger
>> - LoggerContext.findLogger
>> - LogManager.drainLoggerRefQueueBounded which is called from public
>> method LogManager.addLogger
>>
>> The 1st and 2nd are guarded by LoggerContext.this lock, but the 3rd is
>> unguarded.
>>
>> What can happen is the following:
>>
>> Thread1:
>>      LogManager.addLogger(someLogger)
>>          LogManager.drainLoggerRefQueueBounded() // finds some enqueued
>> LoggerWeakRef for name == 'X.Y' and...
>>              LoggerWeakRef.dispose() // this is 1st call to dispose,
>> so...
>>                  synchronized (LoggerWeakRef.this) {
>>                     disposed = true;
>>                  }
>>
>> ---context switch---
>>
>> Thread2:
>>          LogManager.addLogger(logger{name=='X.Y'})
>>              LogManager.drainLoggerRefQueueBounded()  // no enqueued
>> refs...
>>              LoggerContext.addLocalLogger(logger{name=='X.Y'}, true)
>>                  LoggerWeakRef ref = namedLoggers.get(name); // returns
>> a non-null ref ('X.Y' still points to a cleared weak ref)
>>                  if (ref.get() == null) {
>>                      ref.dispose(); // this is a 2nd call to this ref,
>> so it returns quickly
>>                  }
>>                  // ... namedLoggers gets new entry, etc...
>>                  LogNode node = getNode(name); // get node for 'X.Y'
>> *                node.loggerRef = ref;*
>>
>> ---context switch---
>>
>> Thread1: (still in LoggerWeakRef.dispose())
>>              if (node != null) {
>> *                node.loggerRef = null;*  // clear LogNode's weak ref to
>> us *!!! NOT QUITE !!!*
>>
>>
>> ... so Thread1 erroneously clears the reference from Node to new
>> LoggerWeakRef that was just added by Thread2 ...
>
> Damn. I believe that you're right! I overlooked the fact that LogNode is
> reused.
>
> Locking on LogNode.context is indeed a good way of solving the issue.
> The only method that we will call from within the lock is
> LogNode.context.rmoveLoggerRef(name this) - and that already requires
> a lock on LogNode.context so it would be safe.
>
> I have also double checked the places where LogNode.loggerRef is set,
> and all those places (except in dispose() so far) are already from
> within a lock on LoggerContext.
>
> I think I would prefer however to keep the 'disposed' flag - and to
> check whether LogNode.loggerRef == this before clearing it.
>
> Given that - as I said - LogNode.loggerRef is always modified
> from within a lock on LoggerContext I think the following would
> work:
>
> void dispose() {
>      synchronized(this) {
>          if (disposed) return;
>          disposed = true;
>      }
>
>      final LogNode n = node;
>      if (n != null) {
>          synchronized (n.context) {
>              n.context.removeLoggerRef(name, this);
>              name = null;
>              if (n.loggerRef == this) {
>                  n.loggerRef = null;
>              }
>              node = null;
>           }
>      }
>
>      if (parentRef != null) {
>          Logger parent = parentRef.get();
>          if (parent != null) {
>              parent.removeChildLogger(this);
>          }
>          parentRef = null;  // clear our weak ref to the parent Logger
>      }
> }
>
> I don't think 'node' or 'parentRef' need to be volatile.
> That - said - we could still declare them volatile to make it obvious
> that the code in 'dispose()' will not see a stale value (I don't
> think it can, but having the variables volatile would make it
> obvious).
> If we reach 'dispose()' then it does mean that the LoggerWeakRef
> is no longer used - so I believe the real thing we need to guard against
> is actually modifications of the LogNode state - rather modification
> of the LoggerWeakRef  state.
> Therefore I believe that the important thing is to guarantee that
> all modifications to LogNode.loggerRef happen from within a lock
> on LoggerContext.
>
> I'd also like to keep the disposed flag because it makes it immediately
> obvious that the method is processed only once (so we haven't to worry
> about making it re-entrant at every step).
>
> Would you agree with the code above?
>
> best regards, and congratulations for catching this!
>
> -- daniel
>
>>
>>
>> One way to fix this is to synchronize the clearing of node.logerRef on
>> node.context too. I would make LoggerWeakRef.node and parentRef fields
>> volatile and do the following:
>>
>>
>>      final class LoggerWeakRef extends WeakReference<Logger> {
>>          private String                         name; // for
>> namedLoggers cleanup
>>          private volatile LogNode               node; // for loggerRef
>> cleanup
>>          private volatile WeakReference<Logger> parentRef;  // for kids
>> cleanup
>>
>>          LoggerWeakRef(Logger logger) {
>>              super(logger, loggerRefQueue);
>>              name = logger.getName();  // save for namedLoggers cleanup
>>          }
>>
>>          // dispose of this LoggerWeakRef object
>>          void dispose() {
>>              LogNode node = this.node;
>>              if (node != null) {
>>                  synchronized (node.context) {
>>                      node = this.node; // double-checked locking
>>                      if (node != null) {
>>                          // if we have a LogNode, then we were a named
>> Logger
>>                          // so clear namedLoggers weak ref to us
>>                          node.context.removeLoggerRef(name, this);
>>                          name = null;  // clear our ref to the
>> Logger's name
>>
>>                          node.loggerRef = null;  // clear LogNode's weak
>> ref to us
>>                          this.node = null;       // clear our ref to
>> LogNode
>>                      }
>>                  }
>>              }
>>
>>              WeakReference<Logger> parentRef = this.parentRef;
>>              if (parentRef != null) {
>>                  // this LoggerWeakRef has or had a parent Logger
>>                  this.parentRef = null;  // clear our weak ref to the
>> parent Logger (racy, but harmless)
>>                  Logger parent = parentRef.get();
>>                  if (parent != null) {
>>                      // the parent Logger is still there so clear the
>>                      // parent Logger's weak ref to us
>>                      parent.removeChildLogger(this);
>>                  }
>>              }
>>          }
>>
>> ...
>>
>>
>> What do you think?
>>
>> That's all for today. Will check the rest of patch tomorrow.
>>
>>
>> Regards, Peter
>>
>>
>>
>> On 11/27/2013 08:23 PM, Daniel Fuchs wrote:
>>> Hi,
>>>
>>> Please find below a webrev for:
>>>
>>> 8029281: Synchronization issues in Logger and LogManager
>>> <https://bugs.openjdk.java.net/browse/JDK-8029281:
>>>
>>> I believe this changeset will also fix JDK-8027670 and
>>> JDK-8029092 - which I have thus closed as duplicates of JDK-8029281.
>>>
>>> webrev: <http://cr.openjdk.java.net/~dfuchs/webrev_8029281/webrev.00/>
>>>
>>> Now some comments on the fix:
>>>
>>> LogManager:
>>>
>>> When a logger was garbage collected, and then a new logger
>>> of the same name requested, addlocalLogger used to call
>>> LoggerContext.removeLogger to remove the stale reference
>>> and replace it by a new one. But sometime, the stale reference
>>> was still in the reference queue, which caused the *new* logger
>>> to be wrongly removed later when the reference queue was drained.
>>>
>>> LogManager was also missing some synchronization for its 'props'
>>> variable.
>>>
>>> Logger:
>>>
>>> userBundle & resourceBundleName were sometimes accessed within
>>> a synchronized block, and sometimes without. In particular the
>>> getters were not synchronized, which could cause race conditions
>>> because an other thread could see inconsistent data.
>>>
>>> I also had to refactor the two methods getEffectiveResourceBundle()
>>> and getResourceBundleName() into a single method.
>>> The code there might be a bit more difficult to follow,
>>> because I made sure it preserved the former behavior wrt to
>>> what will be set in the LogRecord in doLog().
>>>
>>> Tests:
>>>
>>> The new TestLoggerBundleSync.java has been great to test
>>> the issues above in both Logger & LogManager (it detected
>>> the drainLoggerRefQueueBounded issue).
>>>
>>> Since I had to touch at drainLoggerRefQueueBounded to make
>>> TestLoggerBundleSync.java pass I also threw in a
>>> TestLogConfigurationDeadLock.java to make sure I hadn't introduced
>>> any new deadlock (and it detected the lack of synchronization
>>> in LogManager.props).
>>>
>>> best regards,
>>>
>>> -- daniel
>>>
>>>
>>>
>>
>