8029281: Synchronization issues in Logger and LogManager

Daniel Fuchs daniel.fuchs at oracle.com
Tue Dec 3 18:57:06 UTC 2013


On 12/3/13 8:36 AM, Peter Levart wrote:
> Hi Daniel,
>
> First, sorry for the delay. I promised to look at the patch, but was
> then distracted by other things. I think that synchronization in LogManager
> is correct now. The fact that Mandy thinks so is also reassuring.

Yes - that's usually a good sign :-)

> I wonder if calling LoggerWeakRef.dispose() from
> LoggerContext.addLocalLogger() is needed now that you check the
> LogNode.loggerRef before clearing in LoggerWeakRef.dispose():

Yes - we could probably call dispose() only when draining the
queue - but I still have the feeling that calling it early
is better. It makes sure that everything is properly cleaned up
before we modify it.

I also like that we now have a single entry point - 'dispose()' - to
clean up after a gc'ed logger rather than doing half of the job now
and leaving the other half for later (and trusting that the later
code will not do anything stupid).
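
For reference, the call site in question - the early dispose() in
addLocalLogger - boils down to something like this (a condensed sketch
of the existing code, not the full method):

     LoggerWeakRef ref = namedLoggers.get(name);
     if (ref != null && ref.get() == null) {
         // the named Logger was GC'ed: clean up the stale reference
         // completely before a fresh LoggerWeakRef is registered below
         ref.dispose();
     }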

Thank you again for your reviews and feedback - they are very
helpful!

-- daniel

>
>                      if (n.loggerRef == this) {
>                          n.loggerRef = null;  // clear LogNode's weak ref to us
>                      }
>
>
> The LoggerContext.addLocalLogger() will already overwrite the
> namedLoggers Hashtable entry for the logger's name with the new LoggerWeakRef:
>
>              namedLoggers.put(name, ref);
>
> ...and it will already overwrite the LogNode.loggerRef with the new
> LoggerWeakRef:
>
>              // Find the new node and its parent.
>              LogNode node = getNode(name);
>              node.loggerRef = ref;
>
> So the only effect of calling LoggerWeakRef.dispose() from
> LoggerContext.addLocalLogger() is a more prompt unlinking of a
> LoggerWeakRef child - one that has already been cleared but not yet
> disposed() - from its parent logger. This would eventually happen
> anyway in a call to LogManager.drainLoggerRefQueueBounded(), called
> from the public LogManager.addLogger(), which is the only entry point
> to LoggerContext.addLocalLogger(). But it's not wrong as it is now:
> you also ensure more prompt unlinking of the cleared LoggerWeakRef,
> and the code is more uniform and less fragile than it would be if you
> created and called a hypothetical disposeFromParent() which only took
> care of the unlinking from the parent logger...
>
> Regards, Peter
>
>
> On 11/29/2013 12:41 PM, Daniel Fuchs wrote:
>> Hi,
>>
>> Here is a new revision that includes Peter's findings.
>> Thanks again for that Peter!
>>
>> <http://cr.openjdk.java.net/~dfuchs/webrev_8029281/webrev.01/>
>>
>> It has passed all jdk_core tests and running the new tests
>> in a loop hasn't triggered any of the issues that appeared
>> before the fix.
>>
>> -- daniel
>>
>> On 11/28/13 3:26 PM, Daniel Fuchs wrote:
>>> On 11/27/13 10:34 PM, Peter Levart wrote:
>>>> Hi Daniel,
>>>>
>>>> I have started looking at the LogManager change first, and here's the
>>>> 1st race I found...
>>>
>>> Hi Peter,
>>>
>>> Thanks a lot for your feedback! see below...
>>>
>>>>   The new method LoggerWeakRef.dispose() can be called from 3 places:
>>>>
>>>> - LoggerContext.addLocalLogger
>>>> - LoggerContext.findLogger
>>>> - LogManager.drainLoggerRefQueueBounded which is called from public
>>>> method LogManager.addLogger
>>>>
>>>> The 1st and 2nd are guarded by the LoggerContext.this lock, but the 3rd
>>>> is unguarded.
>>>>
>>>> What can happen is the following:
>>>>
>>>> Thread1:
>>>>      LogManager.addLogger(someLogger)
>>>>          LogManager.drainLoggerRefQueueBounded() // finds some enqueued LoggerWeakRef for name == 'X.Y' and...
>>>>              LoggerWeakRef.dispose() // this is 1st call to dispose, so...
>>>>                  synchronized (LoggerWeakRef.this) {
>>>>                      disposed = true;
>>>>                  }
>>>>
>>>> ---context switch---
>>>>
>>>> Thread2:
>>>>          LogManager.addLogger(logger{name=='X.Y'})
>>>>              LogManager.drainLoggerRefQueueBounded()  // no enqueued refs...
>>>>              LoggerContext.addLocalLogger(logger{name=='X.Y'}, true)
>>>>                  LoggerWeakRef ref = namedLoggers.get(name); // returns a non-null ref ('X.Y' still points to a cleared weak ref)
>>>>                  if (ref.get() == null) {
>>>>                      ref.dispose(); // this is a 2nd call to this ref, so it returns quickly
>>>>                  }
>>>>                  // ... namedLoggers gets new entry, etc...
>>>>                  LogNode node = getNode(name); // get node for 'X.Y'
>>>> *                node.loggerRef = ref;*
>>>>
>>>> ---context switch---
>>>>
>>>> Thread1: (still in LoggerWeakRef.dispose())
>>>>              if (node != null) {
>>>> *                node.loggerRef = null;*  // clear LogNode's weak ref to us *!!! NOT QUITE !!!*
>>>>
>>>>
>>>> ... so Thread1 erroneously clears the reference from the LogNode to the
>>>> new LoggerWeakRef that was just added by Thread2 ...
>>>
>>> Damn. I believe that you're right! I overlooked the fact that LogNode is
>>> reused.
>>>
>>> Locking on LogNode.context is indeed a good way of solving the issue.
>>> The only method that we will call from within the lock is
>>> LogNode.context.removeLoggerRef(name, this) - and that already requires
>>> a lock on LogNode.context, so it would be safe.
>>>
>>> I have also double-checked the places where LogNode.loggerRef is set,
>>> and all those places (except in dispose() so far) already do it from
>>> within a lock on the LoggerContext.
>>>
>>> I think I would prefer, however, to keep the 'disposed' flag - and to
>>> check whether LogNode.loggerRef == this before clearing it.
>>>
>>> Given that - as I said - LogNode.loggerRef is always modified
>>> from within a lock on the LoggerContext, I think the following would
>>> work:
>>>
>>> void dispose() {
>>>      synchronized (this) {
>>>          if (disposed) return;   // already disposed by another caller
>>>          disposed = true;
>>>      }
>>>
>>>      final LogNode n = node;
>>>      if (n != null) {
>>>          synchronized (n.context) {
>>>              n.context.removeLoggerRef(name, this);
>>>              name = null;             // clear our ref to the Logger's name
>>>              if (n.loggerRef == this) {
>>>                  n.loggerRef = null;  // clear LogNode's weak ref to us
>>>              }
>>>              node = null;             // clear our ref to the LogNode
>>>          }
>>>      }
>>>
>>>      if (parentRef != null) {
>>>          Logger parent = parentRef.get();
>>>          if (parent != null) {
>>>              parent.removeChildLogger(this);  // clear parent's weak ref to us
>>>          }
>>>          parentRef = null;  // clear our weak ref to the parent Logger
>>>      }
>>> }
>>>
>>> I don't think 'node' or 'parentRef' need to be volatile.
>>> That said, we could still declare them volatile to make it obvious
>>> that the code in 'dispose()' will not see a stale value (I don't
>>> think it can, but having the variables volatile would make that
>>> obvious).
>>> If we reach 'dispose()' then it means that the LoggerWeakRef
>>> is no longer used - so I believe the real thing we need to guard
>>> against is modifications of the LogNode state, rather than
>>> modifications of the LoggerWeakRef state.
>>> Therefore I believe that the important thing is to guarantee that
>>> all modifications to LogNode.loggerRef happen from within a lock
>>> on the LoggerContext.
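>>>
>>> (If we wanted to make that invariant stand out in the code, a guarded
>>> setter on LogNode would do it - just an illustration, not something
>>> I'm proposing to add:
>>>
>>>      void setLoggerRef(LoggerWeakRef ref) {
>>>          // every write must happen while holding the LoggerContext lock
>>>          assert Thread.holdsLock(context);
>>>          loggerRef = ref;
>>>      }
>>> )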
>>>
>>> I'd also like to keep the disposed flag because it makes it immediately
>>> obvious that the method is processed only once (so we don't have to
>>> worry about making it re-entrant at every step).
>>>
>>> Would you agree with the code above?
>>>
>>> best regards, and congratulations for catching this!
>>>
>>> -- daniel
>>>
>>>>
>>>>
>>>> One way to fix this is to synchronize the clearing of node.loggerRef on
>>>> node.context too. I would make the LoggerWeakRef.node and parentRef
>>>> fields volatile and do the following:
>>>>
>>>>
>>>>      final class LoggerWeakRef extends WeakReference<Logger> {
>>>>          private String                         name;       // for namedLoggers cleanup
>>>>          private volatile LogNode               node;       // for loggerRef cleanup
>>>>          private volatile WeakReference<Logger> parentRef;  // for kids cleanup
>>>>
>>>>          LoggerWeakRef(Logger logger) {
>>>>              super(logger, loggerRefQueue);
>>>>              name = logger.getName();  // save for namedLoggers cleanup
>>>>          }
>>>>
>>>>          // dispose of this LoggerWeakRef object
>>>>          void dispose() {
>>>>              LogNode node = this.node;
>>>>              if (node != null) {
>>>>                  synchronized (node.context) {
>>>>                      node = this.node; // double-checked locking
>>>>                      if (node != null) {
>>>>                          // if we have a LogNode, then we were a named Logger
>>>>                          // so clear namedLoggers weak ref to us
>>>>                          node.context.removeLoggerRef(name, this);
>>>>                          name = null;  // clear our ref to the Logger's name
>>>>
>>>>                          node.loggerRef = null;  // clear LogNode's weak ref to us
>>>>                          this.node = null;       // clear our ref to LogNode
>>>>                      }
>>>>                  }
>>>>              }
>>>>
>>>>              WeakReference<Logger> parentRef = this.parentRef;
>>>>              if (parentRef != null) {
>>>>                  // this LoggerWeakRef has or had a parent Logger
>>>>                  this.parentRef = null;  // clear our weak ref to the parent Logger (racy, but harmless)
>>>>                  Logger parent = parentRef.get();
>>>>                  if (parent != null) {
>>>>                      // the parent Logger is still there so clear the
>>>>                      // parent Logger's weak ref to us
>>>>                      parent.removeChildLogger(this);
>>>>                  }
>>>>              }
>>>>          }
>>>>
>>>> ...
>>>>
>>>>
>>>> What do you think?
>>>>
>>>> That's all for today. Will check the rest of patch tomorrow.
>>>>
>>>>
>>>> Regards, Peter
>>>>
>>>>
>>>>
>>>> On 11/27/2013 08:23 PM, Daniel Fuchs wrote:
>>>>> Hi,
>>>>>
>>>>> Please find below a webrev for:
>>>>>
>>>>> 8029281: Synchronization issues in Logger and LogManager
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8029281>
>>>>>
>>>>> I believe this changeset will also fix JDK-8027670 and
>>>>> JDK-8029092 - which I have thus closed as duplicates of JDK-8029281.
>>>>>
>>>>> webrev: <http://cr.openjdk.java.net/~dfuchs/webrev_8029281/webrev.00/>
>>>>>
>>>>> Now some comments on the fix:
>>>>>
>>>>> LogManager:
>>>>>
>>>>> When a logger was garbage collected, and then a new logger
>>>>> of the same name was requested, addLocalLogger used to call
>>>>> LoggerContext.removeLogger to remove the stale reference
>>>>> and replace it with a new one. But sometimes the stale reference
>>>>> was still in the reference queue, which caused the *new* logger
>>>>> to be wrongly removed later when the reference queue was drained.
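>>>>>
>>>>> The fix therefore relies on a defensive removal: dispose() asks the
>>>>> LoggerContext to drop the namedLoggers entry only if it still maps to
>>>>> the stale reference being disposed of. Roughly (a sketch of the
>>>>> intent, not the exact code in the webrev):
>>>>>
>>>>>      // caller holds the LoggerContext lock
>>>>>      void removeLoggerRef(String name, LoggerWeakRef ref) {
>>>>>          if (namedLoggers.get(name) == ref) {
>>>>>              namedLoggers.remove(name);
>>>>>          }
>>>>>      }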
>>>>>
>>>>> LogManager was also missing some synchronization for its 'props'
>>>>> variable.
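>>>>>
>>>>> (Concretely this means that reads of 'props' and the code that
>>>>> replaces it now go through the LogManager lock - a simplified sketch,
>>>>> not the exact patch, with setProps standing in for the resetting code
>>>>> in readConfiguration()/reset():
>>>>>
>>>>>      public synchronized String getProperty(String name) {
>>>>>          return props.getProperty(name);
>>>>>      }
>>>>>
>>>>>      private synchronized void setProps(Properties newProps) {
>>>>>          props = newProps;   // placeholder for the actual resetting code
>>>>>      }
>>>>> )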
>>>>>
>>>>> Logger:
>>>>>
>>>>> userBundle & resourceBundleName were sometimes accessed within
>>>>> a synchronized block, and sometimes without. In particular, the
>>>>> getters were not synchronized, which could cause race conditions
>>>>> because another thread could see inconsistent data.
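>>>>>
>>>>> In short, the pair is now always read and written under the Logger's
>>>>> monitor. A simplified sketch (the setter name below is made up, the
>>>>> fields are the ones mentioned above - this is not the exact patch):
>>>>>
>>>>>      private ResourceBundle userBundle;
>>>>>      private String resourceBundleName;
>>>>>
>>>>>      synchronized void setBundleInfo(String name, ResourceBundle bundle) {
>>>>>          resourceBundleName = name;
>>>>>          userBundle = bundle;
>>>>>      }
>>>>>
>>>>>      public synchronized String getResourceBundleName() {
>>>>>          return resourceBundleName;
>>>>>      }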
>>>>>
>>>>> I also had to refactor the two methods getEffectiveResourceBundle()
>>>>> and getResourceBundleName() into a single method.
>>>>> The code there might be a bit more difficult to follow,
>>>>> because I made sure it preserved the former behavior with respect
>>>>> to what will be set in the LogRecord in doLog().
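>>>>>
>>>>> The idea of the combined method is simply to hand back the bundle and
>>>>> its name as one consistent snapshot that doLog() can copy into the
>>>>> LogRecord. As an illustration only (the names below are made up and
>>>>> are not the ones in the webrev):
>>>>>
>>>>>      static final class BundleSnapshot {
>>>>>          final String resourceBundleName;
>>>>>          final ResourceBundle userBundle;
>>>>>          BundleSnapshot(String name, ResourceBundle bundle) {
>>>>>              this.resourceBundleName = name;
>>>>>              this.userBundle = bundle;
>>>>>          }
>>>>>      }
>>>>>
>>>>>      synchronized BundleSnapshot getBundleSnapshot() {
>>>>>          return new BundleSnapshot(resourceBundleName, userBundle);
>>>>>      }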
>>>>>
>>>>> Tests:
>>>>>
>>>>> The new TestLoggerBundleSync.java has been great for testing
>>>>> the issues above in both Logger & LogManager (it detected
>>>>> the drainLoggerRefQueueBounded issue).
>>>>>
>>>>> Since I had to touch drainLoggerRefQueueBounded to make
>>>>> TestLoggerBundleSync.java pass, I also threw in a
>>>>> TestLogConfigurationDeadLock.java to make sure I hadn't introduced
>>>>> any new deadlock (and it detected the lack of synchronization
>>>>> in LogManager.props).
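>>>>>
>>>>> (Tests of this kind are essentially stress loops - several threads
>>>>> creating, using and dropping loggers concurrently while the GC keeps
>>>>> the reference queue busy. A very rough sketch of the shape, not the
>>>>> actual test:
>>>>>
>>>>>      ExecutorService pool = Executors.newFixedThreadPool(4);
>>>>>      for (int t = 0; t < 4; t++) {
>>>>>          pool.submit(() -> {
>>>>>              for (int i = 0; i < 100_000; i++) {
>>>>>                  Logger l = Logger.getLogger("stress.X.Y");
>>>>>                  l.fine("ping");   // use the logger briefly...
>>>>>                  l = null;         // ...then let it become unreachable
>>>>>                  if (i % 1000 == 0) System.gc();
>>>>>              }
>>>>>          });
>>>>>      }
>>>>>      pool.shutdown();
>>>>> )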
>>>>>
>>>>> best regards,
>>>>>
>>>>> -- daniel
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>



