Collection.synchronizedXxx() : deadlocks, javadocs
Osvaldo Doederlein
opinali at gmail.com
Thu Dec 17 14:08:04 UTC 2009
Hi,
I just suffered the following deadlock, stress-testing my app in a JavaEE
application server:
Thread 1:
at java.util.Collections$SynchronizedSet.hashCode(Collections.java:1658)
- waiting to lock <0xfffffd7fdd8ce548> (a
java.util.Collections$SynchronizedSet)
Thread 2:
at
java.util.Collections$SynchronizedCollection.size(Collections.java:1557)
- waiting to lock <0xfffffd7fde2bb1a0> (a
java.util.Collections$SynchronizedSet)
at java.util.AbstractSet.equals(AbstractSet.java:75)
at java.util.Collections$SynchronizedSet.equals(Collections.java:1655)
- locked <0xfffffd7fdd8ce548> (a java.util.Collections$SynchronizedSet)
Thread 3:
at
java.util.Collections$SynchronizedCollection.size(Collections.java:1557)
- waiting to lock <0xfffffd7fdd8ce548> (a
java.util.Collections$SynchronizedSet)
at java.util.AbstractSet.equals(AbstractSet.java:75)
at java.util.Collections$SynchronizedSet.equals(Collections.java:1655)
- locked <0xfffffd7fde2bb1a0> (a java.util.Collections$SynchronizedSet)
This appserver has some internal Sets containing credentials information,
protected by Collections.synchronizedSet(). It seems the credentials sets
are often compared to each other, by just calling set1.equals(set2). This
causes a deadlock because equals() will invoke methods on set2, that are
also synchronized, but the appserver does this concurrently in many threads
for each application transaction. (My JVM dump reveals no less than 20 app
threads involved in this dealock: 18 of these waiting on 0xfffffd7fdd8ce548
to call hashCode(), 1 waiting on the same lock to call size(), and 1 waiting
on 0xfffffd7fde2bb1a0 also to call size().)
Now this is obviously a bug in the appserver, it should be using a lock on
the entire group of such Sets, at least for operations that involve more
than one Set like equals(). The javadocs already make clear that the
synchronized wrappers are not safe in some cases: "It is imperative that the
user manually synchronize on the returned sorted set when iterating over it
or any of its subSet, headSet, or tailSet views." But this documentation
doesn't cover other methods that may take another synchronized collection as
a parameters and may iterate it ("implementation detail" although easy to
guess), like containsAll(), retainAll(), etc. The size() method is another
special case, equals() could deadlock even when comparing two empty Sets,
because it needs minimally to invoke size() in the other collection and this
is sufficient to deadlock.
I wonder if we should have even more warnings in the javadocs, covering all
'dangerous' methods. This is not a stupid n00b bug, it's code from one of
the leading commercial application servers and I suppose its developers are
very skilled, still this bug is there (I'm going to report it today).
Generally, the synchronized wrappers are a pretty bad options, remarkably
since J2SE 5.0. I'd even go as far as suggesting that we just deprecate all
the Collections.synchronizedXxx() methods; besides deadlocks, it's way too
easy to have races by sequentially invoking multiple methods in the same
collections. The synchronizedXxx() APIs send the wrong message that locking
individual methods of the collection objects is adequate, which in my
experience is a bad (or at least fragile) assumption in 99% of the cases -
collections are just too fine-grained. There are other pitfalls, like
hashCode/equals() not delegating to the backing collection only for
synchronizedCollection().
In a separate issue, the collections returned by synchronizedXxx() are
synchronized using the original collection's monitor, so that this is safe:
Set sync1 = Collections.synchronizedSet(set1);
Set sync2 = Collections.synchronizedSet(set2);
synchronized (set1) {
synchronized (set2) {
if (sync1.equals(sync2) {...}
}
}
In the equals() call above, no deadlock is possible, even with other threads
that invoke methods in sync1 or sync2, as long as any threads that use both
will perform the same explicit locking in the same order set1->set2.
Encapsulating a mutex is a common practice, but in this case I wonder if
it's a good idea, at least as implemented (there is no public method to
provide the mutex, allowing the user to trade some scalability for code
that's simple and guaranteed deadlock-free). The user may decide to write
synchronized(sync1)... to prevent deadlocks like the one I reported, and
this is dangerous precisely because it would work most of the time - only
really convoluted code would still deadlock. The javadocs don't give any
clue of how the returned collections are synchronized, which is also bad
because I cannot rely that the code above will always work - I just know it
works for the particular Sun JDK release which sources I inspected. Some
code out there should certainly use this implementation knowledge to safely
synchronize groups of synchronizedXxx()'s collections, so this
implementation cannot be changed for good or bad. Once again, I suggest
additional documentation, to make the synchronization strategy explicit and
contractual.
P.S.: I also noticed some optimization opportunities along the collections
hierarchy. For example, AbstractSet.equals(other) could be redefined in
sorted sets, when other is also sorted, to not require lookups of each value
- just iterate both in the same direction comparing each pair of values, so
the cost is O(N) on size (the generic implementation is O(N log(N)) for
SortedSet.equals(SortedSet)). Even more optimization is often possible for
exact classes: for TreeSet.equals(TreeSet), we can visit both internal trees
recursively, without the allocation and indirection costs of a Iterator,
without initially finding the first entry, and without the complication of
successor(). It's a matter of finding the sweet spot between extra
performance and extra code size and maintenance - I suggest that at least,
those cases with demonstrably Big-O advantage are considered. The current
implementation still carries many size/speed tradeoff decisions made back in
J2SE 1.2, and I'm all against bloat but collections are something really
core, and improvements (especially of Big-O scale) can mean a lot. Notice
that the special case of mixed operations with several collections of the
same general kind, or even exact class - for equals(), containsAll() etc. -
is a very common case, one that's very worth of optimizing for.
A+
Osvaldo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20091217/0862f1d2/attachment.html>
More information about the core-libs-dev
mailing list