advice & review requested for 6896297, race condition in rmid causing JCK failure
Stuart Marks
stuart.marks at oracle.com
Thu Apr 7 22:19:19 UTC 2011
Hi Core-Libs developers,
I'd like to solicit some advice and discussion about this bug and a potential
fix I'm cooking for it. Here is the bug report; it contains details about the
problem and my analysis of it:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6896297
and here's a webrev of the fix I'm working on:
http://cr.openjdk.java.net/~smarks/reviews/6896297/webrev.0/
Briefly, the problem is incorrect synchronization of groupTable, a HashMap
field of an Activation object. The code mostly locks groupTable around any
access to it. However, no such locking is done during serialization. If the
groupTable is modified while it's being serialized,
ConcurrentModificationException occurs.
The obvious fix would be to use ConcurrentHashMap instead of Hashmap and to
remove the external locking entirely. Unfortunately this will change the
serialized representation of the Activation object, which I'm not sure is
acceptable.
Assuming that we can't change the serialized represenation, the alternative
approach would be to make sure that locking is done properly during
serialization. This is fairly easy to do by locking groupTable in a
writeObject() method. Unfortunately, this introduces a deadlock.
This deadlock occurs because, with this modification, there are now paths
through the code that take locks in the opposite order. Specifically, the
addLogRecord() method locks the log object and then (via serialization and the
newly added writeObject() method) locks groupTable. However, the
unregisterGroup() method locks groupTable and calls
GroupEntry.unregisterGroup() which logs the event, which takes a lock on the log.
After some analysis, I've determined that the call to
GroupEntry.unregisterGroup() can be moved outside the synchronization of
groupTable. This removes the ordering problem.
With these fixes in place (the state of the webrev above) I can get several
hundred successful test runs with neither ConcurrentModificationException nor
any deadlocks.
Of course, that doesn't mean the change is correct. :-)
Questions:
1. Is there a requirement that the serialized form of Activation remain
unchanged? If we can change it, we might as well just use ConcurrentHashMap
instead of HashMap.
2. Is my lock ordering analysis correct? I've pored over the code, but not
really being familiar with any RMI concepts, I'm not sure I have it right. I've
written this analysis into a big comment I've added to the code.
3. There is also a potential concurrency issue with idTable, which is used
similarly to groupTable. I haven't seen a test failure from it though. It seems
sensible to add a lock for it in Activation.writeObject() though. I don't
particularly like nesting the locks of idTable and groupTable, but locking them
separately would require serializing the Activation object field by field
instead of simply using defaultWriteObject(). Is this a reasonable approach?
Thanks for any advice or comments.
s'marks
More information about the core-libs-dev
mailing list