Fwd: [concurrency-interest] ConcurrentHashMap footprint and contention improvements
Members of this list are also invited to check this out. I'm not sure if it will make Java7 schedule, but the improvements seem worthwhile given the increased use of ConcurrentHashMap inside the JDK (including multiple tables per class loader in JDK7.) -Doug -------- Original Message -------- Subject: [concurrency-interest] ConcurrentHashMap footprint and contention improvements Date: Tue, 12 Apr 2011 20:07:17 -0400 From: Doug Lea <dl@cs.oswego.edu> To: Concurrency-interest@cs.oswego.edu <Concurrency-interest@cs.oswego.edu> For years, we've known that ConcurrentHashMaps have initial footprints (over 1000 bytes using default constructor) that are too big for casual use. And that the best way to address this would be to use the Fences API to emulate "final field" memory model guarantees in the presence of lazy initialization. But we aren't releasing the Fences API. So I committed a version that instead uses Unsafe calls to essentially the same effect (reducing initial footprint to around 100 bytes, plus a few percent savings for large populated tables). Also, this version includes throughput improvements under contention (mainly by interleaving locking with probes, to absorb cache misses), which can double performance on big tables with many threads. While conceptually straightforward, these lead to many line-of-code changes. The main price paid for these improvements is a greater reliance of "volatile" vs "final" reads, which are essentially equivalent in cost on most machines, but can be more costly on ARM and POWER. Even on these though, the net effect should be positive. It would be helpful if members of this list could help check that this is so. The committed version is now in java.util.concurrent sources (at http://gee.cs.oswego.edu/dl/concurrency-interest/index.html) and can be run by getting jsr166.jar and using "java -Xbootclasspath/p:jsr166.jar" with any java7 build or binary (http://dlc.sun.com.edgesuite.net/jdk7/binaries/index.html). Also, as an alternative, I temporarily placed an unpackaged source version (with the class renamed "CHM") at http://gee.cs.oswego.edu/dl/wwwtmp/CHM.java You can compile and somehow run in any java6/7 JVM. While working on these changes, I also contemplated other more extensive redesigns, including Cliff Click's non-blocking version (http://sourceforge.net/projects/high-scale-lib/) which usually has better scalability with large numbers of threads solely using get and put, but not otherwise uniformly a better choice. -Doug _______________________________________________ Concurrency-interest mailing list Concurrency-interest@cs.oswego.edu http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Doug, This change does appear worthwhile. We have a narrowing window of opportunity for JDK7. I spoke to Alan Bateman about this and we feel that if we could finalize the JDK7 changes by next Monday (that is, get them in for JDK7 b140), this would give us a reasonable time for them to soak before it gets harder to get changes into 7. Do you think this is a reasonable time frame? Do you expect any more implementation changes than what you have today? -Chris On 04/13/11 01:13 AM, Doug Lea wrote:
Members of this list are also invited to check this out. I'm not sure if it will make Java7 schedule, but the improvements seem worthwhile given the increased use of ConcurrentHashMap inside the JDK (including multiple tables per class loader in JDK7.)
-Doug
-------- Original Message -------- Subject: [concurrency-interest] ConcurrentHashMap footprint and contention improvements Date: Tue, 12 Apr 2011 20:07:17 -0400 From: Doug Lea <dl@cs.oswego.edu> To: Concurrency-interest@cs.oswego.edu <Concurrency-interest@cs.oswego.edu>
For years, we've known that ConcurrentHashMaps have initial footprints (over 1000 bytes using default constructor) that are too big for casual use. And that the best way to address this would be to use the Fences API to emulate "final field" memory model guarantees in the presence of lazy initialization. But we aren't releasing the Fences API. So I committed a version that instead uses Unsafe calls to essentially the same effect (reducing initial footprint to around 100 bytes, plus a few percent savings for large populated tables). Also, this version includes throughput improvements under contention (mainly by interleaving locking with probes, to absorb cache misses), which can double performance on big tables with many threads. While conceptually straightforward, these lead to many line-of-code changes.
The main price paid for these improvements is a greater reliance of "volatile" vs "final" reads, which are essentially equivalent in cost on most machines, but can be more costly on ARM and POWER. Even on these though, the net effect should be positive.
It would be helpful if members of this list could help check that this is so. The committed version is now in java.util.concurrent sources (at http://gee.cs.oswego.edu/dl/concurrency-interest/index.html) and can be run by getting jsr166.jar and using "java -Xbootclasspath/p:jsr166.jar" with any java7 build or binary (http://dlc.sun.com.edgesuite.net/jdk7/binaries/index.html). Also, as an alternative, I temporarily placed an unpackaged source version (with the class renamed "CHM") at http://gee.cs.oswego.edu/dl/wwwtmp/CHM.java You can compile and somehow run in any java6/7 JVM.
While working on these changes, I also contemplated other more extensive redesigns, including Cliff Click's non-blocking version (http://sourceforge.net/projects/high-scale-lib/) which usually has better scalability with large numbers of threads solely using get and put, but not otherwise uniformly a better choice.
-Doug
_______________________________________________ Concurrency-interest mailing list Concurrency-interest@cs.oswego.edu http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Chris Hegarty wrote:
Doug,
This change does appear worthwhile. We have a narrowing window of opportunity for JDK7. I spoke to Alan Bateman about this and we feel that if we could finalize the JDK7 changes by next Monday (that is, get them in for JDK7 b140), this would give us a reasonable time for them to soak before it gets harder to get changes into 7.
Do you think this is a reasonable time frame? Do you expect any more implementation changes than what you have today?
-Chris I don't think there is any harm trying but we should run as many tests and applications as possible. I saw Doug's mail to concurrency-interest and hopefully folks will report back their experiences with real applications in a timely manner.
-Alan.
On 04/13/11 11:58, Chris Hegarty wrote:
This change does appear worthwhile. We have a narrowing window of opportunity for JDK7. I spoke to Alan Bateman about this and we feel that if we could finalize the JDK7 changes by next Monday (that is, get them in for JDK7 b140), this would give us a reasonable time for them to soak before it gets harder to get changes into 7.
Do you think this is a reasonable time frame?
I got a few small suggestions for improvements etc (now incorporated) and no reports of performance anomalies, so I think we should commit. -Doug
On 04/18/11 11:18 AM, Doug Lea wrote:
On 04/13/11 11:58, Chris Hegarty wrote:
This change does appear worthwhile. We have a narrowing window of opportunity for JDK7. I spoke to Alan Bateman about this and we feel that if we could finalize the JDK7 changes by next Monday (that is, get them in for JDK7 b140), this would give us a reasonable time for them to soak before it gets harder to get changes into 7.
Do you think this is a reasonable time frame?
I got a few small suggestions for improvements etc (now incorporated) and no reports of performance anomalies, so I think we should commit.
Thanks Doug, I seen some of the comments on c-i. I just pulled the latest CHM from your CVS and have started a JPRT job. All going well, I hope to integrate these changes later today. -Chris.
-Doug
Here's another set of very minor javadoc updates: The lists of non-atomic "bulk" operations were incomplete in some concurrent collections and missing in others, as was noted in a recent ASPLOS paper "Specifying and Checking Semantic Atomicity for Multithreaded Programs" by Jacob Burnim, George Necula, Koushik Sen. (See http://www.cs.berkeley.edu/~jburnim/pubs/BurnimNeculaSen-ASPLOS11.pdf). A sample update is below. I updated jsr166 versions. I'll work with the usual folks to create a CR and integrate. (Thanks very much to Chris Hegarty in particular for being so helpful with these integrations!) diff -r1.73 ConcurrentLinkedQueue.java 46c46,53 < * of elements requires a traversal of the elements. ---
* of elements requires a traversal of the elements, and so may report * inaccurate results if this collection is modified during traversal. * Additionally, the bulk operations <tt>addAll</tt>, * <tt>removeAll</tt>, <tt>retainAll</tt>, <tt>containsAll</tt>, * <tt>equals</tt>, and <tt>toArray</tt> are <em>not</em> guaranteed * to be performed atomically. For example, an iterator operating * concurrently with an <tt>addAll</tt> operation might view only some * of the added elements.
Doug, I'll pull in these changes and try to get them into b142 ( before ramp down phase 2 ). Thanks, -Chris. On 04/22/11 12:53 PM, Doug Lea wrote:
Here's another set of very minor javadoc updates: The lists of non-atomic "bulk" operations were incomplete in some concurrent collections and missing in others, as was noted in a recent ASPLOS paper "Specifying and Checking Semantic Atomicity for Multithreaded Programs" by Jacob Burnim, George Necula, Koushik Sen. (See http://www.cs.berkeley.edu/~jburnim/pubs/BurnimNeculaSen-ASPLOS11.pdf). A sample update is below. I updated jsr166 versions. I'll work with the usual folks to create a CR and integrate. (Thanks very much to Chris Hegarty in particular for being so helpful with these integrations!)
diff -r1.73 ConcurrentLinkedQueue.java 46c46,53 < * of elements requires a traversal of the elements. ---
* of elements requires a traversal of the elements, and so may report * inaccurate results if this collection is modified during traversal. * Additionally, the bulk operations <tt>addAll</tt>, * <tt>removeAll</tt>, <tt>retainAll</tt>, <tt>containsAll</tt>, * <tt>equals</tt>, and <tt>toArray</tt> are <em>not</em> guaranteed * to be performed atomically. For example, an iterator operating * concurrently with an <tt>addAll</tt> operation might view only some * of the added elements.
participants (3)
-
Alan Bateman
-
Chris Hegarty
-
Doug Lea