Possible HashMap update

Paul Sandoz paul.sandoz at oracle.com
Fri Jul 5 08:55:52 UTC 2013


I played with these in the lambda repo. 

I needed to make the following additional change for things to compile:

--- a/src/share/classes/java/io/ExpiringCache.java	Fri Jul 05 10:04:00 2013 +0200
+++ b/src/share/classes/java/io/ExpiringCache.java	Fri Jul 05 10:45:10 2013 +0200
@@ -64,8 +64,8 @@
     @SuppressWarnings("serial")
     ExpiringCache(long millisUntilExpiration) {
         this.millisUntilExpiration = millisUntilExpiration;
-        map = new LinkedHashMap<String,Entry>() {
-            protected boolean removeEldestEntry(Map.Entry<String,Entry> eldest) {
+        map = new LinkedHashMap<String,ExpiringCache.Entry>() {
+            protected boolean removeEldestEntry(Map.Entry<String,ExpiringCache.Entry> eldest) {
               return size() > MAX_ENTRIES;
             }
           };

I don't quite know why, need to look in more detail.


Passes the stream tests and selected j.u tests.

The test java/util/Map/CheckRandomHashSeed needs updating, but i presume this test should be removed if similar updates are planed for WeakHashMap and Hashtable:

diff -r 291a872e1763 test/java/util/Map/CheckRandomHashSeed.java
--- a/test/java/util/Map/CheckRandomHashSeed.java	Fri Jul 05 10:04:00 2013 +0200
+++ b/test/java/util/Map/CheckRandomHashSeed.java	Fri Jul 05 10:53:50 2013 +0200
@@ -53,8 +53,6 @@
             throw new Error("Error in test setup: " + (expectRandom ? "" : "not " ) + "expecting random hashSeed, but " + PROP_NAME + " is " + (propSet ? "" : "not ") + "enabled");
         }
 
-        testMap(new HashMap());
-        testMap(new LinkedHashMap());
         testMap(new WeakHashMap());
         testMap(new Hashtable());
     }


Paul.

On Jul 4, 2013, at 12:55 AM, Doug Lea <dl at cs.oswego.edu> wrote:

> 
> A few weeks ago, in the course of doing a (hopefully-last) reworking of
> JDK8 ConcurrentHashMap (CHM) internals based on experience with
> preview versions, I noticed that the balanced tree scheme now in place
> as a fallback in the case of usages with many keys all with the same
> hashCode could also be profitably used in other cases: CHM (like
> HashMap) previously performed some bit-scrambling of user hashCodes to
> make it less likely that user hashCodes that were non-identical but
> highly-non-random will collide in the same bins (thus with O(n) vs the
> expected O(1) performance).  This bit-scrambling is an
> anti-optimization in many usages that do not contain the kinds of
> systematic hashCode bias that most hurt performance.  While it is
> reasonably fast, it still places computation where you do not
> want it: in front of the likely cache-miss to access an entry. Plus,
> bit-scrambling is only statistically likely to help.  It is still
> possible to encounter hashCodes that systematically collide. It would
> be nice to provide O(log N) guarantees for these cases as well.
> 
> So, CHM now omits bit-scrambling (well, almost -- just a single xor to
> avoid losing impact of top bits), and instead rolls over into using
> trees for bins that contain a statistically unlikely number of keys,
> and rolling back into simple bin form if/when they don't due to
> deletions or resizings. ("Statistically unlikely" is a surprisingly
> low number. More than 8 nodes are expected only about once per ten
> million under perfectly random keys; this is also often (depending on
> cost of equals() methods, additional dispatch overhead, etc), around
> the break-even point for using trees vs lists).  The updated tree
> mechanics now provide O(log N) worst-case performance in either of two
> cases: (colliding non-identical-hashCodes), and (identical hashCodes
> but mutually Comparable keys). And does so while also speeding up by a
> little those typical usages that do not encounter systematic
> collisions.  Also, there is no sense at all in using any form of
> hashSeed in this scheme. It basically behaves the same whether or not
> you use one.
> 
> The overall impact appears to be a net win.  Non-heavily-colliding
> cases are a bit faster. Some colliding cases are a little slower and
> use more memory (tree-based nodes are bigger than plain ones), but
> have much better worst (and typical) case bounds unless people use
> crazily-bad keys that all have same hashCodes but are never equal or
> comparable to others, in which case treeification is wasted effort
> (but only by a constant factor of about 2 in both time and space).
> 
> This seemed to be enough of an improvement in terms of both worst-case
> and expected performance to try applying it to TreeBin-enabled
> HashMap. So I gave it a try. To enable bidirectional tree<->plain node
> conversion, I used the subclassing strategy mentioned during
> discussion of the initial balanced tree version. This in turn requires
> a completely different within-package internal API to allow
> LinkedHashMap to continue to subclass HashMap.  A couple of internal
> constructions exist solely to allow the same source code to be
> identical in HashMap and ConcurrenHashMap (but not the same compiled
> code, since the types they operate on are and must be unrelated in the
> two classes, and because they live in different packages, there's no
> nice way to express their commonalities without exposing.)
> 
> This note serves as a pre-proposal request for comment, as well a
> request for anyone interested in trying it out, especially in any
> weird/unusual use cases, to report experiences before seriously
> considering it as a replacement. To simplify logistics, I checked in the
> code in a workspace directory in jsr166:
> 
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/dl/java/util/HashMap.java?view=log
> 
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/dl/java/util/LinkedHashMap.java?view=log




More information about the core-libs-dev mailing list