Possible HashMap update
Paul Sandoz
paul.sandoz at oracle.com
Fri Jul 5 08:55:52 UTC 2013
I played with these in the lambda repo.
I needed to make the following additional change for things to compile:
--- a/src/share/classes/java/io/ExpiringCache.java	Fri Jul 05 10:04:00 2013 +0200
+++ b/src/share/classes/java/io/ExpiringCache.java	Fri Jul 05 10:45:10 2013 +0200
@@ -64,8 +64,8 @@
     @SuppressWarnings("serial")
     ExpiringCache(long millisUntilExpiration) {
         this.millisUntilExpiration = millisUntilExpiration;
-        map = new LinkedHashMap<String,Entry>() {
-            protected boolean removeEldestEntry(Map.Entry<String,Entry> eldest) {
+        map = new LinkedHashMap<String,ExpiringCache.Entry>() {
+            protected boolean removeEldestEntry(Map.Entry<String,ExpiringCache.Entry> eldest) {
                 return size() > MAX_ENTRIES;
             }
         };
I don't quite know why; I need to look at it in more detail.
Passes the stream tests and selected j.u tests.
The test java/util/Map/CheckRandomHashSeed needs updating, but I presume this test should be removed if similar updates are planned for WeakHashMap and Hashtable:
diff -r 291a872e1763 test/java/util/Map/CheckRandomHashSeed.java
--- a/test/java/util/Map/CheckRandomHashSeed.java	Fri Jul 05 10:04:00 2013 +0200
+++ b/test/java/util/Map/CheckRandomHashSeed.java	Fri Jul 05 10:53:50 2013 +0200
@@ -53,8 +53,6 @@
             throw new Error("Error in test setup: " + (expectRandom ? "" : "not " ) + "expecting random hashSeed, but " + PROP_NAME + " is " + (propSet ? "" : "not ") + "enabled");
         }
-        testMap(new HashMap());
-        testMap(new LinkedHashMap());
         testMap(new WeakHashMap());
         testMap(new Hashtable());
     }
Paul.
On Jul 4, 2013, at 12:55 AM, Doug Lea <dl at cs.oswego.edu> wrote:
>
> A few weeks ago, in the course of doing a (hopefully-last) reworking of
> JDK8 ConcurrentHashMap (CHM) internals based on experience with
> preview versions, I noticed that the balanced tree scheme now in place
> as a fallback in the case of usages with many keys all with the same
> hashCode could also be profitably used in other cases: CHM (like
> HashMap) previously performed some bit-scrambling of user hashCodes to
> make it less likely that user hashCodes that are non-identical but
> highly non-random would collide in the same bins (thus O(n) rather than
> the expected O(1) performance). This bit-scrambling is an
> anti-optimization in many usages that do not contain the kinds of
> systematic hashCode bias that most hurt performance. While it is
> reasonably fast, it still places computation where you do not
> want it: in front of the likely cache-miss to access an entry. Plus,
> bit-scrambling is only statistically likely to help. It is still
> possible to encounter hashCodes that systematically collide. It would
> be nice to provide O(log N) guarantees for these cases as well.
>
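As a concrete illustration of the systematic bias in question (the demo
class below is hypothetical, not part of the patch): bin selection masks
the hash with table.length - 1, so hashCodes that differ only in their
upper bits all land in one bin unless the upper bits are mixed downward
first.

    // Hypothetical demo: hashCodes differing only above the index mask
    // all map to the same bin when no bit-scrambling is applied.
    public class SameBinDemo {
        public static void main(String[] args) {
            int n = 16; // a small power-of-two table size
            int[] hashes = {0x10000, 0x20000, 0x30000, 0x40000};
            for (int h : hashes) {
                // With no scrambling all four print bin 0, degrading
                // lookups in that bin to O(n).
                System.out.printf("hashCode 0x%05x -> bin %d%n", h, h & (n - 1));
            }
        }
    }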
> So, CHM now omits bit-scrambling (well, almost -- just a single xor to
> avoid losing impact of top bits), and instead rolls over into using
> trees for bins that contain a statistically unlikely number of keys,
> and rolls back into simple bin form if/when they don't due to
> deletions or resizings. ("Statistically unlikely" is a surprisingly
> low number. More than 8 nodes are expected only about once per ten
> million under perfectly random keys; this is also often, depending on
> the cost of equals() methods, additional dispatch overhead, etc., around
> the break-even point for using trees vs lists.) The updated tree
> mechanics now provide O(log N) worst-case performance in either of two
> cases: colliding non-identical hashCodes, and identical hashCodes
> with mutually Comparable keys. They do so while also slightly speeding
> up those typical usages that do not encounter systematic
> collisions. Also, there is no sense at all in using any form of
> hashSeed in this scheme. It basically behaves the same whether or not
> you use one.
>
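Both the single-xor spread and the "statistically unlikely" threshold are
easy to make concrete. A minimal sketch, assuming (my assumption, not
stated above) that the spread folds the top 16 bits downward and that the
threshold of 8 pairs with the default 0.75 load factor:

    public class TreeifyOdds {
        // Sketch only (my paraphrase, not the patch): fold the high 16 bits
        // into the low 16 so that power-of-two index masking still feels
        // the influence of the top bits.
        static int spread(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            // Back-of-the-envelope check on "about once per ten million":
            // with random keys and a 0.75 load factor, bin sizes are roughly
            // Poisson distributed with mean ~0.5, i.e.
            //   P(size = k) = e^(-0.5) * 0.5^k / k!
            double lambda = 0.5, factorial = 1, cumulative = 0;
            for (int k = 0; k <= 7; k++) {
                if (k > 0) factorial *= k;
                cumulative += Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            }
            // Prints ~6.4e-08: only about one bin in sixteen million is
            // expected to reach 8 nodes under perfectly random keys.
            System.out.printf("P(bin size >= 8) ~= %.1e%n", 1 - cumulative);
        }
    }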
> The overall impact appears to be a net win. Non-heavily-colliding
> cases are a bit faster. Some colliding cases are a little slower and
> use more memory (tree-based nodes are bigger than plain ones), but
> have much better worst (and typical) case bounds unless people use
> crazily-bad keys that all have the same hashCode but are never equal or
> comparable to others, in which case treeification is wasted effort
> (but only by a constant factor of about 2 in both time and space).
>
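For anyone who wants to stress-test that worst case, such a key might look
like this (entirely hypothetical, purely to exercise the fallback):

    // Pathological by design: constant hashCode, never equal to any other
    // instance, and not Comparable, so neither bin indexing nor tree
    // ordering can tell the keys apart; treeification then buys nothing.
    public final class AwfulKey {
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) { return this == o; }
    }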
> This seemed to be enough of an improvement in terms of both worst-case
> and expected performance to try applying it to TreeBin-enabled
> HashMap. So I gave it a try. To enable bidirectional tree<->plain node
> conversion, I used the subclassing strategy mentioned during
> discussion of the initial balanced tree version. This in turn requires
> a completely different within-package internal API to allow
> LinkedHashMap to continue to subclass HashMap. A couple of internal
> constructions exist solely to allow the same source code to be
> identical in HashMap and ConcurrentHashMap (but not the same compiled
> code, since the types they operate on are and must be unrelated in the
> two classes, and because they live in different packages, there's no
> nice way to express their commonalities without exposing them.)
>
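For those who prefer a picture before reading the sources, the subclassing
strategy can be arranged roughly like this. This is a simplified sketch of
the shape only; the class names, field sets, and nesting here are my
guesses, not the actual code:

    // Plain bin node (HashMap).
    class Node<K,V> {
        final int hash; final K key; V value; Node<K,V> next;
        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    // LinkedHashMap's entry extends the plain node with order links.
    class LinkedEntry<K,V> extends Node<K,V> {
        LinkedEntry<K,V> before, after;
        LinkedEntry(int hash, K key, V value, Node<K,V> next) {
            super(hash, key, value, next);
        }
    }

    // The tree node extends the linked entry, so a treeified bin remains
    // usable by both maps; converting a bin in either direction just
    // swaps node classes.
    class TreeNode<K,V> extends LinkedEntry<K,V> {
        TreeNode<K,V> parent, left, right;
        boolean red;
        TreeNode(int hash, K key, V value, Node<K,V> next) {
            super(hash, key, value, next);
        }
    }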
> This note serves as a pre-proposal request for comment, as well as a
> request for anyone interested in trying it out, especially in any
> weird/unusual use cases, to report experiences before seriously
> considering it as a replacement. To simplify logistics, I checked in the
> code in a workspace directory in jsr166:
>
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/dl/java/util/HashMap.java?view=log
>
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/dl/java/util/LinkedHashMap.java?view=log