Possible HashMap update

Wed Aug 7 23:40:08 UTC 2013

Hi, Doug

I like this approach.

The subclassing used for nodes/entries and trees looks good, given the 
constraints.  The before/after references that HashMap.TreeNode inherits 
from LinkedHashMap.Entry are superfluous when trees are used by a 
HashMap, but IMO we can live with that as trees in general are the 
uncommon case.  And on the other hand, it's great that we can skip the 
instanceof check for put()/get() when the key is the first/only entry in 
a bin.

"Untreeify" and MIN_TREEIFY_CAPACITY are nice additions.

I patched jdk8/tl, and it tests out OK with the jdk_util jtreg tests.

I'm in favor of moving forward with this.

Thanks,
-Brent

On 7/3/13 3:55 PM, Doug Lea wrote:
>
> A few weeks ago, in the course of doing a (hopefully-last) reworking of
> JDK8 ConcurrentHashMap (CHM) internals based on experience with
> preview versions, I noticed that the balanced tree scheme now in place
> as a fallback in the case of usages with many keys all with the same
> hashCode could also be profitably used in other cases: CHM (like
> HashMap) previously performed some bit-scrambling of user hashCodes to
> make it less likely that user hashCodes that were non-identical but
> highly-non-random will collide in the same bins (thus with O(n) vs the
> expected O(1) performance).  This bit-scrambling is an
> anti-optimization in many usages that do not contain the kinds of
> systematic hashCode bias that most hurt performance.  While it is
> reasonably fast, it still places computation where you do not
> want it: in front of the likely cache-miss to access an entry. Plus,
> bit-scrambling is only statistically likely to help.  It is still
> possible to encounter hashCodes that systematically collide. It would
> be nice to provide O(log N) guarantees for these cases as well.
>
> So, CHM now omits bit-scrambling (well, almost -- just a single xor to
> avoid losing impact of top bits), and instead rolls over into using
> trees for bins that contain a statistically unlikely number of keys,
> and rolling back into simple bin form if/when they don't due to
> deletions or resizings. ("Statistically unlikely" is a surprisingly
> low number. More than 8 nodes are expected only about once per ten
> million under perfectly random keys; this is also often (depending on
> cost of equals() methods, additional dispatch overhead, etc), around
> the break-even point for using trees vs lists).  The updated tree
> mechanics now provide O(log N) worst-case performance in either of two
> cases: (colliding non-identical-hashCodes), and (identical hashCodes
> but mutually Comparable keys). And does so while also speeding up by a
> little those typical usages that do not encounter systematic
> collisions.  Also, there is no sense at all in using any form of
> hashSeed in this scheme. It basically behaves the same whether or not
> you use one.
>
> The overall impact appears to be a net win.  Non-heavily-colliding
> cases are a bit faster. Some colliding cases are a little slower and
> use more memory (tree-based nodes are bigger than plain ones), but
> have much better worst (and typical) case bounds unless people use
> crazily-bad keys that all have same hashCodes but are never equal or
> comparable to others, in which case treeification is wasted effort
> (but only by a constant factor of about 2 in both time and space).
>
> This seemed to be enough of an improvement in terms of both worst-case
> and expected performance to try applying it to TreeBin-enabled
> HashMap. So I gave it a try. To enable bidirectional tree<->plain node
> conversion, I used the subclassing strategy mentioned during
> discussion of the initial balanced tree version. This in turn requires
> a completely different within-package internal API to allow
> LinkedHashMap to continue to subclass HashMap.  A couple of internal
> constructions exist solely to allow the same source code to be
> identical in HashMap and ConcurrenHashMap (but not the same compiled
> code, since the types they operate on are and must be unrelated in the
> two classes, and because they live in different packages, there's no
> nice way to express their commonalities without exposing.)
>
> This note serves as a pre-proposal request for comment, as well a
> request for anyone interested in trying it out, especially in any
> weird/unusual use cases, to report experiences before seriously
> considering it as a replacement. To simplify logistics, I checked in the
> code in a workspace directory in jsr166:
>
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/dl/java/util/HashMap.java?view=log
>
>
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/dl/java/util/LinkedHashMap.java?view=log
>