Update on alternative hashing for String keys with hash-based Maps
Doug Lea
dl at cs.oswego.edu
Sat Jul 14 06:43:57 PDT 2012
On 07/13/12 19:09, Mike Duigou wrote:
> Doug Lea has been investigating further improvements
A quick overview: The preview version of ConcurrentHashMap
(currently as jsr166e.ConcurrentHashMapV8 -- see
http://gee.cs.oswego.edu/dl/concurrency-interest/index.html)
addresses string hash problems from a different angle. Rather than
replacing the hash code algorithm, it uses O(log n) techniques
to provide graceful degradation in cases where many keys have
colliding hash codes.
The basic techniques used apply only when the colliding
keys are Comparable, which includes all cases of interest
(String, numbers, etc; this is dynamically checked, so the
nominal key types don't matter). They reduce worst-case impact
of dumb/hostile usages where, say, a billion keys all have the
same hash code, from a billion steps per operation down to
about 50 steps without the need for alternate hash functions
(like murmur) that would be less susceptible to these problems
at the expense of noticeably higher costs in the normal case.
It looks possible but will take a bunch of effort to apply
these techniques to the other JDK hash-based classes that
are susceptible to these kinds of performance issues. (The main
downside in doing this is that each of them will require
a few hundred lines of slightly different specializations of
the required variant form of balanced trees.)
This approach may eliminate the motivation/need for "hash2"
slot introduced in class String, further reducing footprints
(i.e., murmur would still be supported but its results not cached).
And further, the ALT-HASHING controls may turn into no-ops for all
JDK tables.
-Doug
More information about the jdk7u-dev
mailing list