HashMap - Hybrid approach
Doug Lea
dl at cs.oswego.edu
Wed Jun 10 12:59:32 UTC 2009
Alex Yakovlev wrote:
> To have single-indirect access we need to read
> key/value objects with the first array read.
Backing up a little, there are five main forces
in hash table design. These are fresh in my mind because
I've been trying to put together CustomConcurrentHashMap,
the last planned JSR166y JDK7 component, that handles
weak/soft refs, custom equality comparisons, eviction, etc.
(Concurrency adds further forces omitted here.)
1. Memory stalls
- Cache misses due to randomization of indexing can easily
cost hundreds of cycles. This is made worse if there are
multiple levels of indirection each with poor locality.
(Multiple levels with good locality are much less of a
problem, which is why ConcurrentHashMap is pretty fast
despite adding one.) And this is made even worse when
there are too many computations before the processor can
even issue a prefetch, which is one reason why bit-spreading
functions require careful speed vs. uniformity tradeoff analysis.
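(As an aside, and only as an illustration of that tradeoff: the two
spreaders below are loosely modeled on functions various JDK HashMaps
have used; neither is part of the proposal here. The cheap one leaves
fewer cycles between hashCode and the index/prefetch; the heavier one
mixes bits more uniformly.)

    final class SpreaderSketch {
        // cheap: one shift/xor, in the spirit of later JDK spreaders
        static int cheapSpread(int h) {
            return h ^ (h >>> 16);
        }
        // heavier: more mixing, in the spirit of the JDK 6/7
        // HashMap.hash, but more work before the table index is known
        static int heavierSpread(int h) {
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }
    }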
2. Method dispatching
- Both Object.equals and Object.hashCode are often
"megamorphic", i.e., cannot be inlined because they
are overridden in too many classes. In some usages,
this accounts for the main performance bottleneck.
So caching hashcodes, which usually also allows skipping
calls to equals for different objects, has a noticeable effect.
Also, it is sometimes possible to bias the structure of code
to encourage compilers to do more per-call inlining,
which helps a lot. Note that these effects can be difficult
to test. Test programs that include multiple key types
(like our DenseMapMicrobenchmark) are a little more
accurate indicators of megamorphism effects, but
even here, these kinds of microbenchmarks will often
cause compilers to do deeper inlining than they would in
most actual usages.
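To make the hash-caching point concrete (a sketch only, along the
lines of the existing chained node scan; the names are invented):

    final class NodeScanSketch {
        static final class Node<K,V> {
            final int hash;        // cached (spread) hashCode of key
            final K key;
            V value;
            Node<K,V> next;
            Node(int hash, K key, V value, Node<K,V> next) {
                this.hash = hash; this.key = key;
                this.value = value; this.next = next;
            }
        }

        // assumes a non-null key, as in HashMap's internal scans
        static <K,V> V scan(Node<K,V> first, int hash, Object key) {
            for (Node<K,V> e = first; e != null; e = e.next) {
                // plain int compare first; equals only on likely matches
                if (e.hash == hash && (key == e.key || key.equals(e.key)))
                    return e.value;
            }
            return null;
        }
    }

Because e.hash == hash is a plain int compare, most non-matching
entries are rejected without ever reaching the possibly megamorphic
equals call.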
3. Collision handling
- Collisions lead to some sort of slower scanning. Especially
given the highly variable quality of user-supplied
hashCodes, you'd like to localize the impact of collisions
to avoid per-map slowdowns. Linear probing and related techniques
don't fare very well on these grounds.
4. Biasing for expected operation mixes
- I once checked that collection (and in particular HashMap/Hashtable)
usage approximately obeys the usual 4:1 read vs write ratio
seen in just about every aspect of computing. In fact it seems
a bit more extreme -- something like 84% read (get() or iterate),
15% add (put() and variants), and 1% delete (remove, clear). So
making reads fast is clearly the highest priority.
5. Footprint
- Many Java frameworks (especially EE) can create hundreds of
thousands of collections, many of them very small, but
also some huge ones. This leads to second-order bloat
effects, like high TLB miss rates, and much slower GC. For
some details see papers by Gary Sevitsky and colleagues at
http://domino.research.ibm.com/comm/research_people.nsf/pages/sevitsky.pubs.html
This is a difficult issue with hash tables since the whole
idea is to preallocate space to allow random indexing. And
because resizing is not cheap, you don't want to be overly
incremental about it.
>
> And we cannot store collisions in pre-allocated
> arrays - to read it from the same array with keys/values
> it needs to be an object. Thus writing speed on
> my original map will always be faster.
>
Right. My thought here is to use an alternative structure
for collisions. So, given capacity C, you'd have:
elements[] -- 2*C of key/value
hashes[] -- 1*C of hash + bookkeeping bits
collisions -- a dynamic bitwise trie or somesuch
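Just to show the shape of a get on top of that layout (everything
below is invented for illustration -- the field names, the spreader,
the collision path -- not a worked-out design):

    class HybridSketch<K,V> {
        Object[] elements;   // 2*C slots: key at 2*i, value at 2*i+1
        int[] hashes;        // C slots: cached hash (bookkeeping bits ignored here)
        // collisions: some dynamic structure, e.g. a bitwise trie; elided

        static int spread(int h) { return h ^ (h >>> 16); }

        @SuppressWarnings("unchecked")
        V get(Object key) {
            int h = spread(key.hashCode());
            int i = h & (hashes.length - 1);
            // the two loads below index independent arrays, so both
            // can be issued (and prefetched) without waiting on each other
            int hv = hashes[i];
            Object k = elements[2 * i];
            if (hv == h && (k == key || key.equals(k)))
                return (V) elements[2 * i + 1];
            return getFromCollisions(h, key);   // collision path, elided
        }

        V getFromCollisions(int h, Object key) { return null; }  // placeholder
    }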
Compared to current HashMap...
+ It allows parallel prefetches on get of both
elements[2*h] and hashes[h], so it should reduce memory stalls.
- It triples the unloaded footprint. You can get most of
this back in default-constructor usages by not allocating
arrays, but instead using only the collisions tree until size
reaches, say, 4, and then using a larger (say 64) initial capacity
once you know you aren't a tiny map (see the sketch after this list).
= Collisions may or may not be a little more expensive to resolve
- Iterators are likely slower and definitely a lot messier
- The trie nodes needed here would require 2 or 3 more
pointer fields than do list-based nodes, so the incremental
footprint grows faster.
- Resizing is slower and more complicated since the collisions
may or may not be moved to the main table.
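To make the small-map dodge above concrete (again only a sketch; the
helper names and stubs are made up, and the thresholds 4 and 64 are
just the numbers mentioned above):

    class LazyAllocSketch<K,V> {
        Object[] elements;   // null until the map is known not to be tiny
        int[] hashes;
        int size;

        void put(K key, V value) {
            int hc = key.hashCode();
            int h = hc ^ (hc >>> 16);
            if (elements == null) {
                if (size < 4) {                  // still tiny: collisions only
                    putIntoCollisions(h, key, value);
                    size++;
                    return;
                }
                allocateArrays(64);              // not tiny after all
                moveCollisionsIntoArrays();
            }
            putIntoArrays(h, key, value);
            size++;
        }

        // placeholders for the elided machinery
        void putIntoCollisions(int h, K key, V value) { }
        void allocateArrays(int capacity) {
            elements = new Object[2 * capacity];
            hashes = new int[capacity];
        }
        void moveCollisionsIntoArrays() { }
        void putIntoArrays(int h, K key, V value) { }
    }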
Without fleshing this out, I'm not sure whether this hits the right
tradeoffs.
-Doug