Faster HashMap implementation

Mon Jun 8 16:07:05 UTC 2009

A few notes on your implementation...

Alex Yakovlev wrote:
> 
> Main trick is in storing all data in 2 arrays without any HashEntry objects.
> One is array of Objects with keys and values, another in a complex array 
> of indices.
> 

Notice that footprint comparisons with current HashMap
are load-dependent. Let C be capacity and N be # items.
Assume 32bit pointers.
Yours requires about 4*C storage versus 1*C + 5N for HashMap.
  4*C = key + value + main part of index + overflow part of index,
       although a little less with loadFactor < 1 (see below)
  5*N = 4 fields per entry plus 1 word Object header.
There is a crossover point where yours uses
less memory, which will often be the case when grown using
the usual resizing thresholds. Plus GC should be faster in yours.
However, statistically, most HashMaps have very few elements.
I once instrumented some of the "reference workload" usages
and found that over half of the HashMaps/Hashtables had
a max 3 or fewer elements during their lifetimes.
So on average using your version will likely increase
application footprints. It seems possible to deal with
this though. For example, you might consider separately
allocating the back-half of index array only when needed.

The unloaded overhead is worse for your Linked version,
although that is probably less of a concern.

It is very difficult to empirically compare footprints
in the most interesting cases, of using a bunch of
HashMaps of various sizes, in part because GC may or may not
actually collect all collectable garbage.

> Now I just force load factor to 1 if argument is greater than 1.

The effects of loadFactors are pretty much incommensurate
across versions. Because you trade element-space for index-space,
varying loadFactor have little bearing on total space. (But still
a little, since an index costs half of a key+value). While
it remains approximately true in yours that loadFactors < 1
are the average number of traversals to find existing key
near the resize threshold point, probably some more work/thought
is needed to make a better story about interpreting them internally.

> 
> When looking for missing key my implementation can be twice faster in 
> some situations,

Right. In the tests I ran, yours is generally faster for
failed lookups. It is also sometimes faster for successful
lookups for some key types, distributions, and/or machines
I test on. (For some reason, more often better on AMDs
than Intels or Sparcs).

It strikes me that there might be a bigger win available
here by using some hybrid strategy based on
IdentityHashMap-like single-indirect (i.e., usually
at most a single cache miss) for collision-free accesses
but using a variant of your clever packed-indices
under collisions. I might try this out someday.

> 
> Iteration over elements is a sequental array read which is faster.
> To clone such HashMap we only need to clone 2 arrays with is also much 
> faster.
> 

I don't understand why you don't simply traverse the element
array for HashMap iterators? You cannot do this for your linked
version, but it would not hurt to make completely different
iterator types.

> Thus iteration
> order is more predictable and I think it's possible not to throw 
> ConcurrentModificationException
> in most situations. 

For non-concurrent collections, the goal of ConcurrentModificationExceptions
is to help people find/fix incorrect code; not necessarily to only
blow up if proceeding is impossible.

I hope this helps! Despite all the concerns above, this is the
first alternative version of HashMap that I can recall seeing
that seems promising enough to continue exploring as a
possible replacement.

-Doug