Faster HashMap implementation
Doug Lea
dl at cs.oswego.edu
Thu Jul 2 14:09:10 UTC 2009
> Doug,
>
> Checking key reference equality before hashcode looks like a big win.
This effect is somewhat stronger in tests that keep looking for ==
keys, but in general, successful gets are usually for identical
keys. Except for autoboxed Integers etc.
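For illustration, a minimal sketch of the comparison order in question;
the names (FindSketch, Node, find), the chained entries, and the
power-of-two table are made up for this example, not the actual code:

    // Sketch only: test reference equality first, then fall back to
    // comparing cached hashes and calling equals.
    final class FindSketch {
        static <K,V> V find(Node<K,V>[] tab, Object key, int hash) {
            for (Node<K,V> e = tab[hash & (tab.length - 1)]; e != null; e = e.next) {
                Object k = e.key;
                if (k == key || (e.hash == hash && key.equals(k))) // == first
                    return e.value;
            }
            return null;
        }
        static final class Node<K,V> {
            final int hash; final K key; V value; Node<K,V> next;
            Node(int hash, K key, V value, Node<K,V> next) {
                this.hash = hash; this.key = key;
                this.value = value; this.next = next;
            }
        }
        private FindSketch() {}
    }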
(Aside: one way to deal with the fact that we could handle
autoboxed cases better if we knew them up front would be to create
a CheckedHashMap class (in the spirit of Collections.checkedMap)
and use specialized forms for Class arguments corresponding
to boxed types. Plus String while we are at it. Unfortunately
this does nothing for the millions of existing uses
of plain HashMap.)
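Very roughly, such a factory might look like the following; the class
name, method name, and the particular set of special-cased types are
hypothetical, just to show the shape of the idea:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch, in the spirit of Collections.checkedMap:
    // choose a representation based on the declared key type.
    final class CheckedHashMaps {
        static <K,V> Map<K,V> newCheckedHashMap(Class<K> keyType) {
            Class<?> c = keyType;  // widen to simplify the comparisons
            if (c == Integer.class || c == Long.class || c == String.class) {
                // placeholder: a real version would return a form that
                // exploits the known hashCode/equals of these key types
                return new HashMap<K,V>();
            }
            return new HashMap<K,V>();   // plain fallback
        }
        private CheckedHashMaps() {}
    }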
>
> Have you considered using balancing tree instead of bit trie?
> This could decrease look depth from number of hash bits to log2(tree
> size).
Yes, but bitwise tries are generally faster on average in this usage,
and the worst case is about the same for large trees, which is
when they are used here.
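To make the comparison a bit more concrete, a rough sketch of a
bit-trie lookup that consumes one hash bit per level (the names and
the single-bit branching are made up here, and real code would also
have to handle equal hashes for distinct keys):

    // Sketch: depth is bounded by the number of hash bits consumed,
    // and each step is a mask/shift rather than a key comparison.
    final class TrieSketch {
        static <K,V> V get(Node<K,V> root, Object key, int hash) {
            int h = hash;
            for (Node<K,V> n = root; n != null; ) {
                K k = n.key;
                if (k == key || (n.hash == hash && key.equals(k)))
                    return n.value;
                n = ((h & 1) == 0) ? n.left : n.right; // branch on next bit
                h >>>= 1;
            }
            return null;
        }
        static final class Node<K,V> {
            final int hash; final K key; final V value;
            Node<K,V> left, right;
            Node(int hash, K key, V value) {
                this.hash = hash; this.key = key; this.value = value;
            }
        }
        private TrieSketch() {}
    }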
>
> Also, it could make sense to use the adjacent table cell for the key/value
> if it's not used. Maybe also without a hashcode check, for fewer cache
> misses.
> In practice this gives a ~4-5% performance improvement for get(). In
> theory, when 70% of gets are resolved on the first try and 23% on the
> second, and assuming the adjacent cell is empty in 50% of cases: in the
> best case (referential equality) we'd have 1 cache miss instead of 3,
> thus .23*.5*(2/3) = 7.67%; in the worst case (2 cache misses instead of
> 4, with the hashcode check) +5.75%.
There is a delicate balancing act here. Decent performance of
linear probing relies on good key randomization. Otherwise it
tends to fall apart badly (so, for example, the first-try hit rate
falls well below 70%). One alternative that looks more
promising is to alternate linear and random probes. More on that
when I get a chance to flesh this out further.
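For a rough flavor only, one possible shape for such a probe sequence
(the mixing constant and the exact step pattern below are placeholders,
not what will actually be used):

    final class ProbeSketch {
        // Even attempts land on the home slot or a pseudo-random jump
        // from it; odd attempts step to the adjacent slot
        // (cache-friendly), so a poorly randomized hash does not pile
        // keys into one long run of slots.
        static int probe(int hash, int attempt, int mask) {
            int base = hash;
            for (int i = 0; i < attempt / 2; ++i)
                base = base * 0x9E3779B9 + 1;     // pseudo-random re-probe
            return (base + (attempt & 1)) & mask; // linear step to neighbor
        }
        private ProbeSketch() {}
    }

So probe(h, 0, mask) is the home slot, probe(h, 1, mask) its neighbor,
probe(h, 2, mask) a jump, and so on.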
-Doug