Faster HashMap implementation

Thu Jun 4 12:32:10 UTC 2009

Alex Yakovlev wrote:
> Hello,
> 
> While playing with scala programming language I've come to a HashMap 
> implementation
> that appears to be faster than current java.util.HashMap, also with a 
> lower memory footprint.
> 

This is a promising approach. The indirection using the
index array makes for a better time/space tradeoff than
current HashMap for many usages.

There are however a couple of constraints on HashMap
that have made it difficult to replace it with
overhauled internals, despite a few explorations in
doing so.

The main one is that LinkedHashMap is declared as a
subclass of HashMap. There's not
an obvious way to add insertion- or access- ordered links
to your version. This still might not be a huge obstacle,
since with some care, and some wastage, LinkedHashMap
could ignore its superclass implementation and re-establish
current approach.

Also, the meaning/impact of load-factor parameters differs
in your version. It seems possible to reinterpret these internally
to provide approximately equivalent statistical properties.
(For example a loadFactor argument of 2.0 might translate into
internal one of around 0.9.)

Also, the time/space impact of per-node Entry allocation in
entrySet iterators will be noticeable to some applications.
This is likely not a big concern though.

Across such issues, it would take some further work to
make a strong case for actually replacing HashMap.

We once faced similar issues for IdentityHashMap, which doesn't
bear any class relationship to HashMap, doesn't expose
load factors, and is even more compact than yours since it
doesn't cache hashCodes. Not caching hashCodes,
and so using IdentityHashMap-style table would also be
a good idea for many common keys like Strings,
numerical keys, and key classes that don't override
hashCode, but users have come to expect that HashMap
will not recompute hashCodes after insertion. If we
had reified generics, we might be able to do something
about this but otherwise there seems to be no good
way to adapt internal algorithms to key types. Similarly
for values.

-Doug