Faster HashMap implementation
Alex Yakovlev
alex14n at gmail.com
Wed Jun 3 23:21:51 UTC 2009
Hello,
While playing with the Scala programming language I've come up with a HashMap
implementation that appears to be faster than the current java.util.HashMap,
and also has a lower memory footprint.
Alex Miller advised me to post about it to this mailing list.
I've put the current implementation (both Scala and Java) and some benchmark
results on github:
http://github.com/alex14n/CompactHashMap
The current Java version implements most of the external HashMap behaviour,
including serialization, but does not throw ConcurrentModificationException
and does not implement the internal methods that may be required by some
subclasses.
The main trick is storing all data in 2 arrays, without any HashEntry objects.
One is an array of Objects holding keys and values; the other is a complex
array of indices.
The index array has 2 parts. The first part contains hashcode-to-index
mappings, the second part is an index-to-index mapping for elements with the
same hashcode.
The remaining bits are used for:
1. A 'deleted' flag to mark empty slots after key removal
(the other bits of such cells are used as indices in a linked list of deleted
slots),
2. An 'end-of-list' flag that makes it unnecessary to access the next index
when this element is the last in its chain,
3. Some bits of the hashcode, compared with the hashcode of the key we are
looking for.
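To make the cell layout concrete, here is one possible packing of a single
index-array cell, sketched in Java. The bit assignments below are my own
illustrative guesses, not the actual ones used by CompactHashMap:

```java
final class IndexCell {
    static final int END_OF_LIST = 0x80000000; // element is the last in its chain
    static final int DELETED     = 0x40000000; // slot was freed by a key removal
    static final int HASH_MASK   = 0x3FFF0000; // some hashcode bits for a quick compare
    static final int INDEX_MASK  = 0x0000FFFF; // index into the key/value array

    static int pack(int hash, int index, boolean endOfList) {
        return (endOfList ? END_OF_LIST : 0) | (hash & HASH_MASK) | (index & INDEX_MASK);
    }
    static int index(int cell)           { return cell & INDEX_MASK; }
    static boolean isEndOfList(int cell) { return (cell & END_OF_LIST) != 0; }
    static boolean isDeleted(int cell)   { return (cell & DELETED) != 0; }
    static boolean hashMatches(int cell, int hash) {
        return (cell & HASH_MASK) == (hash & HASH_MASK);
    }
}
```

With such a packing, a lookup can reject a non-matching cell using
hashMatches alone, without touching the key array, and stop as soon as
isEndOfList returns true.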
Finding an existing value for a key has approximately the same speed as the
current HashMap implementation. It is:
1. Computing the key's hashcode,
2. Accessing an array to get the index/address for this hashcode,
3. Then, where the current HashMap accesses a HashEntry object with
key/hashcode/value/next fields, in my implementation the hashcode and next
index are packed into the index-array bits, and the key and value are stored
in adjacent cells of the second array.
When looking for a missing key my implementation can be twice as fast in some
situations, since it needs only one random memory access (the cell of the
index array). If that cell is empty, or its hashcode bits are not equal to the
hashcode bits of the key being searched for and its 'end-of-list' flag is set,
that is enough to stop searching. The current HashMap needs 2 random memory
reads: (1) the array cell with the address and (2) the HashEntry at that
address.
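Here is a small runnable sketch of that lookup path in Java. The cell packing
(which bits hold the flag, the hash bits, and the slot index) is a simplified
assumption for illustration, not the real CompactHashMap layout, and the
'deleted' flag is omitted:

```java
class CompactLookupSketch {
    static final int END_OF_LIST = 0x80000000; // element is the last in its chain
    static final int HASH_MASK   = 0x3FFF0000; // stored hashcode bits for a quick reject
    static final int INDEX_MASK  = 0x0000FFFF; // slot index + 1 (0 marks an empty bucket)

    static int hashBits(int h) { return ((h >>> 2) << 16) & HASH_MASK; }

    // indices[0 .. cap-1] holds bucket heads; indices[cap .. 2*cap-1] holds
    // next-in-chain cells. keyValues holds key i at 2*i and its value at 2*i+1.
    static Object get(Object key, int[] indices, Object[] keyValues) {
        int cap = indices.length / 2;
        int h = key.hashCode();
        int cell = indices[h & (cap - 1)];        // a miss on an empty bucket: one read
        while (cell != 0) {
            int i = (cell & INDEX_MASK) - 1;
            // compare the packed hash bits before touching the key/value array
            if ((cell & HASH_MASK) == hashBits(h) && key.equals(keyValues[2 * i]))
                return keyValues[2 * i + 1];      // value is adjacent to the key
            if ((cell & END_OF_LIST) != 0) break; // end-of-list flag: stop searching
            cell = indices[cap + i];              // follow the in-array chain
        }
        return null;
    }
}
```

A cell for slot i and hash h can be built as
END_OF_LIST | hashBits(h) | (i + 1); a lookup of a key whose bucket is empty
returns after a single array read, as described above.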
Insertion of a new key does not require any memory allocation (no new
HashEntry object) since all arrays are pre-allocated, except during the resize
operation. A resize consists of Arrays.copyOf to a twice-as-large array and an
almost sequential rehashing: reading one array cell and writing it either to
the same index or to the index with '1' in the highest bit.
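As I read the resize description, with power-of-two capacities each entry
either keeps its bucket index or moves to the index with the new highest bit
set. A tiny sketch of just that decision (my reading, not the actual code):

```java
class CompactResizeSketch {
    // After doubling from oldCap to 2*oldCap, an entry in bucket b either
    // stays at b or moves to b | oldCap, depending on one bit of its hash.
    static int rehashBucket(int hash, int oldCap, int oldBucket) {
        return (hash & oldCap) == 0 ? oldBucket : (oldBucket | oldCap);
    }
}
```

This is why the rehash is almost sequential: one pass over the old table, with
each entry written to one of two predictable places, and the resulting bucket
always equals hash & (2*oldCap - 1).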
When several key/value pairs are added, they are written sequentially into the
key/value array, and sequential memory access is faster and more
cache-friendly.
Having 2 arrays instead of a lot of HashEntry objects is more
garbage-collector friendly and has a significantly lower memory footprint.
Iteration over the elements is a sequential array read, which is faster.
To clone such a HashMap we only need to clone the 2 arrays, which is also much
faster.
Element order is preserved even after a resize; only when some key is removed
does it leave an empty 'hole' that will be filled by the next key insertion.
Thus the iteration order is more predictable, and I think it is possible not
to throw ConcurrentModificationException in most situations. An iterator needs
only 1 piece of state - the array index - and even a resize does not break the
iteration order.
But there can be problems. For example, if we have already iterated over some
key that was then deleted and inserted again at a higher array index, it can
eventually come up a second time in the iterator. But if we save the highest
non-empty index present when the iteration starts, and monitor insert/delete
operations (possible holes) at indices we have not iterated over yet,
we will not need to throw ConcurrentModificationException in most cases.
I've done some benchmarks that support my assumptions:
* Reading an existing key-value mapping has the same speed,
* Adding a new mapping and looking up a missing key are significantly faster.
But there could be problems in my testing strategy,
or the benchmark could be biased in some way, so it would be better to retest
it with some independent benchmarks.
Hope this will help to make the Java collections faster, since this approach
(pre-allocated arrays instead of ...Entry objects) can be used to speed up
other collections. In my Scala version both HashMap and HashSet
are implemented this way. It also stores primitive types like Int
in primitive arrays, which saves a lot of memory. But on the one hand
Scala has built-in boxed arrays, and on the other hand such array boxing
is slow. It also has to store keys and values in different arrays,
which adds an extra random memory read and is thus slower:
when keys and values are in adjacent cells of the same array,
the value will usually be pre-fetched when the key is read.
Best,
Alex Yakovlev