Faster HashMap implementation

Mon Jun 8 16:59:11 UTC 2009

Doug,

On your previous message:

1) I reverted hash function,

2) did some tuniing for MapCheck benchmark -
equals, hashCode, toString, and putAll (if argument have the same type)
now does not use entrySet() iterator which is very expensive.

Very big speedup would be to reuse Entry object,
since in most cases previous one is dropped when next() is called,
but not always - entries can be stores somewhere,
and I do not know how to tell is it possible to reuse Entry or not.
Maybe JVM EscapeAnalysis can somehow help,
if we'll declare all iterator-related methods as final?
Or use some switch like +XX:+ReuseHashMapIteratorEntry :)

Also, in the latest version I made Entry as close
to current HashMap as I could - its setValue method
not updates the map, getValue re-reads the value from array, etc.
Not doing that can improve performance.

3) as for DenseMapMicroBenchmark test - I think that
CompactHashMap.scala will perform well in such tests
since it stores primitive types in primitive arrays,
and it is true my Benchmark.scala on Integers
(sample result is included in ReadMe file).

On Mon, Jun 8, 2009 at 8:07 PM, Doug Lea <dl at cs.oswego.edu> wrote:

> Notice that footprint comparisons with current HashMap
> are load-dependent. ...
> There is a crossover point where yours uses
> less memory, which will often be the case when grown using
> the usual resizing thresholds. Plus GC should be faster in yours.

That is true.

> However, statistically, most HashMaps have very few elements.
> I once instrumented some of the "reference workload" usages
> and found that over half of the HashMaps/Hashtables had
> a max 3 or fewer elements during their lifetimes.
> So on average using your version will likely increase
> application footprints. It seems possible to deal with
> this though. For example, you might consider separately
> allocating the back-half of index array only when needed.

Actually, scala version is optimized for memory footprint:
1) its initial capacity is 4 elements, not 16
2) default load factor is 1, not .75
3) on tiny maps (<= 128 elements) is uses byte index array,
and on medium sized (<32K)  arrays of shorts.

I have an idea how not to use 'deleted' bit
and put 256 element into byte-index-map.

But thus there is no place to store all of hashcode bits,
and there are very slow resizes when switching
from byte to short and from short to int arrays.

Also, in MapWordLoops it seems that original
HashMap is faster on 50-elements test.

Maybe there is a sence to optimize 1-2-3-element cases
by directly storing the values without hashing, don't know.

While it remains approximately true in yours that loadFactors < 1
> are the average number of traversals to find existing key
> near the resize threshold point, probably some more work/thought
> is needed to make a better story about interpreting them internally.

Well... It is also possible to reorganize index bit structure
to allow for example up to 2.0 load factor but I am not sure
if it is really needed.

It strikes me that there might be a bigger win available
> here by using some hybrid strategy based on
> IdentityHashMap-like single-indirect (i.e., usually
> at most a single cache miss) for collision-free accesses
> but using a variant of your clever packed-indices
> under collisions. I might try this out someday.

Interesting, I'm looking forward to see it.

> Iteration over elements is a sequental array read which is faster.
>> To clone such HashMap we only need to clone 2 arrays with is also much
>> faster.
>>
>
> I don't understand why you don't simply traverse the element
> array for HashMap iterators? You cannot do this for your linked
> version, but it would not hurt to make completely different
> iterator types.

I don't get your question. In most cases I really do this like:

        for (int i = 0; i < firstEmptyIndex; i++)
            if (!isEmpty(i)) {
....

or you mean iterateFirst/iterateNext methods?
Their only purpose is to simplify LinkedMap and reuse code,
but maybe having different iterators will give some speedup,
but it's just one virtual call - you think it's really that bad?
But hence it could not be inlined by HotSpot...

You can look up git history - versions prior to LinkedMap
had 'direct' iterators, you can benchmark it if you want.

For non-concurrent collections, the goal of ConcurrentModificationExceptions
> is to help people find/fix incorrect code; not necessarily to only
> blow up if proceeding is impossible.

I was thinking if it would be possible to make concurrent
implementation based on AtomicIntegerArray with CAS/retries,
but there are some problems I cannot solve. Not sure for concurrent
resize - I have some ideas how it could be done, but management
of deleted indices (holes) that can be reused seems impossible
without either a full quasi-synchronization with read locks
that can also be stored as bits in index...

I hope this helps! Despite all the concerns above, this is the
> first alternative version of HashMap that I can recall seeing
> that seems promising enough to continue exploring as a
> possible replacement.
>

Thanks!

Alex
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20090608/e668a68a/attachment.html>