HashMap - Hybrid approach

Alex Yakovlev alex14n at gmail.com
Thu Jun 11 08:49:51 UTC 2009


Doug,

I thought about tree-like structures for collisions,
but what occurred to me is that the problem is not
the search depth, which non-linear structures can reduce,
but memory fragmentation.

But in my array-based approach it is possible
to defragment the table and (almost) always store all
collisions at sequential array indices.

Here is the code for this approach:
http://github.com/alex14n/CompactHashMap/tree/defrag
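
Roughly, the layout looks like this (a simplified sketch with
made-up names, not the actual code from the repository):

// Simplified sketch of the flat-array layout; all names are
// illustrative, not taken from the real CompactHashMap code.
class CompactMapSketch<K, V> {
    Object[] keysValues; // key at 2*i, value at 2*i + 1
    int[] next;          // index of next entry in the chain, or -1
    int[] table;         // bucket -> index of first entry, or -1

    @SuppressWarnings("unchecked")
    V get(Object key) {
        int h = key.hashCode() & (table.length - 1);
        // After defragmentation a chain occupies sequential
        // indices, so this loop walks contiguous memory.
        for (int i = table[h]; i >= 0; i = next[i]) {
            if (key.equals(keysValues[2 * i])) {
                return (V) keysValues[2 * i + 1];
            }
        }
        return null;
    }
}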

Right now it does defragmentation only on put operations.
It may also make sense to reorder entries during resize,
when splitting big buckets. That leaves ~1%
fragmentation, but the data is still very close together.
Maybe there is also some sense in aligning indices
to a power of 2, to better fit into (and prefetch for)
CPU caches.

But tests show that this does not gain much performance.
In my tests I see a ~5% improvement on misses,
and no significant change on successful reads.

And the decrease in write speed is significant. It can be
partially mitigated by defragmenting only large buckets,
leaving small ones (<= 2 or even 3 elements) as they are.
Right now defragmentation happens on ~40% of puts.
But the whole point was to improve read speed, so it is
better to have the data not fragmented at all.
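
Defragment-on-put with such a threshold could look roughly
like this (a sketch reusing the arrays above; chainLength,
allocateRun and moveEntry are made-up helpers):

// Sketch: leave short chains alone, copy longer ones into
// adjacent free slots so they can be read sequentially.
static final int MIN_DEFRAG = 3; // assumed: "<= 2 or even 3"

void defragmentIfLarge(int bucket) {
    int len = chainLength(bucket);   // hypothetical helper
    if (len <= MIN_DEFRAG) return;   // small chains stay as-is
    int base = allocateRun(len);     // claim len adjacent free slots
    int j = base;
    for (int i = table[bucket]; i >= 0; i = next[i]) {
        moveEntry(i, j);             // copy key/value to slot j
        next[j] = (j + 1 < base + len) ? j + 1 : -1;
        j++;
    }
    table[bucket] = base;
}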

I gathered some statistics. For successful reads,
when the map is almost full (at .75 load factor, just before
resize), 70% of gets succeed on the first try, 23% on the
second, 5% on the third lookup, and 1% on the fourth.

Misses are resolved on the first try (empty bucket) in 47%
of cases; in 35% of cases one hashcode/key comparison is
made; in 13% of cases a second lookup happens, and a
third in 3% of cases.
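
Numbers like these can be gathered by counting probes in
get(), something like this sketch (made-up names again):

// Sketch of instrumentation: histogram how many chain steps
// each lookup takes before it hits (or misses).
int[] hitProbes = new int[16];  // hitProbes[1] ~ 70%, [2] ~ 23%
int[] missProbes = new int[16]; // missProbes[0] ~ 47% (empty bucket)

Object getCounting(Object key) {
    int h = key.hashCode() & (table.length - 1);
    int probes = 0;
    for (int i = table[h]; i >= 0; i = next[i]) {
        probes++;
        if (key.equals(keysValues[2 * i])) {
            hitProbes[Math.min(probes, 15)]++;
            return keysValues[2 * i + 1];
        }
    }
    missProbes[Math.min(probes, 15)]++; // 0 probes = empty bucket
    return null;
}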

This distribution is almost the same on all data I tested.
So defragmentation helps only in ~7% of successful reads
and ~18% of misses.

Data structures that are fragmented will probably perform
worse; that goes for any objects holding links to each other.

Maybe there is some sense in a kind of binary search
within the same array. When we look at the first key and its
hashcode, we can set bit flags indicating whether there are
keys with a hashcode greater than it, or less than it. Or we
could even keep 2 next indices: one branching to greater
hashcodes, the other to smaller-or-equal ones. Still with
defragmentation, for better locality in memory. I'm not sure
about balancing such trees. And actually I'm not sure any of
this is needed, since sequential memory reading of
collisions is not expensive at all.
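
For example, the two-next-indices variant could look
something like this sketch, turning each bucket's chain into
an (unbalanced) binary tree ordered by cached hashcode,
still stored in the same flat arrays:

// Sketch: each entry keeps two successor indices, one for
// greater hashcodes, one for smaller-or-equal ones.
int[] hashes;  // cached hashcode of entry i
int[] nextGt;  // subtree with hash > hashes[i], or -1
int[] nextLe;  // subtree with hash <= hashes[i], or -1

int find(int bucket, Object key, int h) {
    int i = table[bucket];
    while (i >= 0) {
        if (h == hashes[i] && key.equals(keysValues[2 * i])) {
            return i; // found the entry
        }
        // equal-but-different keys fall into the <= branch
        i = (h > hashes[i]) ? nextGt[i] : nextLe[i];
    }
    return -1; // miss
}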

Alex

On Wed, Jun 10, 2009 at 4:59 PM, Doug Lea <dl at cs.oswego.edu> wrote:

> 1. Memory stalls
>   - Cache misses due to randomization of indexing can easily
>     cost hundreds of cycles. This is made worse if there are
>     multiple levels of indirection each with poor locality.
>     (Multiple levels with good locality are much less of a
>     problem, which is why ConcurrentHashMap is pretty fast
>     despite adding one). And this is made even worse when
>     there are too many computations before processor can even
>     do a prefetch, which is one reason why bit-spreading functions
>     require careful speed vs uniformity tradeoff analysis.

> 3. Collision handling
>   - Collisions lead to some sort of slower scanning. Especially
>     given the highly variable quality of user-supplied
>     hashCodes, you'd like to localize the impact of collisions
>     to avoid per-map slowdowns. Linear-probes and related techniques
>     don't fare very well on these grounds.

> 4. Biasing for expected operation mixes
>   - I once checked that collection (and in particular HashMap/Hashtable)
>     usage approximately obeys the usual 4:1 read vs write ratio
>     seen in just about every aspect of computing. In fact it seems
>     a bit more extreme -- something like 84% read (get() or iterate)
>     15% add (put() and variants) and 1% delete (remove, clear). So
>     making reads fast is clearly the highest priority.

>> And we cannot store collisions in pre-allocated
>> arrays - to read it from the same array with keys/values
>> it needs to be an object. Thus writing speed on
>> my original map will always be faster.

> Right. My thought here is to use an alternative structure
> for collisions.

>  + It allows parallel prefetches on get; of both
>    elements[2*h] and hashes[h], so should reduce memory stalls.

>  = Collisions may or may not be a little more expensive to resolve
>  - Iterators are likely slower and definitely a lot messier
>  - The trie nodes needed here would require 2 or 3 more
>    pointer fields than do list-based nodes, so the incremental
>    footprint grows faster.
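
For reference, the parallel-prefetch layout Doug describes in
point 1 might look something like this (my sketch of the idea,
not his actual proposal):

// Sketch: keys/values in one array, cached hashcodes in a
// parallel int array. The loads of hashes[h] and elements[2*h]
// are independent, so both cache misses can be in flight at
// the same time instead of one waiting on the other.
Object[] elements; // key at 2*h, value at 2*h + 1
int[] hashes;      // cached hashcode for slot h

Object get(Object key, int hash) {
    int h = hash & (hashes.length - 1);
    int cached = hashes[h];     // load #1
    Object k = elements[2 * h]; // load #2, independent of #1
    if (cached == hash && key.equals(k)) {
        return elements[2 * h + 1];
    }
    return getSlow(key, hash);  // hypothetical collision path
}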


