HashMap - Hybrid approach

Doug Lea dl at cs.oswego.edu
Thu Jun 11 14:53:08 UTC 2009


Alex Yakovlev wrote:
> I thought about tree-like structures for collisions, but it occurred to me
> that the problem is not in search depth, which can be reduced by non-linear
> structures, but in memory fragmentation.
> 

There were three reasons I suggested tries for collisions:
(1) to remove a dependent load, allowing prefetch into the
elements array; (2) to decrease footprint; (3) to conform to
the current threshold parameters while still avoiding the long
sequential pile-ups you can get with open-addressing-style
collision handling, especially when the number of duplicate
hash codes is higher than statistically expected. It may be
possible to get most of these effects without tries, though.
Here are some notes on possible ways to get there:

First, a digression: one source of hash duplicates is that
System.identityHashCode (which is also what Object.hashCode
returns for keys that don't override it) has had from 21 to 25
significant bits in various releases of HotSpot (and fewer in
some other JVMs), which means you see more and more duplicates
in very large tables. It is a known problem that
IdentityHashMaps may have crummy performance for tables past a
few million elements. On the other hand, because
identityHashCodes are nicely randomized by the JVM, and because
IdentityHashMap avoids an indirection compared to other maps,
its performance on less-than-huge tables is by far the best.
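
(For the curious: a quick, unscientific way to watch this on a
given JVM is to OR together the identityHashCodes of a few
million fresh objects and see which bits ever turn on. The
probe class below is purely illustrative, not from either
implementation.)

// Quick probe of how many bits System.identityHashCode uses.
// Bits that never turn on across millions of fresh objects are
// (almost surely) not produced by the JVM's hash generator.
public class IdHashBits {
    public static void main(String[] args) {
        int mask = 0;
        for (int i = 0; i < (1 << 22); ++i)
            mask |= System.identityHashCode(new Object());
        System.out.println("observed significant bits: "
            + (32 - Integer.numberOfLeadingZeros(mask)));
    }
}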

About footprint (again; refining/correcting my previous mail):
In 32-bit mode, under the default 3/4 load factor, the current
HashMap allocates C+6N words (6 = 4 fields plus a 2-word object
header per entry), versus 13C/4 for yours. In steady state,
C = 4N/3 just before a resize and 8N/3 just after, averaging
6N/3 = 2N. So the steady-state footprints below look good, but
the initial sizes under the default capacity of 16 don't.
(While I'm at it, for further comparison, the footprint for
IdentityHashMap is also shown below. It hardwires a 2/3, not
3/4, threshold.)

                  steady state       initial
              min     ave     max    (default cap)
yours:      4.33N   6.50N   8.67N    52
current:    7.33N   8.00N   8.67N    16
(Identity:  3.00N   4.50N   6.00N    32)
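
(To make the arithmetic easy to check, here is a throwaway
calculation that reproduces the steady-state columns. The
per-word costs for the first two rows are the ones stated
above; the 2C figure for IdentityHashMap is inferred from the
table, and the class name is of course hypothetical.)

// Reproduces the steady-state footprint columns (words/mapping, 32-bit).
// Assumes: 13C/4 words for the proposed map, C + 6N for the current
// HashMap, and about 2C for IdentityHashMap (2/3 threshold).
public class Footprints {
    static void row(String name, double cMin, double cMax,
                    double perC, double perN) {
        double cAve = (cMin + cMax) / 2;
        System.out.printf("%-10s %5.2fN  %5.2fN  %5.2fN%n", name,
            perC * cMin + perN, perC * cAve + perN, perC * cMax + perN);
    }
    public static void main(String[] args) {
        // 3/4 load factor: C runs from 4N/3 (just before a resize)
        // to 8N/3 (just after)
        row("yours:",    4.0 / 3, 8.0 / 3, 13.0 / 4, 0);
        row("current:",  4.0 / 3, 8.0 / 3, 1,        6);
        // 2/3 threshold: C runs from 3N/2 to 3N
        row("Identity:", 3.0 / 2, 3.0,     2,        0);
    }
}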


To cope with known bloat issues, any possible HashMap
replacement should have a *smaller* footprint for unused and
tiny maps. Using a dynamic collision tree is one way to reduce
the initial footprint. Another is just to be more cautious on
startup:
   * Lazily create all arrays
   * Separate the front and back of the index array. (Note that
     you don't have any locality here to lose by doing so,
     since you jump to index h+hashlen.)
   * Use default initial capacity of 4, but jump to 32 or 64
     on first collision.
This would have zero overhead (beyond the fixed fields) for
unused maps and 12 words for most tiny maps.
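
Roughly, in code (all names and the exact growth constants here
are hypothetical, not taken from either implementation):

// Sketch of cautious startup: no arrays until first put, a tiny
// initial capacity, and a jump well past it on first collision.
class CautiousMap<K, V> {
    private int[] front;        // hash slots, lazily created
    private int[] back;         // collision links, created on demand
    private Object[] elements;  // key/value pairs (2*hashlen words)
    private int hashlen;

    V put(K key, V value) {
        if (front == null) {          // lazy creation: an unused map
            hashlen = 4;              // costs only its fixed fields;
            front = new int[hashlen]; // a tiny map costs 4 + 8 = 12
            elements = new Object[2 * hashlen];  // words of arrays
        }
        int h = key.hashCode() & (hashlen - 1);
        if (occupied(h) && !matches(h, key))
            resize(64);               // first collision: jump to 64
        // ... normal insertion path elided ...
        return null;
    }

    private boolean occupied(int h)          { /* elided */ return false; }
    private boolean matches(int h, Object k) { /* elided */ return false; }
    private void resize(int newHashlen) {
        hashlen = newHashlen;         // back array allocated here too;
        back = new int[newHashlen];   // rehash elided
    }
}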

Also, you currently allocate an initial 64 words for a 1.0
load factor, which is, surprisingly, more than the 52 for the
0.75 default. Under the above scheme the initial values would
be closer. Further note that the steady-state space differences
between 0.75 and 1.0 are very small (6.25%):
                  steady state       initial
              min     ave     max    (default cap)
lf=1.0:     4.00N   6.00N   8.00N    64

And further, the performance differences across these load
factors were very small in a few tests I ran. This suggests a
different sort of tactic for helping with indirections and
prefetches. Suppose that index[h] were usually just h; that is,
with the key in elements[2*h]. (This only works in the current
design if loadFactor == 1, but can work with smaller values if
the elements array is sized at hashlen.) Under this scheme, you
can optimistically read this element at the same time as the
index. To make this work, you'd need a bitmap of available
slots rather than a freelist, adding C/32 space overhead, plus
some heuristic juggling inside put and resize. It might be
worth an experiment or two. Doing this requires yet further
thought about how sizing varies with loadFactor arguments.
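
The fast path might look something like this (a sketch only;
the names, and the long[] choice for the bitmap, are mine):

// Optimistic probe: the element at 2*h is read without waiting
// on index[h], so the two loads can issue in parallel; the
// indirect path runs only when the fast slot misses.
class OptimisticMap<K, V> {
    Object[] elements;  // sized at 2*hashlen: key at 2*h, value at 2*h+1
    long[] bitmap;      // free-slot bitmap instead of a freelist:
                        // about C/32 words of overhead in 32-bit mode
    int hashlen;

    @SuppressWarnings("unchecked")
    V get(Object key) {
        int h = key.hashCode() & (hashlen - 1);
        Object k = elements[2 * h];       // optimistic, non-dependent load
        if (key.equals(k))
            return (V) elements[2 * h + 1];
        return slowGet(h, key);           // displaced or colliding entry
    }

    boolean isFree(int slot) {            // used by put/resize juggling
        return (bitmap[slot >>> 6] & (1L << slot)) == 0;
    }

    private V slowGet(int h, Object key) { /* indirect path elided */ return null; }
}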


> But in my array-based approach it is possible to defragment it and (almost)
> always store all collisions in sequental array indices.

In the current HashMap, the nodes chained inside bins tend to
have good locality after a GC, because GC will usually relocate
them to be together. Analogously, it would seem to make sense
to defrag only on resize, perhaps accompanied by heuristic
one-place swaps during put (one possible form is sketched below).
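
The details depend on the representation, but one plausible
form, shown here in the lookup that put would use, with a
hypothetical structure and names: on a hit past the first
position of a collision sequence, swap the two stored pairs so
the hot key is found one probe earlier next time, without any
wholesale defragmentation.

// Sketch of a heuristic one-place swap. chain[] holds the slots
// of one bucket's collision sequence in probe order; swapping
// the stored pairs leaves the chain itself untouched.
class SwapHeuristic {
    Object[] elements;  // key at 2*slot, value at 2*slot + 1

    int find(int[] chain, int chainLen, Object key) {
        for (int i = 0; i < chainLen; ++i) {
            int slot = chain[i];
            if (key.equals(elements[2 * slot])) {
                if (i > 0) {              // one-place swap toward front
                    swap(chain[i - 1], slot);
                    return chain[i - 1];  // key's new slot
                }
                return slot;
            }
        }
        return -1;                        // not found
    }

    private void swap(int a, int b) {
        Object k = elements[2 * a], v = elements[2 * a + 1];
        elements[2 * a] = elements[2 * b];
        elements[2 * a + 1] = elements[2 * b + 1];
        elements[2 * b] = k;
        elements[2 * b + 1] = v;
    }
}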

> I gathered some statistics, and in successful reads when the map is almost
> full with a .75 load factor, just before a resize, 70% of gets happen on the
> first try, 23% on the second, 5% on the third lookup, 1% on the fourth.

The exact values depend, of course, on hash quality, insertion
order, etc. But this also reflects the fact that the collision
array (or the back half of the current index array) is
typically less than 30% occupied. This suggests shrinking its
default size and letting it resize independently, using the
same sort of approach as for the main hash: start out at, say,
25% of the size by letting sets of 4 indices share an overflow
slot, and expand as needed. Or maybe start at 50%, which would
still save enough footprint to be worth the bother.
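
In index terms the sharing scheme might look like this
(a hypothetical layout; the field names are illustrative):

// Quarter-sized, independently growable collision region:
// groups of 4 main-table buckets share one overflow slot, so
// the region starts at hashlen/4 entries and doubles on demand,
// independently of the main hash.
class SharedOverflow {
    int hashlen;            // main table size (power of 2)
    int overflowShift = 2;  // 4 buckets per overflow slot initially

    int overflowSlot(int h) {
        return hashlen + ((h & (hashlen - 1)) >>> overflowShift);
    }

    void growOverflow() {
        if (overflowShift > 0)
            --overflowShift;  // double region: 4 -> 2 -> 1 sharing
        // reallocation/rehash of the overflow region elided
    }
}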

-Doug


