Faster HashMap implementation

Sat Jun 27 03:54:12 UTC 2009

Also it should be mentioned that memory-wise my implementation
uses more memory in resize: it need to copy 2 arrays of ~2*C size
(C = capacity) and current HashMap needs only one array of C size,
data in Entry structures are not copied. This can be addressed
if it is important by (1) splitting integer arrays into main
and overflow parts, (2) some splitting of key/value array...

As for huge maps - I was thinking how to solve this issue better.

Using additional arrays does not looks like a solution
mainly because now 2 bit are reserved for control flags
which leaves only up to 30 bits for real index value.

Rather simple way is to have a secondary map where to put
elements when main map is full. But performance will be poor -
there'll be two (or number of such maps) lookups for each get/put.

We could use javolution-like approach with 'map of maps', i.e
using some lower bits to choose submaps. But it is an extra
indirection step, thus worse performance.

Another way is not to use integer array at all when capacity
is at maximum, and use array of objects as in current HashMap
to store HashEntries. Maybe this is a good way, but code
complexity will increase - basically there will be two
implementations, array-bases and entry-bases, just mixed
in once class to reduce virtual calls. (Or just to proxy all calls
to another class when capacity is at maximum).

Or to somehow mix entry and array based storage,
like adding another bit flag to look in dynamic storage,
and in key/value array set key to some reserved object
that will signal that value is not a real value but an Entry-like
dynamic structure. But on huge maps this would be
a waste of memory - it would use 3 words (integer index,
reserved key value and real pointer) just to address
an Entry object.

Do you see better way to do that?

OTOH, such a huge maps are probably used on 64-bit JVM,
where array-based approach uses a lot less memory so finding
a way to use pure arrays would be a big win (keeping resize problem
in mind - which is also solved by javolution-like approach).

On Tue, Jun 23, 2009 at 8:04 PM, Doug Lea<dl at cs.oswego.edu> wrote:
>
> But I did want to point out a problem shared with
> all purely table-based approaches, including yours.
> Assuming power of 2 capacities, you can only hold up to 1 << 29
> entries, because max power of 2 array length is 1 << 30.
> This holds even in -d64 mode.
> Current HashMap works up through size == Integer.MAX_VALUE.
> (And even "works" if size overflows so long as you don't
> call size().) I know that there are some extremely
> large hash maps out there. Maybe some would newly
> encounters failures.
>
> There are ways out of this, using some sort
> of dynamic structures for overflow.