Faster HashMap implementation

Sat Jun 13 16:45:55 UTC 2009

Sorry for inserting myself into this thread but this has been an
interesting conversation filled with a lot of trade-offs and
optimization techniques for edge cases. It occurred to me that one of
the problems is that the traditional HashMap implementation relies on
inheritance rather than composition. For example, if the hash map
representation where abstracted out from the interface then you could by
default construct a HashMap using the existing approach but
alternatively construct it with another approach. If you further
abstracted the hashing strategy into its own class then you would have
even more flexibility composing a HashMap. Since maps are so often
resized you could even have a dynamic map that starts out with open
addressing but keeps count of collisions and on resize changes to a more
optimal scheme for the current data.

This allows you to have a decent default HashMap but allows clients to
have an optimal hash map if they know they will allows have Long or
String or whatever keys versus doubles, floats, etc. 

With the hashing strategy, an identity map becomes as easy as mixing in
an identity hashing strategy during composition.

Anyway, just a thought, I'm sure it's been considered before but I just
thought I'd throw it out there.

-----Original Message-----
From: core-libs-dev-bounces at openjdk.java.net
[mailto:core-libs-dev-bounces at openjdk.java.net] On Behalf Of Doug Lea
Sent: Saturday, June 13, 2009 5:31 AM
To: Ismael Juma
Cc: core-libs-dev at openjdk.java.net
Subject: Re: Faster HashMap implementation

Ismael Juma wrote:
> Out of curiosity, do you know if any tests have been done with an open
> addressing scheme similar to Python dictionaries[1] in Java? Near the
top,
> there's a comment explaining how it works and the motivation.
> 

More or less. This is roughly similar to the schemes I mentioned
a few days ago, of first direct-addressing. The python
approach (as noted in its comments) has most trouble with
cases that occur too often in Java to ignore: Poor entropy
in low bits (the hashCodes for commonly used Floats and Doubles are
especially problematic (Aside: you wouldn't think these would
ever be used keys, but they are)) and much-greater-than-expected
duplicate codes (mainly the particular value zero). However,
these do seem amenable to one of the hybrid approaches
I mentioned; I'll try to get a full version together soon.

-Doug