HashMap collision speed (regression 7->8)

Martin Buchholz martinrb at google.com
Fri Jan 9 23:29:13 UTC 2015


Given the prevalence of sub-optimal hashcodes, my own intuition is also
that raising the treeification threshold from 8 will be a win.  But it
depends on the element type - the more expensive it is to compare elements,
the lower the optimal treeification threshold.  String (or a wrapper around
Strings) is the most common element type, and comparison of unequal Strings
is usually fast because ... (wait a minute, I thought String.equals would
compare hash codes, but it doesn't!  but even if it did, our determined
adversary has crafted String keys all of the same length and all with the
same hash code and all with a long common prefix ...)  Did I just talk
myself into supporting Doug's choice of 8?  We probably don't want to get
into the business of measuring the cost of equals and compareTo...  Or
maybe we do, given that this overhead only comes into play when we consider
treeification?!  Is HashMap going to grow its very own JIT?!


On Fri, Jan 9, 2015 at 11:51 AM, Doug Lea <dl at cs.oswego.edu> wrote:

> On 01/09/2015 02:22 PM, Bernd Eckenfels wrote:
>
>  My naive suggestion would be to delay the treeifying a bit longer if
>> the hash keys are not Compareable and the hash tables are large(er).
>>
>>
> Feel free to experiment. Bear in mind that the main rationale for
> treeifying HashMap and ConcurrentHashMap is that there are bad guys
> out there who hand-craft millions of keys with identical hashCodes
> to try to bring down servers. The treeified versions successfully
> resist this, at the expense of adding a further penalty to classes
> with already-terrible hashCodes. (One arguable benefit is that people
> who now notice this will be more motivated to fix their hashCodes.)
> The main penalty here vs linear lists for cases with only dozens
> (vs millions) of clashing elements is the call to comparableClassFor
> for non-Strings, which hits some reflective goop.
> I posted something a few years ago mentioning that this could be
> made faster with some other JDK support. But as  Peter Levart noted,
> bootstrapping issues preclude the most straightforward solution.
>
> -Doug
>
>



More information about the core-libs-dev mailing list