Scaling problem of HashMap (introduced with alternative hashing)
Peter Levart
peter.levart at gmail.com
Wed Jan 2 15:50:08 UTC 2013
On 01/02/2013 10:20 AM, Alan Bateman wrote:
> On 01/01/2013 23:13, Eric McCorkle wrote:
>> Was the original purpose security-oriented? It seemed to me that the
>> purpose was to make performance pathologies less likely.
>>
>> Consider, for example, a bad hash function that maps ip addresses
>> directly to their integer representation. If an application puts
>> entire subnets into a map. All the addresses will go into contiguous
>> buckets, which increases probability of a collision considerably.
>> Adding in randomness helps to smooth over these sorts of pathologies.
>>
> It was related to collisions, particularly where the keys are Strings
> and coming from an untrusted source. There are better solutions for
> handling a huge number of collisions now and we need to change this
> code (ie: get rid of the alternative hashing). For jdk7u then we have
> to be more cautious and I think Mike is going to see about changing it
> to use ThreadLocalRandom.
>
> -Alan.
>
String keys are treated specially in HashMap (using String.hash32()) and
don't need HashMap.hashSeed value at all. hashSeed is only used for
non-String keys. The question is why it has to be a unique value for
each individual HashMap instance? String.hash32() uses a once-per-JVM
allocated String.HASHING_SEED for example...
Regards, Peter
More information about the core-libs-dev
mailing list