Scaling problem of HashMap (introduced with alternative hashing)

Wed Jan 2 15:50:08 UTC 2013

On 01/02/2013 10:20 AM, Alan Bateman wrote:
> On 01/01/2013 23:13, Eric McCorkle wrote:
>> Was the original purpose security-oriented?  It seemed to me that the 
>> purpose was to make performance pathologies less likely.
>>
>> Consider, for example, a bad hash function that maps IP addresses
>> directly to their integer representation.  If an application puts
>> entire subnets into a map, all the addresses will go into contiguous
>> buckets, which increases the probability of a collision considerably.
>> Adding in randomness helps to smooth over these sorts of pathologies.
>>
> It was related to collisions, particularly where the keys are Strings
> coming from an untrusted source. There are better solutions for
> handling a huge number of collisions now and we need to change this
> code (i.e., get rid of the alternative hashing). For jdk7u we have
> to be more cautious, and I think Mike is going to see about changing it
> to use ThreadLocalRandom.
>
> -Alan.
>
String keys are treated specially in HashMap (using String.hash32()) and
don't need the HashMap.hashSeed value at all; hashSeed is only used for
non-String keys. The question is: why does it have to be a unique value
for each individual HashMap instance? String.hash32(), for example, uses
String.HASHING_SEED, which is allocated once per JVM...
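
To make the question concrete, here is a rough sketch of what a
once-per-JVM seed for non-String keys might look like. This is not the
actual JDK code: SeedSketch, JVM_SEED, spread() and bucket() are names
made up for illustration, and the bit-spreading step is only written in
the style of the supplemental hash in HashMap.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch only; not the real java.util.HashMap implementation.
class SeedSketch {
    // Hypothetical seed, initialized once when the class is loaded
    // (so once per JVM) instead of once per map instance.
    static final int JVM_SEED = ThreadLocalRandom.current().nextInt();

    // Mixes high-order bits into the low-order bits, so that a
    // power-of-two table index depends on more than the lowest bits.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Seeded hash for a non-String key: XOR the seed into hashCode()
    // and then spread the bits.
    static int hash(int seed, Object key) {
        return spread(seed ^ key.hashCode());
    }

    // Index into a table whose length is assumed to be a power of two.
    static int bucket(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }
}
```

With something like this, every map would call hash(JVM_SEED, key)
and no per-instance seed would have to be generated when a map is
constructed. Whether a shared seed is acceptable is exactly the open
question.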

Regards, Peter