Scaling problem of HashMap (introduced with alternative hashing)
Peter Levart
peter.levart at gmail.com
Wed Jan 2 06:58:06 UTC 2013
On 01/02/2013 12:13 AM, Eric McCorkle wrote:
> Was the original purpose security-oriented? It seemed to me that the purpose was to make performance pathologies less likely.
>
> Consider, for example, a bad hash function that maps ip addresses directly to their integer representation. If an application puts entire subnets into a map. All the addresses will go into contiguous buckets, which increases probability of a collision considerably. Adding in randomness helps to smooth over these sorts of pathologies.
Then how do you explain the need to have different (unique?) hashSeed
values for each individual HashMap instance?
The hashSeed is a constant for a particular HashMap instance. It's
XOR-ed with the hashCode just before bit-shifting-and-xor-ing. If any
random int is ok, then why not take the same value for all HashMap
instances?
/**
* Retrieve object hash code and applies a supplemental hash
function to the
* result hash, which defends against poor quality hash functions.
This is
* critical because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits.
*/
final int hash(Object k) {
if (k instanceof String) {
return ((String) k).hash32();
}
int h = hashSeed ^ k.hashCode();
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
Regards, Peter
On Jan 1, 2013, at 7:53 AM, Peter Levart <peter.levart at gmail.com> wrote:
>> Hi and happy new year, 2013!
>>
>> I don't know what the main purpose of "hashSeed" actually is. If it is to fight against predictable hashCode to bucket-index mappings, it does it's job somehow, but not very good. For example, it is very easy to predict the next hashSeed numbers that will be allocated:
>>
>> import sun.misc.Hashing;
>>
>> public class HashSeedCrack {
>> private static final long multiplier = 0x5DEECE66DL;
>> private static final long addend = 0xBL;
>> private static final long mask = (1L << 48) - 1;
>>
>> static class Rnd {
>> private long rnd;
>>
>> Rnd(long seed) { this.rnd = seed; }
>>
>> int nextInt() {
>> rnd = (rnd * multiplier + addend) & mask;
>> return (int) (rnd >>> 16);
>> }
>> }
>>
>> public static void main(String[] args) throws Exception {
>> int r0 = Hashing.randomHashSeed(null);
>> int r1 = Hashing.randomHashSeed(null);
>>
>> long s0 = ((long) r0 << 16) & mask;
>> long s1 = ((long) r1 << 16) & mask;
>> long mx = mask - ((1L << 16) - 1);
>> Rnd rnd = null;
>> for (int i = 0; i < 65536; i++) {
>> long sx = s0 + i;
>> if (((sx * multiplier + addend) & mx) == s1) {
>> rnd = new Rnd(sx);
>> int rx = rnd.nextInt();
>> assert rx == r1;
>> break;
>> }
>> }
>> assert rnd != null;
>>
>> System.out.println("Next HashMap.hashSeed will be: " + rnd.nextInt());
>> System.out.println("... and it is: " + Hashing.randomHashSeed(null));
>> }
>> }
>>
>>
>> For this purpuse, the Hashing.randomHashSeed() should not be publicly visible. If ThreadLocalRandom is to be used instead of a private instance of a java.util.Random(), then it would defeat the "secrecy" since ThreadLocalRandom.current() is a shared per-thread instance accessible to anyone.
>>
>> As far as standard ThreadLocalRandom is concerned, it's usability is impaired by it's design. It would be a better API if a plain ThreadUnsafeRandom with standard constructors was part of standard JDK API. ThreadLocal binding would then be a matter of each particular (possibly private) usage...
>>
>> Regards, Peter
>>
>> On 12/27/2012 08:38 PM, Aleksey Shipilev wrote:
>>> Looks very like dumb inlined java.util.Random?
>>> Is there a security threat to use ThreadLocalRandom instead there?
>>>
>>> -Aleksey.
>>>
>>> On 27.12.2012, at 23:16, Zhong Yu <zhong.j.yu at gmail.com> wrote:
>>>
>>>> Reported by the SO question
>>>>
>>>> http://stackoverflow.com/questions/14010906
>>>>
>>>> the HashMap constructor contains a CAS, which is kind of surprising.
>>>> Could it be removed?
>>>>
>>>> transient final int hashSeed =
>>>> sun.misc.Hashing.randomHashSeed(this); // CAS
>>>>
>>>> Zhong Yu
More information about the core-libs-dev
mailing list