Scaling problem of HashMap (introduced with alternative hashing)
Wolfgang Hoschek
whoschek at cloudera.com
Sat Jan 5 22:10:39 UTC 2013
Mike Duigou <mike.duigou at ...> writes:
> > I am responding on the StackOverflow thread. I will look into using
ThreadLocalRandom. > > The random.next() is clearly a potential bottleneck but
given that this happens only once per HashMap > instance it is still unclear why
a reasonable application would want to create hundreds or thousands of > hash
maps per second. > > Mike
There are tons of apps out there that create a transient HashMap per record in
big data applications. Think parsers and serializers in tight loops, for
example. ArrayList and HashMap are probably the most heavily used data
structures in every day java usage, and perf regressions in this area affect all
existing java programs. Putting any synchronization into unsynchronized
collections classes is a real gotcha.
In my opinion, this is unacceptable and needs to be fixed ASAP. The change that
was apparently introduced in 7u6, CR#7118743 should be reverted or fixed without
requiring any synchronization or CAS operation.
Somehow this situation reminds me of the colossal mistake of making StringBuffer
and Vector and HashTable synchronized in JDK 1.1/1.2. People paid dearly for
years for that mistake. No need to repeat that experience.
Indeed, the performance of HashMap constructor is of such high importance that
Josh Bloch (the creator of the java collections framework) back in the day even
put in the optimizations in response to customer feedback to avoid the
multiplication loop for the common case, which is the HashMap() no-arg
constructor:
java6:
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR;
threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
table = new Entry[DEFAULT_INITIAL_CAPACITY];
init();
}
I just checked the source and it appears that this optimization has ALSO somehow
got lost on the way from java6 to java7. In Java7 I find that the no-arg
constructor required the while (capacity < initialCapacity). In other words,
CR#7118743, not only adds CAS and other unnecessary random related computation
overhead that hurts a common usage pattern, but it also removes the deliberate
fast path put in by Josh Bloch back then, which makes this an even bigger
performance regression!
java7:
public HashMap() {
this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
capacity <<= 1;
this.loadFactor = loadFactor;
threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
table = new Entry[capacity];
useAltHashing = sun.misc.VM.isBooted() &&
(capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
init();
}
Wolfgang
More information about the core-libs-dev
mailing list