HashMap collision speed (regression 7->8)

Mon Jan 12 12:15:26 UTC 2015

On 01/12/2015 03:26 AM, Peter Levart wrote:
> Did bit smearing function used in JDK 7 have negative impact on any hashCodes or
> was it replaced with simpler one just because it is not needed when tree bins
> are used?

I don't have detailed records, but in pre-release jdk8 tests, smearing was
detectably but not hugely slower for the cases of String, identity-hash,
and Integer keys. It is worth rechecking with jdk9. We believe these
cases together account for the majority of HashMap instances --  based on
some old instrumentation of a reference load, around 90% of the them.
(It would be great if someone re-ran this on a more recent corpus.)
The design goal of HashMap is to work best in common cases,
and to work well in all others except pathological cases of
non-comparable elements with many duplicate hashCodes.

One issue that might be causing treeified bins to be worse than they
would otherwise be is that the recursion in TreeBin.find makes
it harder for compilers (JITs) to perform class analysis to determine
concrete types, leading to more virtual dispatching and slower code.
Someone from a compiler group once asked whether this code could
be changed to help JITs. Allocating emulated stack-frames
during lookups seems not to work well, but it is possible that
a scheme that temporarily reused internal node pointers might help.
I haven't looked into this.

-Doug