RFR (XS) CR 8058643: (str) Re-examine hashCode implementation

Wed Sep 24 20:25:41 UTC 2014

On 09/22/2014 04:29 PM, Aleksey Shipilev wrote:
>
> Fixed:
>   http://cr.openjdk.java.net/~shade/8058643/webrev.02/
>   http://cr.openjdk.java.net/~shade/8058643/8058643.changeset
>
> Since there is no performance impact for this touchup (char -> int
> promotion is happening anyway), and no functionality changes (char ->
> int promotion zero-extends), I haven't developed any new tests, or
> re-spinned any existing ones.
>
> Thanks,
> -Aleksey.
>

Hi,

String.hashCode() caches the result, so repeatable call to same String 
instance is faster for 2nd and further invocations. But the computation 
of hash code itself could be accelerated for a factor of 2 or more on 
todays CPUs. How? By parallelizing it. And I don't mean computing it in 
multiple threads.

Here is a surprising result of a simple benchmark which computes hash 
code of a 128 character string in 6 different ways:

Benchmark                    Mode   Samples        Score  Score error    
Units
j.t.HashBench.hashCode      thrpt         8  8420103.795 162447.069    ops/s
j.t.HashBench.hashCode0     thrpt         8  8439058.660 2842.755    ops/s
j.t.HashBench.hashCode1     thrpt         8 13809510.573 337888.132    ops/s
j.t.HashBench.hashCode2     thrpt         8 15543687.568 716152.160    ops/s
j.t.HashBench.hashCode3     thrpt         8 18173224.431 49410.256    ops/s
j.t.HashBench.hashCode3x    thrpt         8  8543020.232 18158.686    ops/s

Source:

http://cr.openjdk.java.net/~plevart/misc/StringHash/HashBench.java

The "hashCode" benchmark is invoking .hashCode() on a new String using a 
'master' string as a constructor argument, so new string is constructed 
with shared char[] array, but with no pre-computed hashCode.

Compare it to "hashCode0" benchmark which replicates String's hasCode() 
implementation and has a comparable score. This is to set the baseline.

"hashCode1" benchmark is already quite faster.
"hashCode2" is faster than "hashCode1"
"hashCode3" is faster than "hashCode2"
... i haven't tried it further. It should stop somewhere, dependent on 
the CPU of course.

To illustrate this has nothing to do with manual vs. automatic loop 
unrolling (as it might appear by 1st glance of the code), "hashCode3x" 
benchmark is equivalent to "hashCode3" regarding loop structure, but 
it's result is comparable to "hashCode" and "hashCode0" benchmarks.

It appears that "hashCode1,2,3" are utilizing CPU's pipeline which can 
perform multiple (2, 3, 4) multiplications in parallel, whereas 
"hashCode0,3x" can't, since each iteration of loop computes the next 
result from the result of previous iteration.

Regards, Peter