Benefit from computing String Hash at compile time?

Osvaldo Pinali Doederlein opinali at gmail.com
Sun Dec 20 13:32:38 PST 2009


Em 20/12/2009 17:25, Mark Thornton escreveu:
> Osvaldo Pinali Doederlein wrote:
>> attached in the end) - to no avail. I believe one significant problem 
>> is that the hashing function is not friendly to optimization; 
>> something like str[0] ^ str[1] ^ str[2]... would be much better 
>> (i.e., any function that allows loop unrolling and SIMD tricks). But, 
>> as we've already discussed, String.hashCode()'s algorithm cannot be 
>> changed so it's pointless considering this.
>
> from JavaDoc: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
>
> = s[0]*31^(n-1) + s[2]*31^(n-3) + ... +
>  s[1]*31^(n-2) + s[4]*31^(n-4) + ...

Good idea, I didn't consider this and I believe the JIT is not that 
smart. I did this change to the code, but with bad results again. For 
the simple String(char[]), only regressions (server: 1704->1749ns; 
client: 1609->1630ns). The String(int[],int,int) constructor can't use 
the 2x-unrolled code because it must proceed one char at a time due to 
supplementary code points (I could work around this - but then, that 
would add even more code).

As usual, it's dangerous to pollute very tight loops with extra 
variables and calculations; we risk losing performance due to factors 
like register allocation. The eager hashcode code was bad enough, and 
the more sophisticated version is even worse. Things are different when 
the code is really bound by the FSB; the benchmark allocated 500-char 
strings (roughly 1Kb) but this is too small, the new operator zero-fills 
the char[] so everything in in the L1 when it is initialized. Eager 
hashcode computation would probably look better for huge strings that 
don't fit in the caches, then the extra instructions could be completely 
hidden behind cache misses, but this is an irrelevant scenario for String.

A+
Osvaldo

>
>
> int hOdd = 0;
> int hEven = 0;
> final int mSquared = 31*31;
>
> int ne = s.length() &~1;
> for (int i=0; i<ne; i+=2) {
>   hEven = mSquared*hEven+s.charAt(i);
>   hOdd = mSquared*hOdd+s.charAt(i+1);
> }
> if (ne < s.length() {
>   hEven = mSquared*hEven+s.charAt(ne);
>   return hEven+31*hOdd;
> }
> else {
>   return 31*hEven+hOdd;
> }
>
> Other variations possible. For really long Strings, divide the string 
> into M pieces where M is the number of processors available.
>
> Mark Thornton
>
>
>




More information about the coin-dev mailing list