Benefit from computing String Hash at compile time?
Osvaldo Pinali Doederlein
opinali at gmail.com
Sun Dec 20 13:32:38 PST 2009
Em 20/12/2009 17:25, Mark Thornton escreveu:
> Osvaldo Pinali Doederlein wrote:
>> attached in the end) - to no avail. I believe one significant problem
>> is that the hashing function is not friendly to optimization;
>> something like str[0] ^ str[1] ^ str[2]... would be much better
>> (i.e., any function that allows loop unrolling and SIMD tricks). But,
>> as we've already discussed, String.hashCode()'s algorithm cannot be
>> changed so it's pointless considering this.
>
> from JavaDoc: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
>
> = s[0]*31^(n-1) + s[2]*31^(n-3) + ... +
> s[1]*31^(n-2) + s[4]*31^(n-4) + ...
Good idea, I didn't consider this and I believe the JIT is not that
smart. I did this change to the code, but with bad results again. For
the simple String(char[]), only regressions (server: 1704->1749ns;
client: 1609->1630ns). The String(int[],int,int) constructor can't use
the 2x-unrolled code because it must proceed one char at a time due to
supplementary code points (I could work around this - but then, that
would add even more code).
As usual, it's dangerous to pollute very tight loops with extra
variables and calculations; we risk losing performance due to factors
like register allocation. The eager hashcode code was bad enough, and
the more sophisticated version is even worse. Things are different when
the code is really bound by the FSB; the benchmark allocated 500-char
strings (roughly 1Kb) but this is too small, the new operator zero-fills
the char[] so everything in in the L1 when it is initialized. Eager
hashcode computation would probably look better for huge strings that
don't fit in the caches, then the extra instructions could be completely
hidden behind cache misses, but this is an irrelevant scenario for String.
A+
Osvaldo
>
>
> int hOdd = 0;
> int hEven = 0;
> final int mSquared = 31*31;
>
> int ne = s.length() &~1;
> for (int i=0; i<ne; i+=2) {
> hEven = mSquared*hEven+s.charAt(i);
> hOdd = mSquared*hOdd+s.charAt(i+1);
> }
> if (ne < s.length() {
> hEven = mSquared*hEven+s.charAt(ne);
> return hEven+31*hOdd;
> }
> else {
> return 31*hEven+hOdd;
> }
>
> Other variations possible. For really long Strings, divide the string
> into M pieces where M is the number of processors available.
>
> Mark Thornton
>
>
>
More information about the coin-dev
mailing list