Benefit from computing String Hash at compile time?
Osvaldo Pinali Doederlein
opinali at gmail.com
Mon Dec 21 05:49:36 PST 2009
On 20/12/2009 19:40, Mark Thornton wrote:
> Osvaldo Pinali Doederlein wrote:
>> Good idea, I hadn't considered this, and I believe the JIT is not that
>> smart. I made this change to the code, but with bad results again. For
>> the simple String(char[]), there were only regressions (server:
>> 1704->1749ns; client: 1609->1630ns). The String(int[],int,int)
>> constructor can't use the 2x-unrolled code because it must proceed one
>> code point at a time due to supplementary code points (I could work
>> around this, but that would add even more code).
>>
>> As usual, it's dangerous to pollute very tight loops with extra
>> variables and calculations; we risk losing performance due to factors
>> like register allocation. The eager hashcode version was bad enough,
>> and the more sophisticated version is even worse. Things are different
>> when the code is really bound by the FSB: the benchmark allocated
>> 500-char strings (roughly 1KB), but this is too small, since the new
>> operator zero-fills the char[], so everything is already in the L1
>> cache when it is initialized. Eager hashcode computation would
>> probably look better for huge strings that don't fit in the caches,
>> where the extra instructions could be completely hidden behind cache
>> misses, but that is an irrelevant scenario for String.
>>
> For something important like String.hashCode the JIT doesn't have to
> be smart, because we can ask the JVM authors to hand-code intrinsic
> alternatives. The fastest code is likely to depend on the hardware:
> x64 has more registers available, the number of integer multiplier
> units varies, etc.
>
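To make the quoted discussion concrete, the change in question (folding the
hash computation into the 2x-unrolled copy loop of the char[] constructor)
has roughly the shape sketched below. This is only an illustrative sketch in
a stand-alone class; the names and exact structure are mine, not the actual
patch to java.lang.String.

// Sketch only: eager, 2x-unrolled hashing folded into the copy loop.
final class EagerHashString {
    private final char[] value;
    private final int hash;   // computed eagerly instead of lazily in hashCode()

    EagerHashString(char[] src) {
        int n = src.length;
        char[] v = new char[n];
        int h = 0;
        int i = 0;
        // Two chars per iteration, accumulating the same polynomial hash
        // as String.hashCode() (h = 31*h + c) alongside the copy.
        for (; i + 1 < n; i += 2) {
            char c0 = src[i], c1 = src[i + 1];
            v[i] = c0;
            v[i + 1] = c1;
            h = 31 * (31 * h + c0) + c1;
        }
        if (i < n) {          // odd-length tail
            char c = src[i];
            v[i] = c;
            h = 31 * h + c;
        }
        this.value = v;
        this.hash = h;
    }

    @Override public int hashCode() { return hash; }
}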
Fair enough; these tests were on a Core 2 Duo laptop (Windows 7 32-bit),
so I repeated them on another box running Solaris amd64 (with -d64):
new String(char value[]):
  Normal = (server) 475ns, (client) 474ns; isolated hashCode() = (server)
  891ns, (client) 890ns
  Eager hashing (2x unrolled) = (server) 836ns, (client) 837ns
new String(int[] codePoints, int offset, int count):
  Normal = (server) 2030ns, (client) 2028ns
  Eager hashing (normal, not unrolled) = (server) 2623ns, (client) 2626ns
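For what it's worth, per-operation numbers like these are typically obtained
by averaging over many iterations; the sketch below is only my illustration
of such a nanoTime-based loop, not the actual benchmark code (iteration
counts and warm-up are arbitrary).

// Rough sketch of a per-operation timing loop; constants are illustrative.
public final class CtorBench {
    public static void main(String[] args) {
        char[] data = new char[500];            // 500-char input (~1KB), as in the test
        java.util.Arrays.fill(data, 'x');

        int sink = 0;
        for (int i = 0; i < 100000; i++)        // warm-up so HotSpot compiles the path
            sink += new String(data).length();

        final int iters = 1000000;
        long t0 = System.nanoTime();
        for (int i = 0; i < iters; i++)
            sink += new String(data).length();  // keep the result live
        long t1 = System.nanoTime();

        System.out.println("avg ns/op = " + (t1 - t0) / (double) iters
                + " (sink=" + sink + ")");
    }
}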
The results for the simple constructor were surprisingly better; in fact
they were almost "too good", enough to raise some suspicion: the whole
eager-hashing constructor (836ns) comes out faster than the isolated
hashCode() call (891ns), although not by much, and weird things often
happen in optimization. The overhead relative to the standard,
non-eager-hashing constructor is still too high for such popular
constructors when many strings are never hashed. Still, this confirms
the advantage of 64-bit: the native code for these simple constructors
should leave enough spare registers that HotSpot can accommodate the
extra hash calculation with much less impact. But I'm actually
surprised, because I thought modern x86 CPUs, even (and remarkably) in
32-bit mode, would use tricks like virtual register windows so the tiny
number of architectural registers wouldn't matter that much; it seems
this doesn't work as well as advertised.
Unfortunately, for the complex constructor there was no gain; the result
was even slightly worse (+29% for both server and client), although for
a precise comparison I'd need to repeat the 32-bit tests on this system.
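To be clear about why the 2x unrolling doesn't carry over: this constructor
has to advance one code point at a time, since a supplementary code point
expands to two chars, so the hash has to absorb one or two chars per
iteration. A rough sketch of that shape (the helper name and the hashOut
parameter are mine, and argument validation is omitted):

// Sketch only: per-code-point conversion with the hash folded in.
static char[] fromCodePoints(int[] codePoints, int offset, int count, int[] hashOut) {
    // First pass: supplementary code points expand to two chars.
    int len = 0;
    for (int i = offset; i < offset + count; i++)
        len += Character.charCount(codePoints[i]);    // 1 (BMP) or 2 (supplementary)

    char[] v = new char[len];
    int h = 0, j = 0;
    for (int i = offset; i < offset + count; i++) {
        int written = Character.toChars(codePoints[i], v, j);  // writes 1 or 2 chars
        for (int k = 0; k < written; k++)
            h = 31 * h + v[j + k];                    // same polynomial as hashCode()
        j += written;
    }
    hashOut[0] = h;   // the eagerly computed hash
    return v;
}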
A+
Osvaldo