Benefit from computing String Hash at compile time?
Osvaldo Pinali Doederlein
opinali at gmail.com
Mon Dec 21 05:49:36 PST 2009
On 20/12/2009 19:40, Mark Thornton wrote:
> Osvaldo Pinali Doederlein wrote:
>> Good idea, I hadn't considered this, and I believe the JIT is not that
>> smart. I made this change to the code, but with bad results again. For
>> the simple String(char[]), there were only regressions (server:
>> 1704->1749ns; client: 1609->1630ns). The String(int[],int,int)
>> constructor can't use the 2x-unrolled code because it must proceed one
>> code point at a time due to supplementary code points (I could work
>> around this, but that would add even more code).
>>
>> As usual, it's dangerous to pollute very tight loops with extra
>> variables and calculations; we risk losing performance due to factors
>> like register allocation. The eager hashcode version was bad enough,
>> and the more sophisticated version is even worse. Things are different
>> when the code is really bound by the FSB: the benchmark allocated
>> 500-char strings (roughly 1KB), but this is too small, since the new
>> operator zero-fills the char[], so everything is already in the L1
>> cache when it is initialized. Eager hashcode computation would
>> probably look better for huge strings that don't fit in the caches,
>> where the extra instructions could be completely hidden behind cache
>> misses, but that is an irrelevant scenario for String.
>>
> For something important like String.hashCode the JIT doesn't have to
> be smart, because we can ask the JVM authors to hand-code intrinsic
> alternatives. The fastest code is likely to depend on the hardware:
> x64 has more registers available, the number of integer multiplier
> units varies, etc.
>
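To make the quoted discussion concrete, the change in question (folding the
hash computation into the 2x-unrolled copy loop of the char[] constructor)
has roughly the shape sketched below. This is only an illustrative sketch in
a stand-alone class; the names and exact structure are mine, not the actual
patch to java.lang.String.

// Sketch only: eager, 2x-unrolled hashing folded into the copy loop.
final class EagerHashString {
    private final char[] value;
    private final int hash;   // computed eagerly instead of lazily in hashCode()

    EagerHashString(char[] src) {
        int n = src.length;
        char[] v = new char[n];
        int h = 0;
        int i = 0;
        // Two chars per iteration, accumulating the same polynomial hash
        // as String.hashCode() (h = 31*h + c) alongside the copy.
        for (; i + 1 < n; i += 2) {
            char c0 = src[i], c1 = src[i + 1];
            v[i] = c0;
            v[i + 1] = c1;
            h = 31 * (31 * h + c0) + c1;
        }
        if (i < n) {          // odd-length tail
            char c = src[i];
            v[i] = c;
            h = 31 * h + c;
        }
        this.value = v;
        this.hash = h;
    }

    @Override public int hashCode() { return hash; }
}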
Fair enough; these tests were on a Core 2 Duo laptop (Windows 7 32-bit),
so I repeated them on another box running Solaris amd64 (with -d64):
new String(char value[]):
  Normal = (server) 475ns, (client) 474ns; isolated hashCode() = (server)
  891ns, (client) 890ns
  Eager hashing (2x unrolled) = (server) 836ns, (client) 837ns
new String(int[] codePoints, int offset, int count):
  Normal = (server) 2030ns, (client) 2028ns
  Eager hashing (normal, not unrolled) = (server) 2623ns, (client) 2626ns
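For what it's worth, per-operation numbers like these are typically obtained
by averaging over many iterations; the sketch below is only my illustration
of such a nanoTime-based loop, not the actual benchmark code (iteration
counts and warm-up are arbitrary).

// Rough sketch of a per-operation timing loop; constants are illustrative.
public final class CtorBench {
    public static void main(String[] args) {
        char[] data = new char[500];            // 500-char input (~1KB), as in the test
        java.util.Arrays.fill(data, 'x');

        int sink = 0;
        for (int i = 0; i < 100000; i++)        // warm-up so HotSpot compiles the path
            sink += new String(data).length();

        final int iters = 1000000;
        long t0 = System.nanoTime();
        for (int i = 0; i < iters; i++)
            sink += new String(data).length();  // keep the result live
        long t1 = System.nanoTime();

        System.out.println("avg ns/op = " + (t1 - t0) / (double) iters
                + " (sink=" + sink + ")");
    }
}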
The results for the simple constructor were surprisingly better; in fact
they were almost "too good", enough to raise some suspicion: the whole
eager-hashing constructor (836ns) comes out faster than the isolated
hashCode() call (891ns), although not by much, and weird things often
happen in optimization. The overhead relative to the standard,
non-eager-hashing constructor is still too high for such popular
constructors when many strings are never hashed. Still, this confirms
the advantage of 64-bit: the native code for these simple constructors
should leave enough spare registers that HotSpot can accommodate the
extra hash calculation with much less impact. But I'm actually
surprised, because I thought modern x86 CPUs, even (and remarkably) in
32-bit mode, would use tricks like virtual register windows so the tiny
number of architectural registers wouldn't matter that much; it seems
this doesn't work as well as advertised.
Unfortunately, for the complex constructor there was no gain; the result
was even slightly worse (+29% for both server and client), although for
a precise comparison I'd need to repeat the 32-bit tests on this system.
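To be clear about why the 2x unrolling doesn't carry over: this constructor
has to advance one code point at a time, since a supplementary code point
expands to two chars, so the hash has to absorb one or two chars per
iteration. A rough sketch of that shape (the helper name and the hashOut
parameter are mine, and argument validation is omitted):

// Sketch only: per-code-point conversion with the hash folded in.
static char[] fromCodePoints(int[] codePoints, int offset, int count, int[] hashOut) {
    // First pass: supplementary code points expand to two chars.
    int len = 0;
    for (int i = offset; i < offset + count; i++)
        len += Character.charCount(codePoints[i]);    // 1 (BMP) or 2 (supplementary)

    char[] v = new char[len];
    int h = 0, j = 0;
    for (int i = offset; i < offset + count; i++) {
        int written = Character.toChars(codePoints[i], v, j);  // writes 1 or 2 chars
        for (int k = 0; k < written; k++)
            h = 31 * h + v[j + k];                    // same polynomial as hashCode()
        j += written;
    }
    hashOut[0] = h;   // the eagerly computed hash
    return v;
}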
A+
Osvaldo