Benefit from computing String Hash at compile time?
Osvaldo Pinali Doederlein
opinali at gmail.com
Sun Dec 20 09:50:10 PST 2009
Hi,
> The long answer, I'm coding a prototype impl of this optimization in
> some constructors so I can benchmark this and see if it's worth the
> trouble. As usual it's better telling the code to do all the talking.
Result of this benchmark, for JDK 7b78, creating a 500-char string:
new String(char value[]):
Normal = (server) 559ns, (client) 520ns; hashCode() = (server) 1046ms,
(client) 1182ms
Eager hashing = (server) 1704ns, (client) 1609ns
This first test was certainly a disaster, but that was expected - one of
the constructors that rely on native/intrinsic methods like copyOf(),
copyOfRange(), arraycopy(). Even adding the construction to hash times
don't show any advantage for eager hashing.
new String(int[] codePoints, int offset, int count):
Normal = (server) 3277ns, (client) 3342ns
Eager hashing = (server) 3397ns, (client) 3610ns
This one is much better; we can see some performance degradation with
the eager hashing (+3,66% for server and +8,01% for client), but that's
a small overhead on already-slow constructors, and summing construction
to hashing times show a nice advantage (-27% for server, -25% for
client). I expect the relative overhead of eager hashing to be even
smaller for the 3 constructors that use StringCoding.decode(), but I
didn't try these (much more code to hack).
I've used every optimization possible at source level - manual inlining,
constant folding/propagation, caching fields in locals (sample code
attached in the end) - to no avail. I believe one significant problem is
that the hashing function is not friendly to optimization; something
like str[0] ^ str[1] ^ str[2]... would be much better (i.e., any
function that allows loop unrolling and SIMD tricks). But, as we've
already discussed, String.hashCode()'s algorithm cannot be changed so
it's pointless considering this.
My conclusion is "Myth Busted". Eager hashing helps only the most
complex constructors, which are very rarely used. Even for these
constructors, there is a measurable (if small) cost for strings that are
never hashed, which probably doesn't offset the bigger, but still modest
gain for those that are eventually hashed.
A+
Osvaldo
public String(String original) {
int size = original.count;
char[] originalValue = original.value;
char[] v;
if (originalValue.length > size) {
// The array representing the String is bigger than the new
// String itself. Perhaps this constructor is being called
// in order to trim the baggage, so make a copy of the array.
int off = original.offset;
int newLength = off+size - off;
v = new char[newLength];
int h = 0;
int end = Math.min(originalValue.length - off, newLength);
for (int i = 0; i < end; ++i) {
h = 31*h + (v[i] = originalValue[off + i]);
}
this.hash = h;
} else {
// The array representing the String is the same
// size as the String, so no point in making a copy.
v = originalValue;
}
this.offset = 0;
this.count = size;
this.value = v;
}
A+
Osvaldo
More information about the coin-dev
mailing list