Benefit from computing String Hash at compile time?

Sun Dec 20 09:50:10 PST 2009

Hi,

> The long answer, I'm coding a prototype impl of this optimization in 
> some constructors so I can benchmark this and see if it's worth the 
> trouble. As usual it's better telling the code to do all the talking.

Result of this benchmark, for JDK 7b78, creating a 500-char string:

new String(char value[]):
Normal = (server) 559ns, (client) 520ns; hashCode() = (server) 1046ms, 
(client) 1182ms
Eager hashing = (server) 1704ns, (client) 1609ns

This first test was certainly a disaster, but that was expected - one of 
the constructors that rely on native/intrinsic methods like copyOf(), 
copyOfRange(), arraycopy(). Even adding the construction to hash times 
don't show any advantage for eager hashing.

new String(int[] codePoints, int offset, int count):
Normal = (server) 3277ns, (client) 3342ns
Eager hashing = (server) 3397ns, (client) 3610ns

This one is much better; we can see some performance degradation with 
the eager hashing (+3,66% for server and +8,01% for client), but that's 
a small overhead on already-slow constructors, and summing construction 
to hashing times show a nice advantage (-27% for server, -25% for 
client). I expect the relative overhead of eager hashing to be even 
smaller for the 3 constructors that use StringCoding.decode(), but I 
didn't try these (much more code to hack).

I've used every optimization possible at source level - manual inlining, 
constant folding/propagation, caching fields in locals (sample code 
attached in the end) - to no avail. I believe one significant problem is 
that the hashing function is not friendly to optimization; something 
like str[0] ^ str[1] ^ str[2]... would be much better (i.e., any 
function that allows loop unrolling and SIMD tricks). But, as we've 
already discussed, String.hashCode()'s algorithm cannot be changed so 
it's pointless considering this.

My conclusion is "Myth Busted". Eager hashing helps only the most 
complex constructors, which are very rarely used. Even for these 
constructors, there is a measurable (if small) cost for strings that are 
never hashed, which probably doesn't offset the bigger, but still modest 
gain for those that are eventually hashed.

A+
Osvaldo

     public String(String original) {
         int size = original.count;
         char[] originalValue = original.value;
         char[] v;
         if (originalValue.length > size) {
             // The array representing the String is bigger than the new
             // String itself.  Perhaps this constructor is being called
             // in order to trim the baggage, so make a copy of the array.
             int off = original.offset;
             int newLength = off+size - off;
             v = new char[newLength];
             int h = 0;
             int end = Math.min(originalValue.length - off, newLength);
             for (int i = 0; i < end; ++i) {
               h = 31*h + (v[i] = originalValue[off + i]);
             }
             this.hash = h;
         } else {
             // The array representing the String is the same
             // size as the String, so no point in making a copy.
             v = originalValue;
         }
         this.offset = 0;
         this.count = size;
         this.value = v;
     }

A+
Osvaldo