Benefit from computing String Hash at compile time?

Osvaldo Pinali Doederlein opinali at gmail.com
Sat Dec 19 04:53:23 PST 2009


On 18/12/2009 21:01, Artur Biesiadowski wrote:
> Osvaldo Doederlein wrote:
>    
>> - Some constructors (notably those for substring and cloning) rely on
>> Arrays.copyOfRange(), whose implementation is more efficient than any Java
>> loop (I guess it's a HotSpot intrinsic with optimization for alignment
>> etc.). In that case, using an explicit loop so we can smuggle the hashcode
>> calculation inside it will probably have a measurable disadvantage. But
>> this disadvantage is only for construction (and then only for large
>> strings); for strings that are ever hashed, the net saving will still
>> be positive.
>>      
> Especially in the case of substring, an optimized private constructor is
> used, which just does 3 assignments. With your idea, it would have to
> iterate over all elements. This is quite a common operation.
>    

The short answer: you are right, that's an important special case,
notably in methods that use several temporary strings (often substrings
of some previous string), because temp strings are virtually never
hashed, and the sharing of String.value is critical to String's
immutable design. But this only means that eager computation of the
hashcode is not always a good idea. So perhaps we can compute it eagerly
in all or most constructors that create a new String.value, or more
generally in any constructor where the extra computation is shown not to
cause any significant performance degradation. For the other
constructors, we just leave String.hash initialized to 0, so the
current hashCode() stays unchanged and calculates the value lazily if
it is ever needed.
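
To make the split concrete, here is a minimal sketch, not the actual java.lang.String source. HashedText and isHashCached() are hypothetical names used only for illustration; the value/offset/count/hash fields are modeled on the String layout discussed in this thread. The copying constructor already visits every char, so it folds the hash into the copy loop, while the sharing constructor used by substring keeps its three assignments and leaves hash at 0 for hashCode() to fill in lazily.

```java
// Hypothetical stand-in for String; illustrates eager vs. lazy hashing only.
final class HashedText {
    private final char[] value;   // backing array, possibly shared
    private final int offset;     // first used index in value
    private final int count;      // number of chars used
    private int hash;             // 0 means "not computed yet"

    // Creates a new backing array, so every char is visited anyway:
    // compute the hash eagerly inside the same loop.
    HashedText(char[] source) {
        char[] copy = new char[source.length];
        int h = 0;
        for (int i = 0; i < source.length; i++) {
            char c = source[i];
            copy[i] = c;
            h = 31 * h + c;       // same polynomial as String.hashCode()
        }
        this.value = copy;
        this.offset = 0;
        this.count = copy.length;
        this.hash = h;
    }

    // Shares the backing array: three assignments, no iteration,
    // hash stays 0 and is computed only if someone asks for it.
    private HashedText(char[] shared, int offset, int count) {
        this.value = shared;
        this.offset = offset;
        this.count = count;
    }

    HashedText substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new HashedText(value, offset + begin, end - begin);
    }

    // Hypothetical helper, only so the behaviour can be observed.
    boolean isHashCached() {
        return hash != 0;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && count > 0) {          // lazy path, unchanged logic
            for (int i = offset; i < offset + count; i++) {
                h = 31 * h + value[i];
            }
            hash = h;
        }
        return h;
    }
}
```

With this shape, a temporary substring that is never hashed pays nothing extra, and a string built by copying never needs a second pass over its chars when it is later used as a hash key. Whether the explicit loop costs more than an intrinsic array copy is exactly what would have to be measured.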

The long answer: I'm coding a prototype implementation of this
optimization in some constructors so I can benchmark it and see if it's
worth the trouble. As usual, it's better to let the code do all the talking.

> I wonder if there is anything (some Hotspot intrinsic?) preventing a quick
> hack on java.lang.String, putting it in bootclasspath/a and measuring
> the time of javac on a few thousand source files, reindexing huge lucene
> data and maybe hsql on some test database. It should at least give a rough
> figure if it changes the speed in any measurable way.
>    

HotSpot is doing some intrinsic tricks for String/StringBuilder (IIRC)
in recent JDK 7 builds, but I haven't checked those changesets. Still, I
don't think they would affect such benchmarking, as long as we don't
change the data layout (fields).

A+
Osvaldo




More information about the coin-dev mailing list