Tune String's hashCode() + equals()

Ulf Zibis Ulf.Zibis at gmx.de
Thu Mar 4 21:36:05 UTC 2010


Much thanks for your effort.

Am 04.03.2010 21:31, schrieb Marek Kozieł:
> @Ulf
> Few explanations:
> 1.
>    
>> Intersting alternative, but I'm afraid, this is against the spec.
>> Shifting all 0's to 1 would break String's hash definition:  h = 31 * h + val[i++].
>>      
> Yes it does, any way i think spec is to tight here. Do we really need
> hash of each value even if String have length like 600000?
> there is noting good coming from it in my opinion.
> Did any one saw at least one code relaying on that ?
> Btw. Same come with .compare
>    

See discussion on project Coin list, subject "Benefit from computing 
String Hash at compile time?"

> 2.
> private static final long isHashBijection= 0x8000000000000000L;
> should be fine
>    

Now I understand how it should work.
Your algorithm will guarantee only 2^63 bijectional values. So strings 
of lenght >= 4 can't have bijectional hashes as 4 chars count 2^64 
variations.

> 3.
> Second sample would work only if hash would be set in constructor so
> even 0 would be valid hash.
>
> 4.
> Maybe I'm wrong but if most cases when hash should be count is String
> concatenation then we could make it +/-
>    
>> a.hash+b.hash*powderOf31[a.length()]
>>      
> so it would not consume so much time.
>    

Interesting idea. I suggest to file an RFE.

> 5.
> Intern string do not need hash codes co comparing cos they have same
> address, so first loop would return true if they are equal, after this
> we need only to check if they are not equal:
>    
>> if (isIntern()&&  anotherString.isIntern()) return false;
>>      

You are right, but
     if (h1 != 0 && h2 != 0 && h1 != h2) return false;
would perform same (if already computed internal hash would be 
back-propagated to the Java object).

> 6.
> Have in mind that adding additional fields to string might be not an
> option, because memory lost this way may have great impact on average
> application efficiency.
>    

+ would break compatibility if objects are serialized.

-Ulf





More information about the core-libs-dev mailing list