RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes

Fri Jun 26 06:29:16 UTC 2015

I ran with WLS, and the results are:

1) original:
SymbolTable statistics:
Number of buckets       :     20011 =    160088 bytes, avg   8.000
Number of entries       :    265300 =   6367200 bytes, avg  24.000
Number of literals      :    265300 =  12325168 bytes, avg  46.457
Total footprint         :           =  18852456 bytes
Average bucket size     :    13.258
Variance of bucket size :    13.209
Std. dev. of bucket size:     3.634
Maximum bucket size     :        33

2) _identity_hash reduced to a short, with the new identity_hash():
SymbolTable statistics:
Number of buckets       :     20011 =    160088 bytes, avg   8.000
Number of entries       :    265237 =   6365688 bytes, avg  24.000
Number of literals      :    265237 =  11805440 bytes, avg  44.509
Total footprint         :           =  18331216 bytes
Average bucket size     :    13.255
Variance of bucket size :    13.209
Std. dev. of bucket size:     3.634
Maximum bucket size     :        33

The bucket balance is similar in both runs. The number of symbols decreased by 63.

Thanks
Yumin

On 6/24/2015 1:10 PM, Yumin Qi wrote:
> I updated webrev 03 again. Please let me know if it is OK.
>
> Thanks
> Yumin
>
> On 6/24/2015 9:21 AM, Yumin Qi wrote:
>> Aleksey and Ioi,
>>
>> On 6/24/15, 2:55 AM, Aleksey Shipilev wrote:
>>> Hi Ioi,
>>>
>>> On 06/23/2015 10:28 PM, Ioi Lam wrote:
>>>> I assume by "address" you mean
>>>>
>>>> ((unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3)));
>>> Yes.
>>>
>>>> I think for performance reasons we are storing a random number in the
>>>> Symbol instead of computing the hashcode based on the string value.
>>>> The intention of this RFE is not to change that, but just to use fewer
>>>> random bits to accomplish the same goal. Yes, we are assuming a
>>>> particular hashtable implementation, but I don't think we are going
>>>> to change that any time soon.
>>> ...
>>>
>>>> And if we ever change the hash function
>>>> to heavily depend on the top 16 bits of the hash code, we can
>>>> certainly revert to a 32-bit random number.
>>> This sounds awfully similar to introducing technical debt.
>>> Succumbing to this practice means you will have to make an avalanche
>>> of modifications once you need to make a little change in the hash table
>>> implementation, assuming somebody actually remembers to change the hash
>>> function. I'm not keen on introducing these dependencies without a
>>> *very* good reason.
>>>
>>> In this example, if there is a hash function that returns a 32-bit
>>> value, please make sure the value is random in all 32 bits. If that
>>> proves hard, then the memory footprint savings (what, 1K out of 3626K,
>>> if I read the data right?) are not worth the hassle of complicating the
>>> hash function beyond reason and introducing undocumented
>>> dependencies that will eventually backfire.
>> Currently _identity_hash = os::random(), where the random seed is
>> 1234567.
>> I think, to make it more random, we need a seed of 65535? Since 0 is
>> the same as 65536 in a short, and so on.
>>
>> I'm not familiar with the random algorithm (os::random()). I suppose
>> it generates a random number between 0 and _rand_seed?
>>
>>
>> Thanks
>> Yumin
>>
>>> Thanks,
>>> -Aleksey
>